Commit Graph

249 Commits

Author SHA1 Message Date
Martin KaFai Lau
cdf3464e6c ipv6: Fix dst_entry refcnt bugs in ip6_tunnel
Problems in the current dst_entry cache in the ip6_tunnel:

1. ip6_tnl_dst_set is racy.  There is no lock to protect it:
   - One major problem is that the dst refcnt gets messed up. F.e.
     the same dst_cache can be released multiple times and then
     triggering the infamous dst refcnt < 0 warning message.
   - Another issue is the inconsistency between dst_cache and
     dst_cookie.

   It can be reproduced by adding and removing the ip6gre tunnel
   while running a super_netperf TCP_CRR test.

2. ip6_tnl_dst_get does not take the dst refcnt before returning
   the dst.

This patch:
1. Create a percpu dst_entry cache in ip6_tnl
2. Use a spinlock to protect the dst_cache operations
3. ip6_tnl_dst_get always takes the dst refcnt before returning

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-15 14:53:05 -07:00
Martin KaFai Lau
f230d1e891 ipv6: Rename the dst_cache helper functions in ip6_tunnel
It is a prep work to fix the dst_entry refcnt bugs in
ip6_tunnel.

This patch rename:
1. ip6_tnl_dst_check() to ip6_tnl_dst_get() to better
   reflect that it will take a dst refcnt in the next patch.
2. ip6_tnl_dst_store() to ip6_tnl_dst_set() to have a more
   conventional name matching with ip6_tnl_dst_get().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-15 14:53:04 -07:00
Tom Herbert
42240901f7 ipv6: Implement different admin modes for automatic flow labels
Change the meaning of net.ipv6.auto_flowlabels to provide a mode for
automatic flow labels generation. There are four modes:

0: flow labels are disabled
1: flow labels are enabled, sockets can opt-out
2: flow labels are allowed, sockets can opt-in
3: flow labels are enabled and enforced, no opt-out for sockets

np->autoflowlabel is initialized according to the sysctl value.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 17:07:11 -07:00
Tom Herbert
67800f9b1f ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel
We can't call skb_get_hash here since the packet is not complete to do
flow_dissector. Create hash based on flowi6 instead.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 17:07:11 -07:00
Martin KaFai Lau
b197df4f0f ipv6: Add rt6_get_cookie() function
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie.  Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25 13:25:34 -04:00
David Miller
79b16aadea udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().
That was we can make sure the output path of ipv4/ipv6 operate on
the UDP socket rather than whatever random thing happens to be in
skb->sk.

Based upon a patch by Jiri Pirko.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2015-04-07 15:29:08 -04:00
Nicolas Dichtel
ecf2c06a88 ip6tnl,gre6,vti6: implement ndo_get_iflink
Don't use dev->iflink anymore.

CC: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-02 14:04:59 -04:00
Jiri Benc
67b61f6c13 netlink: implement nla_get_in_addr and nla_get_in6_addr
Those are counterparts to nla_put_in_addr and nla_put_in6_addr.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Jiri Benc
930345ea63 netlink: implement nla_put_in_addr and nla_put_in6_addr
IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:58:35 -04:00
Ian Morris
53b24b8f94 ipv6: coding style: comparison for inequality with NULL
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x != NULL and sometimes as x. x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:51:54 -04:00
Ian Morris
63159f29be ipv6: coding style: comparison for equality with NULL
The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-31 13:51:54 -04:00
David S. Miller
0fa74a4be4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/emulex/benet/be_main.c
	net/core/sysctl_net_core.c
	net/ipv4/inet_diag.c

The be_main.c conflict resolution was really tricky.  The conflict
hunks generated by GIT were very unhelpful, to say the least.  It
split functions in half and moved them around, when the real actual
conflict only existed solely inside of one function, that being
be_map_pci_bars().

So instead, to resolve this, I checked out be_main.c from the top
of net-next, then I applied the be_main.c changes from 'net' since
the last time I merged.  And this worked beautifully.

The inet_diag.c and sysctl_net_core.c conflicts were simple
overlapping changes, and were easily to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-20 18:51:09 -04:00
Nicolas Dichtel
37355565ba ip6_tunnel: fix error code when tunnel exists
After commit 2b0bb01b6e, the kernel returns -ENOBUFS when user tries to add
an existing tunnel with ioctl API:
$ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
add tunnel "ip6tnl0" failed: No buffer space available

It's confusing, the right error is EEXIST.

This patch also change a bit the code returned:
 - ENOBUFS -> ENOMEM
 - ENOENT -> ENODEV

Fixes: 2b0bb01b6e ("ip6_tunnel: Return an error when adding an existing tunnel.")
CC: Steffen Klassert <steffen.klassert@secunet.com>
Reported-by: Pierre Cheynier <me@pierre-cheynier.net>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-03-17 15:01:18 -04:00
Ian Morris
49e64dcda2 ipv6: remove dead debug code from ip6_tunnel.c
The IP6_TNL_TRACE macro is no longer used anywhere in the code so remove definition.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24 16:25:25 -05:00
Nicolas Dichtel
1728d4fabd tunnels: advertise link netns via netlink
Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-19 14:32:03 -05:00
Ian Morris
e5d08d718a ipv6: coding style improvements (remove assignment in if statements)
This change has no functional impact and simply addresses some coding
style issues detected by checkpatch. Specifically this change
adjusts "if" statements which also include the assignment of a
variable.

No changes to the resultant object files result as determined by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-23 21:00:56 -05:00
David S. Miller
4e84b496fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-11-06 22:01:18 -05:00
Steffen Klassert
ea3dc9601b ip6_tunnel: Add support for wildcard tunnel endpoints.
This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get a
NBMA tunnel mode like we have it for ipv4 and
sit tunnels.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:19:20 -05:00
Steffen Klassert
d50051407f ipv6: Allow sending packets through tunnels with wildcard endpoints
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.

On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.

This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-06 14:19:19 -05:00
Steffen Klassert
6c6151daaf ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.
ip6_tnl_dev_init() sets the dev->iflink via a call to
ip6_tnl_link_config(). After that, register_netdevice()
sets dev->iflink = -1. So we loose the iflink configuration
for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
ndo_init function. Then ip6_tnl_dev_init() is called after
dev->iflink is set to -1 from register_netdevice().

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03 15:42:24 -05:00
Alexey Andriyanov
acf722f734 ip6_tunnel: allow to change mode for the ip6tnl0
The fallback device is in ipv6 mode by default.
The mode can not be changed in runtime, so there
is no way to decapsulate ip4in6 packets coming from
various sources without creating the specific tunnel
ifaces for each peer.

This allows to update the fallback tunnel device, but only
the mode could be changed. Usual command should work for the
fallback device: `ip -6 tun change ip6tnl0 mode any`

The fallback device can not be hidden from the packet receiver
as a regular tunnel, but there is no need for synchronization
as long as we do single assignment.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexey Andriyanov <alan@al-an.info>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-30 16:09:20 -04:00
Eric Dumazet
0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
David S. Miller
739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Steffen Klassert
2b0bb01b6e ip6_tunnel: Return an error when adding an existing tunnel.
ip6_tnl_locate() should not return an existing tunnel if
create is true. Otherwise it is possible to add the same
tunnel multiple times without getting an error.

So return NULL if the tunnel that should be created already
exists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-28 16:19:46 -04:00
Ian Morris
67ba4152e8 ipv6: White-space cleansing : Line Layouts
This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-24 22:37:52 -07:00
Tom Gundersen
c835a67733 net: set name_assign_type in alloc_netdev()
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:12:48 -07:00
Tom Herbert
cb1ce2ef38 ipv6: Implement automatic flow label generation on transmit
Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

  TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

  UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

Automatic flow labels enabled:

  TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06

  UDP_RR
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-07 21:14:21 -07:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Tom Gundersen
f98f89a010 net: tunnels - enable module autoloading
Enable the module alias hookup to allow tunnel modules to be autoloaded on demand.

This is in line with how most other netdev kinds work, and will allow userspace
to create tunnels without having CAP_SYS_MODULE.

Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21 15:46:52 -04:00
Susant Sahani
c8965932a2 ip6_tunnel: fix potential NULL pointer dereference
The function ip6_tnl_validate assumes that the rtnl
attribute IFLA_IPTUN_PROTO always be filled . If this
attribute is not filled by  the userspace application
kernel get crashed with NULL pointer dereference. This
patch fixes the potential kernel crash when
IFLA_IPTUN_PROTO is missing .

Signed-off-by: Susant Sahani <susant@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 00:27:19 -04:00
Nicolas Dichtel
74462f0d4a ip6_tunnel: use the right netns in ioctl handler
Because the netdevice may be in another netns than the i/o netns, we should
use the i/o netns instead of dev_net(dev).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-16 15:16:02 -04:00
Eric W. Biederman
57a7744e09 net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irq
Replace the bh safe variant with the hard irq safe variant.

We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.

Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-14 22:41:36 -04:00
WANG Cong
1c213bd24a net: introduce netdev_alloc_pcpu_stats() for drivers
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 15:49:55 -05:00
Li RongQing
d76ed22b22 ipv6: move IPV6_TCLASS_SHIFT into ipv6.h and define a helper
Two places defined IPV6_TCLASS_SHIFT, so we should move it into ipv6.h,
and use this macro as possible. And define ip6_tclass helper to return
tclass

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-15 15:53:18 -08:00
David S. Miller
56a4342dfe Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
	net/ipv6/ip6_tunnel.c
	net/ipv6/ip6_vti.c

ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.

qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-06 17:37:45 -05:00
Li RongQing
8f84985fec net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-04 20:10:24 -05:00
Li RongQing
abb6013cca ipv6: fix the use of pcpu_tstats in ip6_tunnel
when read/write the 64bit data, the correct lock should be hold.

Fixes: 87b6d218f3 ("tunnel: implement 64 bits statistics")

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02 19:37:21 -05:00
Florent Fourcot
3308de2b84 ipv6: add ip6_flowlabel helper
And use it if possible.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Florent Fourcot
37cfee909c ipv6: move IPV6_TCLASS_MASK definition in ipv6.h
Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-09 21:03:49 -05:00
Linus Torvalds
1ee2dcc224 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Mostly these are fixes for fallout due to merge window changes, as
  well as cures for problems that have been with us for a much longer
  period of time"

 1) Johannes Berg noticed two major deficiencies in our genetlink
    registration.  Some genetlink protocols we passing in constant
    counts for their ops array rather than something like
    ARRAY_SIZE(ops) or similar.  Also, some genetlink protocols were
    using fixed IDs for their multicast groups.

    We have to retain these fixed IDs to keep existing userland tools
    working, but reserve them so that other multicast groups used by
    other protocols can not possibly conflict.

    In dealing with these two problems, we actually now use less state
    management for genetlink operations and multicast groups.

 2) When configuring interface hardware timestamping, fix several
    drivers that simply do not validate that the hwtstamp_config value
    is one the driver actually supports.  From Ben Hutchings.

 3) Invalid memory references in mwifiex driver, from Amitkumar Karwar.

 4) In dev_forward_skb(), set the skb->protocol in the right order
    relative to skb_scrub_packet().  From Alexei Starovoitov.

 5) Bridge erroneously fails to use the proper wrapper functions to make
    calls to netdev_ops->ndo_vlan_rx_{add,kill}_vid.  Fix from Toshiaki
    Makita.

 6) When detaching a bridge port, make sure to flush all VLAN IDs to
    prevent them from leaking, also from Toshiaki Makita.

 7) Put in a compromise for TCP Small Queues so that deep queued devices
    that delay TX reclaim non-trivially don't have such a performance
    decrease.  One particularly problematic area is 802.11 AMPDU in
    wireless.  From Eric Dumazet.

 8) Fix crashes in tcp_fastopen_cache_get(), we can see NULL socket dsts
    here.  Fix from Eric Dumzaet, reported by Dave Jones.

 9) Fix use after free in ipv6 SIT driver, from Willem de Bruijn.

10) When computing mergeable buffer sizes, virtio-net fails to take the
    virtio-net header into account.  From Michael Dalton.

11) Fix seqlock deadlock in ip4_datagram_connect() wrt.  statistic
    bumping, this one has been with us for a while.  From Eric Dumazet.

12) Fix NULL deref in the new TIPC fragmentation handling, from Erik
    Hugne.

13) 6lowpan bit used for traffic classification was wrong, from Jukka
    Rissanen.

14) macvlan has the same issue as normal vlans did wrt.  propagating LRO
    disabling down to the real device, fix it the same way.  From Michal
    Kubecek.

15) CPSW driver needs to soft reset all slaves during suspend, from
    Daniel Mack.

16) Fix small frame pacing in FQ packet scheduler, from Eric Dumazet.

17) The xen-netfront RX buffer refill timer isn't properly scheduled on
    partial RX allocation success, from Ma JieYue.

18) When ipv6 ping protocol support was added, the AF_INET6 protocol
    initialization cleanup path on failure was borked a little.  Fix
    from Vlad Yasevich.

19) If a socket disconnects during a read/recvmsg/recvfrom/etc that
    blocks we can do the wrong thing with the msg_name we write back to
    userspace.  From Hannes Frederic Sowa.  There is another fix in the
    works from Hannes which will prevent future problems of this nature.

20) Fix route leak in VTI tunnel transmit, from Fan Du.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (106 commits)
  genetlink: make multicast groups const, prevent abuse
  genetlink: pass family to functions using groups
  genetlink: add and use genl_set_err()
  genetlink: remove family pointer from genl_multicast_group
  genetlink: remove genl_unregister_mc_group()
  hsr: don't call genl_unregister_mc_group()
  quota/genetlink: use proper genetlink multicast APIs
  drop_monitor/genetlink: use proper genetlink multicast APIs
  genetlink: only pass array to genl_register_family_with_ops()
  tcp: don't update snd_nxt, when a socket is switched from repair mode
  atm: idt77252: fix dev refcnt leak
  xfrm: Release dst if this dst is improper for vti tunnel
  netlink: fix documentation typo in netlink_set_err()
  be2net: Delete secondary unicast MAC addresses during be_close
  be2net: Fix unconditional enabling of Rx interface options
  net, virtio_net: replace the magic value
  ping: prevent NULL pointer dereference on write to msg_name
  bnx2x: Prevent "timeout waiting for state X"
  bnx2x: prevent CFC attention
  bnx2x: Prevent panic during DMAE timeout
  ...
2013-11-19 15:50:47 -08:00
Nicolas Dichtel
1e9f3d6f1c ip6tnl: fix use after free of fb_tnl_dev
Bug has been introduced by commit bb8140947a ("ip6tnl: allow to use rtnl ops
on fb tunnel").

When ip6_tunnel.ko is unloaded, FB device is delete by rtnl_link_unregister()
and then we try to use the pointer in ip6_tnl_destroy_tunnels().

Let's add an handler for dellink, which will never remove the FB tunnel. With
this patch it will no more be possible to remove it via 'ip link del ip6tnl0',
but it's safer.

The same fix was already proposed by Willem de Bruijn <willemb@google.com> for
sit interfaces.

CC: Willem de Bruijn <willemb@google.com>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14 17:04:38 -05:00
John Stultz
827da44c61 net: Explicitly initialize u64_stats_sync structures for lockdep
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.

The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.

This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.

Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.

Feedback would be appreciated!

Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-06 12:40:25 +01:00
Oussama Ghorbel
582442d6d5 ipv6: Allow the MTU of ipip6 tunnel to be set below 1280
The (inner) MTU of a ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530

This patch allows to check the minimum MTU for ipv6 tunnel according to these rules:
-In case the tunnel is configured with ipip6 mode the minimum MTU is 68.
-In case the tunnel is configured with ip6ip6 or any mode the minimum MTU is 1280.

Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-07 12:32:26 -04:00
Nicolas Dichtel
bb8140947a ip6tnl: allow to use rtnl ops on fb tunnel
rtnl ops where introduced by c075b13098 ("ip6tnl: advertise tunnel param via
rtnl"), but I forget to assign rtnl ops to fb tunnels.

Now that it is done, we must remove the explicit call to
unregister_netdevice_queue(), because  the fallback tunnel is added to the queue
in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
is valid since commit 0bd8762824 ("ip6tnl: add x-netns support")).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-01 12:55:53 -04:00
Ding Zhi
0d2ede929f ip6_tunnels: raddr and laddr are inverted in nl msg
IFLA_IPTUN_LOCAL and IFLA_IPTUN_REMOTE were inverted.

Introduced by c075b13098 (ip6tnl: advertise tunnel param via rtnl).

Signed-off-by: Ding Zhi <zhi.ding@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-16 21:36:12 -04:00
David S. Miller
06c54055be Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c
	net/bridge/br_multicast.c
	net/ipv6/sit.c

The conflicts were minor:

1) sit.c changes overlap with change to ip_tunnel_xmit() signature.

2) br_multicast.c had an overlap between computing max_delay using
   msecs_to_jiffies and turning MLDV2_MRC() into an inline function
   with a name using lowercase instead of uppercase letters.

3) stmmac had two overlapping changes, one which conditionally allocated
   and hooked up a dma_cfg based upon the presence of the pbl OF property,
   and another one handling store-and-forward DMA made.  The latter of
   which should not go into the new of_find_property() basic block.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05 14:58:52 -04:00
Nicolas Dichtel
ea23192e8e tunnels: harmonize cleanup done on skb on rx path
The goal of this patch is to harmonize cleanup done on a skbuff on rx path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:26 -04:00
Nicolas Dichtel
963a88b31d tunnels: harmonize cleanup done on skb on xmit path
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel
8b27f27797 skb: allow skb_scrub_packet() to be used by tunnels
This function was only used when a packet was sent to another netns. Now, it can
also be used after tunnel encapsulation or decapsulation.

Only skb_orphan() should not be done when a packet is not crossing netns.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04 00:27:25 -04:00
Nicolas Dichtel
e837735ec4 ip6_tunnel: ensure to always have a link local address
When an Xin6 tunnel is set up, we check other netdevices to inherit the link-
local address. If none is available, the interface will not have any link-local
address. RFC4862 expects that each interface has a link local address.

Now than this kind of tunnels supports x-netns, it's easy to fall in this case
(by creating the tunnel in a netns where ethernet interfaces stand and then
moving it to a other netns where no ethernet interface is available).

RFC4291, Appendix A suggests two methods: the first is the one currently
implemented, the second is to generate a unique identifier, so that we can
always generate the link-local address. Let's use eth_random_addr() to generate
this interface indentifier.

I remove completly the previous method, hence for the whole life of the
interface, the link-local address remains the same (previously, it depends on
which ethernet interfaces were up when the tunnel interface was set up).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-20 23:45:42 -07:00
Hannes Frederic Sowa
3d483058c8 ipv6: wire up skb->encapsulation
When pushing a new header before current one call skb_reset_inner_headers
to record the position of the inner headers in the various ipv6 tunnel
protocols.

We later need this to correctly identify the addresses needed to send
back an error in the xfrm layer.

This change is safe, because skb->protocol is always checked before
dereferencing data from the inner protocol.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-08-19 09:37:46 +02:00
Nicolas Dichtel
0bd8762824 ip6tnl: add x-netns support
This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15 01:00:20 -07:00
Pravin B Shelar
c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Cong Wang
e8f72ea4a1 ipv6: introduce ip6tunnel_xmit() helper
Similar to iptunnel_xmit(), group these operations into a
helper function.

This by the way fixes the missing u64_stats_update_begin()
and u64_stats_update_end() for 32 bit arch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:53:34 -04:00
YOSHIFUJI Hideaki / 吉藤英明
3e4e4c1f2d ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.
This is not only for readability but also for optimization.
What we do here is to build the 32bit word at the beginning of the ipv6
header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and
we do not need to read the tclass portion of the target buffer.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13 20:17:13 -05:00
Nicolas Dichtel
f4e0b4c5e1 ip6tnl/sit: drop packet if ECN present with not-ECT
This patch reports the change made by Stephen Hemminger in ipip and gre[6] in
commit eccc1bb8d4 (tunnel: drop packet if ECN present with not-ECT).

Goal is to handle RFC6040, Section 4.2:

Default Tunnel Egress Behaviour.
 o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
      propagate any other ECN codepoint onwards.  This is because the
      inner Not-ECT marking is set by transports that rely on dropped
      packets as an indication of congestion and would not understand or
      respond to any other ECN codepoint [RFC4774].  Specifically:

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         CE, the decapsulator MUST drop the packet.

      *  If the inner ECN field is Not-ECT and the outer ECN field is
         Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
         outgoing packet with the ECN field cleared to Not-ECT.

The patch takes benefits from common function added in net/inet_ecn.h.

Like it was done for Xin4 tunnels, it adds logging to allow detecting broken
systems that set ECN bits incorrectly when tunneling (or an intermediate
router might be changing the header). Errors are also tracked via
rx_frame_error.

CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-28 11:37:11 -05:00
Eric W. Biederman
af31f412c7 net: Allow userns root to control ipv6
Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed while
resource control is left unchanged.

Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.

Allow creation of ipv6 raw sockets.

Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.

Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.

Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.

Allow setting the multicast routing socket options on non multicast
routing sockets.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.

Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:32:45 -05:00
Nicolas Dichtel
1ff05fb711 ip6tnl: fix sparse warnings in ip6_tnl_netlink_parms()
This change fixes a sparse warning triggered by casting the flowinfo from
netlink messages in an u32 instead of be32. This change corrects that in order
to resolve the sparse warning.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-15 13:46:29 -05:00
Nicolas Dichtel
0b11245722 ip6tnl: add support of link creation via rtnl
This patch add the support of 'ip link .. type ip6tnl'.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Nicolas Dichtel
b58d731acc ip6tnl: rename rtnl functions for consistency
Functions in this file start with ip6_tnl_.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Nicolas Dichtel
cfa323b6b9 ip6tnl/rtnl: add IFLA_IPTUN_PROTO on dump
IPv6 tunnels can have three mode: 4in6, 6in6 and xin6.
This information was missing in the netlink message.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 22:02:37 -05:00
Amerigo Wang
aa0010f880 net: convert __IPTUNNEL_XMIT() to an inline function
__IPTUNNEL_XMIT() is an ugly macro, convert it to a static
inline function, so make it more readable.

IPTUNNEL_XMIT() is unused, just remove it.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-14 18:49:50 -05:00
Nicolas Dichtel
c075b13098 ip6tnl: advertise tunnel param via rtnl
It is usefull for daemons that monitor link event to have the full parameters of
these interfaces when a rtnl message is sent.
It allows also to dump them via rtnetlink.

It is based on what is done for GRE tunnels.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-09 19:36:20 -05:00
Amerigo Wang
94e187c015 ipv6: introduce ip6_rt_put()
As suggested by Eric, we could introduce a helper function
for ipv6 too, to avoid checking if rt is NULL before
dst_release().

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-03 14:59:05 -04:00
Dan Carpenter
5ef5d6c569 gre: information leak in ip6_tnl_ioctl()
There is a one byte hole between p->hop_limit and p->flowinfo where
stack memory is leaked to the user.  This was introduced in c12b395a46
"gre: Support GRE over IPv6".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-20 02:21:30 -07:00
xeb@mail.ru
c12b395a46 gre: Support GRE over IPv6
GRE over IPv6 implementation.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:28:32 -07:00
Eric Dumazet
ddbe503203 ipv6: add ipv6_addr_hash() helper
Introduce ipv6_addr_hash() helper doing a XOR on all bits
of an IPv6 address, with an optimized x86_64 version.

Use it in flow dissector, as suggested by Andrew McGregor,
to reduce hash collision probabilities in fq_codel (and other
users of flow dissector)

Use it in ip6_tunnel.c and use more bit shuffling, as suggested
by David Laight, as existing hash was ignoring most of them.

Use it in sunrpc and use more bit shuffling, using hash_32().

Use it in net/ipv6/addrconf.c, using hash_32() as well.

As a cleanup, use it in net/ipv4/tcp_metrics.c

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18 11:28:46 -07:00
David S. Miller
6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
David S. Miller
1ed5c48f23 net: Remove checks for dst_ops->redirect being NULL.
No longer necessary.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:41:25 -07:00
David S. Miller
ec18d9a269 ipv6: Add redirect support to all protocol icmp error handlers.
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12 00:25:15 -07:00
Ben Hutchings
2c53040f01 net: Fix (nearly-)kernel-doc comments for various functions
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:13:45 -07:00
Ville Nuorvala
d0087b29f7 ipv6_tunnel: Allow receiving packets on the fallback tunnel if they pass sanity checks
At Facebook, we do Layer-3 DSR via IP-in-IP tunneling. Our load balancers wrap
an extra IP header on incoming packets so they can be routed to the backend.
In the v4 tunnel driver, when these packets fall on the default tunl0 device,
the behavior is to decapsulate them and drop them back on the stack. So our
setup is that tunl0 has the VIP and eth0 has (obviously) the backend's real
address.

In IPv6 we do the same thing, but the v6 tunnel driver didn't have this same
behavior - if you didn't have an explicit tunnel setup, it would drop the
packet.

This patch brings that v4 feature to the v6 driver.

The same IPv6 address checks are performed as with any normal tunnel,
but as the fallback tunnel endpoint addresses are unspecified, the checks
must be performed on a per-packet basis, rather than at tunnel
configuration time.

[Patch description modified by phil@ipom.com]

Signed-off-by: Ville Nuorvala <ville.nuorvala@gmail.com>
Tested-by: Phil Dibowitz <phil@ipom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-29 00:52:32 -07:00
Eric Dumazet
92113bfde2 ipv6: bool conversions phase1
ipv6_opt_accepted() returns a bool, and can use const pointers

ipv6_addr_equal(), ipv6_addr_any(), ipv6_addr_loopback(),
ipv6_addr_orchid() return a bool.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-18 02:24:13 -04:00
Joe Perches
91df42bedc net: ipv4 and ipv6: Convert printk(KERN_DEBUG to pr_debug
Use the current debugging style and enable dynamic_debug.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Joe Perches
f32138319c net: ipv6: Standardize prefixes for message logging
Add #define pr_fmt(fmt) as appropriate.

Add "IPv6: " to appropriate files.

Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG).
Standardize on "%s: " not "%s(): " when emitting __func__.
Use "%s: ", __func__ instead of embedding function name.
Coalesce formats, align arguments.

ADDRCONF output is now prefixed with "IPv6: "

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 01:01:03 -04:00
Joe Perches
e87cc4728f net: Convert net_ratelimit uses to net_<level>_ratelimited
Standardize the net core ratelimited logging functions.

Coalesce formats, align arguments.
Change a printk then vprintk sequence to use printf extension %pV.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15 13:45:03 -04:00
Eric Dumazet
9ff264492f ip6_tunnel: dont drop packet but consume it
When we need to reallocate skb, we dont drop a packet.
Call consume_skb() to not confuse dropwatch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-19 14:23:55 -04:00
Eric Dumazet
95c9617472 net: cleanup unsigned to unsigned int
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15 12:44:40 -04:00
Eric Dumazet
cf778b00e9 net: reintroduce missing rcu_assign_pointer() calls
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to
RCU_INIT_POINTER) did a lot of incorrect changes, since it did a
complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x,
y).

We miss needed barriers, even on x86, when y is not NULL.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12 12:26:56 -08:00
David S. Miller
d191854282 ipv6: Kill rt6i_dev and rt6i_expires defines.
It just obscures that the netdevice pointer and the expires value are
implemented in the dst_entry sub-object of the ipv6 route.

And it makes grepping for dst_entry member uses much harder too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-28 20:19:20 -05:00
Alexey Dobriyan
4e3fd7a06d net: remove ipv6_addr_copy()
C assignment can handle struct in6_addr copying.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-22 16:43:32 -05:00
David S. Miller
efd0bf97de Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The forcedeth changes had a conflict with the conversion over
to atomic u64 statistics in net-next.

The libertas cfg.c code had a conflict with the bss reference
counting fix by John Linville in net-next.

Conflicts:
	drivers/net/ethernet/nvidia/forcedeth.c
	drivers/net/wireless/libertas/cfg.c
2011-11-21 13:50:33 -05:00
Josh Boyer
731abb9cb2 ip6_tunnel: copy parms.name after register_netdevice
Commit 1c5cae815d removed an explicit call to dev_alloc_name in ip6_tnl_create
because register_netdevice will now create a valid name.  This works for the
net_device itself.

However the tunnel keeps a copy of the name in the parms structure for the
ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
ip6_tnl_create, but it is done before register_netdevice is called so the name
is set to a bogus value in the parms.name structure.

This shows up if you do a simple tunnel add, followed by a tunnel show:

[root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
[root@localhost ~]# ip -6 tunnel show
ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
[root@localhost ~]#

Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
register_netdevice has successfully returned.

Cc: stable@vger.kernel.org
Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-14 00:24:06 -05:00
Eric Dumazet
8ce120f118 net: better pcpu data alignment
Tunnels can force an alignment of their percpu data to reduce number of
cache lines used in fast path, or read in .ndo_get_stats()

percpu_alloc() is a very fine grained allocator, so any small hole will
be used anyway.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-08 15:10:59 -05:00
Eric Dumazet
d24f22f3df ip6_tunnel: add optional fwmark inherit
Add IP6_TNL_F_USE_ORIG_FWMARK to ip6_tunnel, so that ip6_tnl_xmit2()
makes a route lookup taking into account skb->fwmark and doesnt cache
lookup result.

This permits more flexibility in policies and firewall setups.

To setup such a tunnel, "fwmark inherit" option should be added to "ip
-f inet6 tunnel" command.

Reported-by: Anders Franzen <Anders.Franzen@ericsson.com>
CC: Hans Schillström <hans.schillstrom@ericsson.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-09-20 14:50:00 -04:00
Stephen Hemminger
a9b3cd7f32 rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
When assigning a NULL value to an RCU protected pointer, no barrier
is needed. The rcu_assign_pointer, used to handle that but will soon
change to not handle the special case.

Convert all rcu_assign_pointer of NULL value.

//smpl
@@ expression P; @@

- rcu_assign_pointer(P, NULL)
+ RCU_INIT_POINTER(P, NULL)

// </smpl>

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02 04:29:23 -07:00
Eric Dumazet
89b0212697 ip6tnl: avoid touching dst refcount in ip6_tnl_xmit2()
Even using percpu stats, we still hit tunnel dst_entry refcount in
ip6_tnl_xmit2()

Since we are in RCU locked section, we can use skb_dst_set_noref() and
avoid these atomic operations, leaving dst shared on cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-01 00:12:00 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Jiri Pirko
1c5cae815d net: call dev_alloc_name from register_netdevice
Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows to remove multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name in advance remains.

This also fixes veth creation regresion caused by
84c49d8c3e

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-05 10:57:45 -07:00
David S. Miller
31e4543db2 ipv4: Make caller provide on-stack flow key to ip_route_output_ports().
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-03 20:25:42 -07:00
Eric Dumazet
b71d1d426d inet: constify ip headers and in6_addr
Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-22 11:04:14 -07:00
David S. Miller
4c9483b2fb ipv6: Convert to use flowi6 where applicable.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:54 -08:00
David S. Miller
1d28f42c1b net: Put flowi_* prefix on AF independent members of struct flowi
I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:44 -08:00
David S. Miller
78fbfd8a65 ipv4: Create and use route lookup helpers.
The idea here is this minimizes the number of places one has to edit
in order to make changes to how flows are defined and used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-12 15:08:42 -08:00
David S. Miller
33175d84ee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bnx2x/bnx2x_cmn.c
2011-03-10 14:26:00 -08:00
stephen hemminger
6dfbd87a20 ip6ip6: autoload ip6 tunnel
Add necessary alias to autoload ip6ip6 tunnel module.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-10 14:18:48 -08:00
David S. Miller
b23dd4fe42 ipv4: Make output route lookup return rtable directly.
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 14:31:35 -08:00
David S. Miller
452edd598f xfrm: Return dst directly from xfrm_lookup()
Instead of on the stack.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-03-02 13:27:41 -08:00
David S. Miller
fe6c791570 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
	net/llc/af_llc.c
2010-12-08 13:47:38 -08:00
Anders Franzen
381601e5bb Make the ip6_tunnel reflect the true mtu.
The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the
underlaying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is
1460.
However, when creating a tunnel the encap limit option is enabled by default, and it
consumes 8 bytes more, so the true mtu shall be 1452.

I dont really know if this breaks some statement in some RFC, so this is a request for
comments.

Signed-off-by: Anders Franzen <anders.franzen@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 10:55:47 -08:00