Commit Graph

15019 Commits

Author SHA1 Message Date
stephen hemminger
5611551103 dst: don't inline dst_ifdown
The function dst_ifdown is called only two places but in a non-
performance critical code path, there is no reason to inline it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 03:32:44 -07:00
Eric Dumazet
acbbc07145 net: uninline skb_bond_should_drop()
skb_bond_should_drop() is too big to be inlined.

This patch reduces kernel text size, and its compilation time as well
(shrinking include/linux/netdevice.h)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 03:32:42 -07:00
stephen hemminger
8595805aaf IPv6: only notify protocols if address is compeletely gone
The notifier for address down should only be called if address is completely
gone, not just being marked as tentative on link transistion. The code
in net-next would case bonding/sctp/s390 to see address disappear on link
down, but they would never see it reappear on link up.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:28 -07:00
stephen hemminger
d1f84c63a4 ipv6: additional ref count for hash list unnecessary
Since an address in hash list has to already have a ref count,
no additional ref count is needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:28 -07:00
stephen hemminger
27bdb2abcc IPv6: keep tentative addresses in hash table
When link goes down, want address to be preserved but in a tentative
state, therefore it has to stay in hash list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:27 -07:00
stephen hemminger
93fa159abe IPv6: keep route for tentative address
Recent changes preserve IPv6 address when link goes down (good).
But would cause address to point to dead dst entry (bad).
The simplest fix is to just not delete route if address is
being held for later use.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:27 -07:00
Eric Dumazet
b6c6712a42 net: sk_dst_cache RCUification
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.

sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)

This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)

This patch converts sk_dst_lock to a spinlock, and use RCU for readers.

__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))

This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:41:33 -07:00
Eric Dumazet
7a161ea924 net: Dont use netdev_warn()
Dont use netdev_warn() in dev_cap_txqueue() and get_rps_cpu() so that we
can catch following warnings without crash.

bond0.2240 received packet on queue 6, but number of RX queues is 1
bond0.2240 received packet on queue 11, but number of RX queues is 1

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:41:32 -07:00
Richard Cochran
ed85b565b8 packet: support for TX time stamps on RAW sockets
Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets.
We introduce PACKET_TX_TIMESTAMP for the control message cmsg_type.

Similar support for UDP and CAN sockets was added in commit
51f31cabe3

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:30:48 -07:00
David S. Miller
2e8e18ef52 tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
Back in commit 04a0551c87
("loopback: Drop obsolete ip_summed setting") we stopped
setting CHECKSUM_UNNECESSARY in the loopback xmit.

This is because such a setting was a lie since it implies that the
checksum field of the packet is properly filled in.

Instead what happens normally is that CHECKSUM_PARTIAL is set and
skb->csum is calculated as needed.

But this was only happening for TCP data packets (via the
skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
happen for non-data packets like ACKs etc.

Fix this by setting skb->ip_summed in the common non-data packet
constructor.  It already is setting skb->csum to zero.

But this reminds us that we still have things like ip_output.c's
ip_dev_loopback_xmit() which sets skb->ip_summed to the value
CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
valid.  So we'll have to address that at some point too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:13 -07:00
Herbert Xu
bb29624614 inet: Remove unused send_check length argument
inet: Remove unused send_check length argument

This patch removes the unused length argument from the send_check
function in struct inet_connection_sock_af_ops.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:09 -07:00
Herbert Xu
8ad50d96db tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv6
tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv6

This patch moves the common code between tcp_v6_send_check and
tcp_v6_gso_send_check into a new function __tcp_v6_send_check.

It then uses the new function in tcp_v6_send_synack as well as
tcp_v6_send_response so that they handle CHECKSUM_PARTIAL properly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:08 -07:00
Herbert Xu
419f9f8960 tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4

This patch moves the common code between tcp_v4_send_check and
tcp_v4_gso_send_check into a new function __tcp_v4_send_check.

It then uses the new function in tcp_v4_send_synack so that it
handles CHECKSUM_PARTIAL properly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:08 -07:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
David S. Miller
4a1032faac Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-11 02:44:30 -07:00
David S. Miller
ae4e8d63b5 Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"
This reverts commit 2626419ad5.

It causes regressions for people with IGB cards.  Connection
requests don't complete etc.  The true cause of the issue is
still not known, but we should sort this out in net-next-2.6
not net-2.6

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 02:40:49 -07:00
David S. Miller
2626419ad5 tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
Back in commit 04a0551c87
("loopback: Drop obsolete ip_summed setting") we stopped
setting CHECKSUM_UNNECESSARY in the loopback xmit.

This is because such a setting was a lie since it implies that the
checksum field of the packet is properly filled in.

Instead what happens normally is that CHECKSUM_PARTIAL is set and
skb->csum is calculated as needed.

But this was only happening for TCP data packets (via the
skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
happen for non-data packets like ACKs etc.

Fix this by setting skb->ip_summed in the common non-data packet
constructor.  It already is setting skb->csum to zero.

But this reminds us that we still have things like ip_output.c's
ip_dev_loopback_xmit() which sets skb->ip_summed to the value
CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
valid.  So we'll have to address that at some point too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08 11:32:30 -07:00
Jorge Boncompte [DTI2]
1223c67c09 udp: fix for unicast RX path optimization
Commits 5051ebd275 and
5051ebd275 ("ipv[46]: udp: optimize unicast RX
path") broke some programs.

	After upgrading a L2TP server to 2.6.33 it started to fail, tunnels going up an
down, after the 10th tunnel came up. My modified rp-l2tp uses a global
unconnected socket bound to (INADDR_ANY, 1701) and one connected socket per
tunnel after parameter negotiation.

	After ten sockets were open and due to mixed parameters to
udp[46]_lib_lookup2() kernel started to drop packets.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08 11:29:13 -07:00
chavey
97f8aefbbf net: fix ethtool coding style errors and warnings
Fix coding style errors and warnings output while running checkpatch.pl
on the files net/core/ethtool.c and include/linux/ethtool.h

Signed-off-by: chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:54:42 -07:00
John Hughes
ddd0451fc8 x.25 attempts to negotiate invalid throughput
The current X.25 code has some bugs in throughput negotiation:

   1. It does negotiation in all cases, usually there is no need
   2. It incorrectly attempts to negotiate the throughput class in one
      direction only.  There are separate throughput classes for input
      and output and if either is negotiated both mist be negotiates.

This is bug https://bugzilla.kernel.org/show_bug.cgi?id=15681

This bug was first reported by Daniel Ferenci to the linux-x25 mailing
list on 6/8/2004, but is still present.

The current (2.6.34) x.25 code doesn't seem to know that the X.25
throughput facility includes two values, one for the required
throughput outbound, one for inbound.

This causes it to attempt to negotiate throughput 0x0A, which is
throughput 9600 inbound and the illegal value "0" for inbound
throughput.

Because of this some X.25 devices (e.g. Cisco 1600) refuse to connect
to Linux X.25.

The following patch fixes this behaviour.  Unless the user specifies a
required throughput it does not attempt to negotiate.  If the user
does not specify a throughput it accepts the suggestion of the remote
X.25 system.  If the user requests a throughput then it validates both
the input and output throughputs and correctly negotiates them with
the remote end.

Signed-off-by: John Hughes <john@calva.com>
Tested-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:33:02 -07:00
John Hughes
f5eb917b86 x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
Here is a patch to stop X.25 examining fields beyond the end of the packet.

For example, when a simple CALL ACCEPTED was received:

	10 10 0f

x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.

Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:29:25 -07:00
Herbert Xu
fd218cf955 bridge: Fix IGMP3 report parsing
The IGMP3 report parsing is looking at the wrong address for
group records.  This patch fixes it.

Reported-by: Banyeer <banyeer@yahoo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:20:47 -07:00
Eric Dumazet
298b9e44be net: include linux/proc_fs.h in dev_addr_lists.c
As pointed by Randy Dunlap, we must include linux/proc_fs.h in
net/core/dev_addr_lists.c, regardless of CONFIG_PROC_FS

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>, 
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 16:46:36 -07:00
David S. Miller
005c93b5d8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-04-07 16:41:03 -07:00
Timo Teräs
8e4795605d flow: delayed deletion of flow cache entries
Speed up lookups by freeing flow cache entries later. After
virtualizing flow cache entry operations, the flow cache may now
end up calling policy or bundle destructor which can be slowish.

As gc_list is more effective with double linked list, the flow cache
is converted to use common hlist and list macroes where appropriate.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:20 -07:00
Timo Teräs
285ead175c xfrm: remove policy garbage collection
Policies are now properly reference counted and destroyed from
all code paths. The delayed gc is just an overhead now and can
be removed.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:19 -07:00
Timo Teräs
80c802f307 xfrm: cache bundles instead of policies for outgoing flows
__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:19 -07:00
Timo Teräs
fe1a5f031e flow: virtualize flow cache entry methods
This allows to validate the cached object before returning it.
It also allows to destruct object properly, if the last reference
was held in flow cache. This is also a prepartion for caching
bundles in the flow cache.

In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
  each flow matching a killed policy gets refreshed as the getter
  function notices it smartly.
- we do not have to call flow_cache_flush from policy gc, since the
  flow cache now properly deletes the object if it had any references

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:18 -07:00
David S. Miller
4a35ecf8bf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/via-velocity.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
2010-04-06 23:53:30 -07:00
Hagen Paul Pfeifer
842509b859 socket: remove duplicate declaration of struct timespec
struct timespec ts was alreay defined. Reuse the previously
defined one and reduce the memory footprint on the stack by
16 bytes.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 19:50:20 -07:00
Jon Paul Maloy
c6537d6742 TIPC: Updated topology subscription protocol according to latest spec
This patch makes it explicit in the API that all fields in subscriptions and events exchanged with the Topology Server must be in
network byte order.
It also ensures that all fields of a subscription are compared when cancelling a subscription, in order to avoid inadvertent
cancelling of the wrong subscription.
Finally, the tipc module version is updated to 2.0.0, to reflect the API change.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 19:50:19 -07:00
Johannes Berg
0379185b6c mac80211: annotate station rcu dereferences
The new RCU lockdep support warns about these
in some contexts -- make it aware of the locks
used to protect all this. Different locks are
used in different contexts which unfortunately
means we can't get perfect checking.

Also remove rcu_dereference() from two places
that don't actually dereference the pointers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:30 -04:00
Javier Cardona
1cb561f837 mac80211: Handle mesh action frames in ieee80211_rx_h_action
This fixes the problem introduced in commit
8404080568 which broke mesh peer link establishment.

changes:
v2 	Added missing break (Johannes)
v3 	Broke original patch into two (Johannes)

Signed-off-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:28 -04:00
Linus Torvalds
cb4361c1dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  smc91c92_cs: fix the problem of "Unable to find hardware address"
  r8169: clean up my printk uglyness
  net: Hook up cxgb4 to Kconfig and Makefile
  cxgb4: Add main driver file and driver Makefile
  cxgb4: Add remaining driver headers and L2T management
  cxgb4: Add packet queues and packet DMA code
  cxgb4: Add HW and FW support code
  cxgb4: Add register, message, and FW definitions
  netlabel: Fix several rcu_dereference() calls used without RCU read locks
  bonding: fix potential deadlock in bond_uninit()
  net: check the length of the socket address passed to connect(2)
  stmmac: add documentation for the driver.
  stmmac: fix kconfig for crc32 build error
  be2net: fix bug in vlan rx path for big endian architecture
  be2net: fix flashing on big endian architectures
  be2net: fix a bug in flashing the redboot section
  bonding: bond_xmit_roundrobin() fix
  drivers/net: Add missing unlock
  net: gianfar - align BD ring size console messages
  net: gianfar - initialize per-queue statistics
  ...
2010-04-06 08:34:06 -07:00
YOSHIFUJI Hideaki / 吉藤英明
2f787b0b76 mac80211: Ensure initializing private mc_list in prepare_multicast().
Fix kernel panic by NULL pointer dereference in the context of
ieee80211_ops->prepare_multicast().

This bug was introduced by commit 22bedad3c.. ("net: convert
multicast list to list_head").

Call __hw_addr_init() in ieee80211_alloc_hw() to initialize
list_head of private device multicast list, like we do in
bond_init().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 00:12:30 -07:00
Eric Dumazet
e4008276fd net: Add a missing local_irq_enable()
As noticed by Changli Gao, we must call local_irq_enable() after
rps_unlock()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-05 15:42:39 -07:00
Tom Herbert
5a6d234e73 rps: fixed missed rps_unlock
Fix spin_unlock_irq which needs to be rps_unlock.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-05 14:37:55 -07:00
Linus Torvalds
749d229761 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: saving negative to unsigned char
  9p: return on mutex_lock_interruptible()
  9p: Creating files with names too long should fail with ENAMETOOLONG.
  9p: Make sure we are able to clunk the cached fid on umount
  9p: drop nlink remove
  fs/9p: Clunk the fid resulting from partial walk of the name
  9p: documentation update
  9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05 13:42:54 -07:00
Dan Carpenter
3dc9fef67f 9p: saving negative to unsigned char
Saving -EINVAL as unsigned char truncates the high bits and changes it
into 234 instead of -22.  This breaks the test for "if (ret == -EINVAL)"
in parse_opts().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 14:37:28 -05:00
Aneesh Kumar K.V
6d96d3ab7a 9p: Make sure we are able to clunk the cached fid on umount
dcache prune happen on umount. So we cannot mark the client
satus disconnect. That will prevent a 9p call to the server

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 10:37:36 -05:00
Eric Dumazet
7bddd0db62 l2tp: unmanaged L2TPv3 tunnels fixes
Followup to commit 789a4a2c 
(l2tp: Add support for static unmanaged L2TPv3 tunnels)

One missing init in l2tp_tunnel_sock_create() could access random kernel
memory, and a bit field should be unsigned.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-04 01:02:46 -07:00
Brian Haley
486f50ca79 SCTP: Change to use ipv6_addr_copy()
Change SCTP IPv6 code to use ipv6_addr_copy()

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 15:10:21 -07:00
Eric Dumazet
1f8438a853 icmp: Account for ICMP out errors
When ip_append() fails because of socket limit or memory shortage,
increment ICMP_MIB_OUTERRORS counter, so that "netstat -s" can report
these errors.

LANG=C netstat -s | grep "ICMP messages failed"
    0 ICMP messages failed

For IPV6, implement ICMP6_MIB_OUTERRORS counter as well.

# grep Icmp6OutErrors /proc/net/dev_snmp6/*
/proc/net/dev_snmp6/eth0:Icmp6OutErrors                   	0
/proc/net/dev_snmp6/lo:Icmp6OutErrors                   	0

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 15:09:04 -07:00
David S. Miller
f66ef2d064 l2tp: Fix L2TP_DEBUGFS ifdef tests.
We have to check CONFIG_L2TP_DEBUGFS_MODULE as well as
CONFIG_L2TP_DEBUGFS.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 15:01:37 -07:00
David S. Miller
f481c0d862 l2tp: Add missing semicolon to MODULE_ALIAS() in l2tp_netlink.c
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:58:07 -07:00
James Chapman
789a4a2c61 l2tp: Add support for static unmanaged L2TPv3 tunnels
This patch adds support for static (unmanaged) L2TPv3 tunnels, where
the tunnel socket is created by the kernel rather than being created
by userspace. This means L2TP tunnels and sessions can be created
manually, without needing an L2TP control protocol implemented in
userspace. This might be useful where the user wants a simple ethernet
over IP tunnel.

A patch to iproute2 adds a new command set under "ip l2tp" to make use
of this feature. This will be submitted separately.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:56:08 -07:00
James Chapman
0ad6614048 l2tp: Add debugfs files for dumping l2tp debug info
The existing pppol2tp driver exports debug info to
/proc/net/pppol2tp. Rather than adding info to that file for the new
functionality added in this patch series, we add new files in debugfs,
leaving the old /proc file for backwards compatibility (L2TPv2 only).

Currently only one file is provided: l2tp/tunnels, which lists
internal debug info for all l2tp tunnels and sessions. More files may
be added later. The info is for debug and problem analysis only -
userspace apps should use netlink to obtain status about l2tp tunnels
and sessions.

Although debugfs does not support net namespaces, the tunnels and
sessions dumped in l2tp/tunnels are only those in the net namespace of
the process reading the file.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:56:07 -07:00
James Chapman
d9e31d17ce l2tp: Add L2TP ethernet pseudowire support
This driver presents a regular net_device for each L2TP ethernet
pseudowire instance. These interfaces are named l2tpethN by default,
though userspace can specify an alternative name when the L2TP
session is created, if preferred. When the pseudowire is established,
regular Linux networking utilities may be used to configure the
interface, i.e. give it IP address info or add it to a bridge. Any
data passed over the interface is carried over an L2TP tunnel.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:56:06 -07:00
James Chapman
e02d494d2c l2tp: Convert rwlock to RCU
Reader/write locks are discouraged because they are slower than spin
locks. So this patch converts the rwlocks used in the per_net structs
to rcu.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:56:06 -07:00
James Chapman
309795f4be l2tp: Add netlink control API for L2TP
In L2TPv3, we need to create/delete/modify/query L2TP tunnel and
session contexts. The number of parameters is significant. So let's
use netlink. Userspace uses this API to control L2TP tunnel/session
contexts in the kernel.

The previous pppol2tp driver was managed using [gs]etsockopt(). This
API is retained for backwards compatibility. Unlike L2TPv2 which
carries only PPP frames, L2TPv3 can carry raw ethernet frames or other
frame types and these do not always have an associated socket
family. Therefore, we need a way to use L2TP sessions that doesn't
require a socket type for each supported frame type. Hence netlink is
used.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-03 14:56:05 -07:00