For 82576 MAC type, max_adj is reported as 1000000000 ppb. However, if
this value is passed to igb_ptp_adjfreq_82576, incvalue overflows out of
INCVALUE_82576_MASK, resulting in setting of zero TIMINCA.incvalue, stopping
the PHC (instead of going at twice the nominal speed).
Fix the advertised max_adj value to the largest value hardware can handle.
As there is no min_adj value available (-max_adj is used instead), this will
also prevent stopping the clock intentionally. It's probably not a big deal,
other igb MAC types don't support stopping the clock, either.
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trivial sparse warning.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
igb is ineffective at setting a lower total VFs because:
int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
{
...
/* Shouldn't change if VFs already enabled */
if (dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
return -EBUSY;
Swap init ordering.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The max_vfs= option has always been self limiting to the number of VFs
supported by the device. fa44f2f1 added SR-IOV configuration via
sysfs, but in the process broke this self correction factor. The
failing path is:
igb_probe
igb_sw_init
if (max_vfs > 7) {
adapter->vfs_allocated_count = 7;
...
igb_probe_vfs
igb_enable_sriov(, max_vfs)
if (num_vfs > 7) {
err = -EPERM;
...
This leaves vfs_allocated_count = 7 and vf_data = NULL, so we bomb out
when igb_probe finally calls igb_reset. It seems like a really bad
idea, and somewhat pointless, to set vfs_allocated_count separate from
vf_data, but limiting max_vfs is enough to avoid the null pointer.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix a problem in i350 where anti spoofing configuration was written into a
wrong register.
Signed-off-by: Lior Levy <lior.levy@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the ixgbevf driver is opened the request to allocate MSIX irq
vectors may fail. In that case the driver will call ixgbevf_down()
which will call ixgbevf_irq_disable() to clear the HW interrupt
registers and calls synchronize_irq() using the msix_entries pointer in
the adapter structure. However, when the function to request the MSIX
irq vectors failed it had already freed the msix_entries which causes
an OOPs from using the NULL pointer in synchronize_irq().
The calls to pci_disable_msix() and to free the msix_entries memory
should not occur if device open fails. Instead they should be called
during device driver removal to balance with the call to
pci_enable_msix() and the call to allocate msix_entries memory
during the device probe and driver load.
Signed-off-by: Li Xun <xunleer.li@huawei.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.
Dave, I think this one should go on the -stable pile.
Special thanks to Jan for coming up with a reproducer for this
problem.
Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix for incorrect assignment of signed expression to unsigned variable.
Signed-off-by: Kumar Amit Mehta <gmate.amit@gmail.com>
Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When I tried to set mac address of a bridge interface to a mac
address which already learned on this bridge, I got system hang.
The cause is straight forward: function br_fdb_change_mac_address
calls fdb_insert with NULL source nbp. Then an fdb lookup is
performed. If an fdb entry is found and it's local, it's OK. But
if it's not local, source is dereferenced for printk without NULL
check.
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.
This patch moves vlan_vid_del() as behind as possible.
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
positive, in socket clone path, run from softirq context :
[ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
[ 3641.668811] Call Trace:
[ 3641.671254] <IRQ> [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
[ 3641.677871] [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
[ 3641.683683] [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
[ 3641.689668] [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
[ 3641.695222] [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
[ 3641.701213] [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
[ 3641.707663] [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
[ 3641.713354] [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
[ 3641.719425] [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
[ 3641.724979] [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
[ 3641.730964] [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
[ 3641.736430] [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
[ 3641.741985] [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
Its safe at this point because the parent socket owns a reference
on the netstamp_needed, so we cant have a 0 -> 1 transition, which
requires to lock a mutex.
Instead of refining the check, lets remove it, as all known callers
are safe. If it ever changes in the future, static_key_slow_inc()
will complain anyway.
Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.
Current code leads to following bad bursty behavior :
20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
While the expected behavior is more like :
20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow the common pattern and define *_DIAG_MAX like:
[...]
__XXX_DIAG_MAX,
};
Because everyone is used to do:
struct nlattr *attrs[XXX_DIAG_MAX+1];
nla_parse([...], XXX_DIAG_MAX, [...]
Reported-by: Thomas Graf <tgraf@suug.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz says:
====================
Here's a batch of mlx4 driver fixes for 3.9, mostly SRIOV/Flow-steering
related. Series done against the net tree as of commit 5a3da1f
"inet: limit length of fragment queue hash table bucket lists
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
VF QPs must not be released when they have steering rules attached to them.
For that end, introduce a reference count field to the QP object in the
SRIOV resource tracker which is incremented/decremented when steering rules
are attached/detached to it. QPs can be released by VF only when their
ref count is zero.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the resource tracker code paths was wrongly using int and not u64
for resource tracking IDs, fix it.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the ethtool flow steering rules cleanup to be carried out before
releasing the RX QPs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the resource tracker cleanup flow, the DMFS rules must be deleted before we
destroy the QPs, else the HW may attempt doing packet steering to non existent QPs.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the mask is wrongly set in the MAP_EQ wrapper, fix that.
Without the fix any EQ number above 511 is mapped to one below 511.
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error check in cpsw_probe_dt() has an '&&' where an '||' is
meant to be. This causes a NULL pointer dereference when incomplet DT
data is passed to the driver ('phy_id' property for cpsw_emac1
missing).
Signed-off-by: Lothar Waßmann <LW@KARO-electronics.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing netdev without free_netdev() leads to net, tx leaks.
And it may lead to dereferencing freed pointer.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
I present to you another batch of fixes intended for the 3.9 stream...
On the bluetooth bits, Gustavo says:
"I put together 3 fixes intended for 3.9, there are support for two
new devices and a NULL dereference fix in the SCO code."
Amitkumar Karwar fixes a command queueing race in mwifiex.
Bing Zhao provides a pair of mwifiex related to cleaning-up before
a shutdown.
Felix Fietkau provides an ath9k fix for a regression caused by an
earlier calibration fix, and another ath9k fix to avoid race conditions
that unnecessarily lead to chip resets.
Jussi Kivilinna prevents and skbuff leak in rtlwifi.
Stanislaw Gruszka corrects a length paramater for a DMA buffer mapping
operation in iwlegacy.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following warnings that happen when building with W=1 option:
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_free_buffers':
drivers/net/ethernet/freescale/fec.c:1337:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_alloc_buffers':
drivers/net/ethernet/freescale/fec.c:1361:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
drivers/net/ethernet/freescale/fec.c: In function 'fec_enet_init':
drivers/net/ethernet/freescale/fec.c:1631:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This makes sure that release_sock is called for all error conditions in
irda_getsockopt.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
One must check the result of ioremap() -- in this case it prevents potential
kernel oops when initializing TSU registers further on...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sh_mdio_init() allocates pointer to 'struct bb_info' but only stores it locally,
so that sh_mdio_release() can't free it on driver unload. Add the pointer to
'struct bb_info' to 'struct sh_eth_private', so that sh_mdio_init() can save
'bitbang' variable for sh_mdio_release() to be able to free it later...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Parkin says:
====================
This l2tp bugfix patchset addresses a number of issues.
The first five patches in the series prevent l2tp sessions pinning an l2tp
tunnel open. This occurs because the l2tp tunnel is torn down in the tunnel
socket destructor, but each session holds a tunnel socket reference which
prevents tunnels with sessions being deleted. The solution I've implemented
here involves adding a .destroy hook to udp code, as discussed previously on
netdev[1].
The subsequent seven patches address futher bugs exposed by fixing the problem
above, or exposed through stress testing the implementation above. Patch 11
(avoid deadlock in l2tp stats update) isn't directly related to tunnel/session
lifetimes, but it does prevent deadlocks on i386 kernels running on 64 bit
hardware.
This patchset has been tested on 32 and 64 bit preempt/non-preempt kernels,
using iproute2, openl2tp, and custom-made stress test code.
[1] http://comments.gmane.org/gmane.linux.network/259169
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:
1. further packets arriving and getting queued while the pseudowire is being
closed down
2. the recv path hitting "scheduling while atomic" errors in the case that
recv drops the last reference to a session and calls l2tp_session_free
while in atomic context
As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close. For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.
Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.
As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.
Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2http://marc.info/?l=linux-netdev&m=134079868111131&w=2
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.
Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete. Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.
Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted. Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace. We need to do the same thing for
IP-encapsulation sockets.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed. We need to do something
similar for IP-encapsulation sockets.
Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2TP sessions hold a reference to the tunnel socket to prevent it going away
while sessions are still active. However, since tunnel destruction is handled
by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
cannot be deleted since each session holds a reference to the tunnel socket.
If userspace closes a managed tunnel socket, or dies, the tunnel will persist
and it will be neccessary to individually delete the sessions using netlink
commands. This is ugly.
To prevent this occuring, this patch leverages the udp encapsulation socket
destroy callback to gain early notification when the tunnel socket is closed.
This allows us to safely close the sessions running in the tunnel, dropping
the tunnel socket references in the process. The tunnel socket is then
destroyed as normal, and the tunnel resources deallocated in sk_destruct.
While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
references to allow the sessions to be deleted rather than leaking.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.
In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation. For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.
This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.
Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:
* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
complaining that this target does not work in the nat table, which is the
wrong table for it, from Florian Westphal.
* Fix possible use before initialization in the netns init path of several
conntrack protocol trackers (introduced recently while improving conntrack
netns support), from Gao Feng.
* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
by Eric Dumazet during the NFWS2013, patch from myself.
* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.
* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
not required anymore after change introduced in 3.7, again from Julian.
* Fix SYN looping in IPVS state sync if the backup is used a real server
in DR/TUN modes, this required a new /proc entry to disable the director
function when acting as backup, also from Julian.
* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
Paul Bolle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kconfig symbol IP_NF_QUEUE is unused since commit
d16cf20e2f ("netfilter: remove ip_queue
support"). Let's remove it too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>