Commit Graph

49907 Commits

Author SHA1 Message Date
Johannes Berg
736a80bbfd mac80211: mesh: drop frames appearing to be from us
If there are multiple mesh stations with the same MAC address,
they will both get confused and start throwing warnings.

Obviously in this case nothing can actually work anyway, so just
drop frames that look like they're from ourselves early on.

Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-04 15:51:53 +01:00
Peter Große
3a3713ec36 mac80211: Fix setting TX power on monitor interfaces
Instead of calling ieee80211_recalc_txpower on monitor interfaces
directly, call it using the virtual monitor interface, if one exists.

In case of a single monitor interface given, reject setting TX power,
if no virtual monitor interface exists.

That being checked, don't warn in ieee80211_bss_info_change_notify,
after setting TX power on a monitor interface.

Fixes warning:
------------[ cut here ]------------
 WARNING: CPU: 0 PID: 2193 at net/mac80211/driver-ops.h:167
 ieee80211_bss_info_change_notify+0x111/0x190 Modules linked in: uvcvideo
 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
rndis_host cdc_ether usbnet mii tp_smapi(O) thinkpad_ec(O) ohci_hcd vboxpci(O)
 vboxnetadp(O) vboxnetflt(O) v boxdrv(O) x86_pkg_temp_thermal kvm_intel kvm
 irqbypass iwldvm iwlwifi ehci_pci ehci_hcd tpm_tis tpm_tis_core tpm CPU: 0
 PID: 2193 Comm: iw Tainted: G           O    4.12.12-gentoo #2 task:
 ffff880186fd5cc0 task.stack: ffffc90001b54000 RIP:
 0010:ieee80211_bss_info_change_notify+0x111/0x190 RSP: 0018:ffffc90001b57a10
 EFLAGS: 00010246 RAX: 0000000000000006 RBX: ffff8801052ce840 RCX:
 0000000000000064 RDX: 00000000fffffffc RSI: 0000000000040000 RDI:
 ffff8801052ce840 RBP: ffffc90001b57a38 R08: 0000000000000062 R09:
 0000000000000000 R10: ffff8802144b5000 R11: ffff880049dc4614 R12:
 0000000000040000 R13: 0000000000000064 R14: ffff8802105f0760 R15:
 ffffc90001b57b48 FS:  00007f92644b4580(0000) GS:ffff88021e200000(0000)
 knlGS:0000000000000000 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f9263c109f0 CR3: 00000001df850000 CR4: 00000000000406f0
 Call Trace:
  ieee80211_recalc_txpower+0x33/0x40
  ieee80211_set_tx_power+0x40/0x180
  nl80211_set_wiphy+0x32e/0x950

Reported-by: Peter Große <pegro@friiks.de>
Signed-off-by: Peter Große <pegro@friiks.de>

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-04 15:27:48 +01:00
Hao Chen
3ea15452ee nl80211: Check for the required netlink attribute presence
nl80211_nan_add_func() does not check if the required attribute
NL80211_NAN_FUNC_FOLLOW_UP_DEST is present when processing
NL80211_CMD_ADD_NAN_FUNCTION request. This request can be issued
by users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attribute presence.

Signed-off-by: Hao Chen <flank3rsky@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-01-04 15:22:02 +01:00
Marcelo Ricardo Leitner
79d0895140 sctp: fix error path in sctp_stream_init
syzbot noticed a NULL pointer dereference panic in sctp_stream_free()
which was caused by an incomplete error handling in sctp_stream_init().
By not clearing stream->outcnt, it made a for() in sctp_stream_free()
think that it had elements to free, but not, leading to the panic.

As suggested by Xin Long, this patch also simplifies the error path by
moving it to the only if() that uses it.

See-also: https://www.spinics.net/lists/netdev/msg473756.html
See-also: https://www.spinics.net/lists/netdev/msg465024.html
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: f952be79ce ("sctp: introduce struct sctp_stream_out_ext")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-03 11:29:42 -05:00
Mohamed Ghannam
c095508770 RDS: Heap OOB write in rds_message_alloc_sgs()
When args->nr_local is 0, nr_pages gets also 0 due some size
calculation via rds_rm_size(), which is later used to allocate
pages for DMA, this bug produces a heap Out-Of-Bound write access
to a specific memory region.

Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-03 11:23:05 -05:00
Ingo Molnar
475c5ee193 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

- Updates to use cond_resched() instead of cond_resched_rcu_qs()
  where feasible (currently everywhere except in kernel/rcu and
  in kernel/torture.c).  Also a couple of fixes to avoid sending
  IPIs to offline CPUs.

- Updates to simplify RCU's dyntick-idle handling.

- Updates to remove almost all uses of smp_read_barrier_depends()
  and read_barrier_depends().

- Miscellaneous fixes.

- Torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 14:14:18 +01:00
Florian Fainelli
21602e1a55 net: dsa: Fix dsa_legacy_register() return value
We need to make the dsa_legacy_register() stub return 0 in order for
dsa_init_module() to successfully register and continue registering the
ETH_P_XDSA packet handler.

Fixes: 2a93c1a365 ("net: dsa: Allow compiling out legacy support")
Reported-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 21:52:48 -05:00
Jon Maloy
f9c935db80 tipc: fix problems with multipoint-to-point flow control
In commit 04d7b574b2 ("tipc: add multipoint-to-point flow control") we
introduced a protocol for preventing buffer overflow when many group
members try to simultaneously send messages to the same receiving member.

Stress test of this mechanism has revealed a couple of related bugs:

- When the receiving member receives an advertisement REMIT message from
  one of the senders, it will sometimes prematurely activate a pending
  member and send it the remitted advertisement, although the upper
  limit for active senders has been reached. This leads to accumulation
  of illegal advertisements, and eventually to messages being dropped
  because of receive buffer overflow.

- When the receiving member leaves REMITTED state while a received
  message is being read, we miss to look at the pending queue, to
  activate the oldest pending peer. This leads to some pending senders
  being starved out, and never getting the opportunity to profit from
  the remitted advertisement.

We fix the former in the function tipc_group_proto_rcv() by returning
directly from the function once it becomes clear that the remitting
peer cannot leave REMITTED state at that point.

We fix the latter in the function tipc_group_update_rcv_win() by looking
up and activate the longest pending peer when it becomes clear that the
remitting peer now can leave REMITTED state.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 21:52:07 -05:00
Stephen Hemminger
71891e2dab ethtool: do not print warning for applications using legacy API
In kernel log ths message appears on every boot:
 "warning: `NetworkChangeNo' uses legacy ethtool link settings API,
  link modes are only partially reported"

When ethtool link settings API changed, it started complaining about
usages of old API. Ironically, the original patch was from google but
the application using the legacy API is chrome.

Linux ABI is fixed as much as possible. The kernel must not break it
and should not complain about applications using legacy API's.
This patch just removes the warning since using legacy API's
in Linux is perfectly acceptable.

Fixes: 3f1ac7a700 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 21:49:17 -05:00
Masami Hiramatsu
a56c1470c2 net: dccp: Remove dccpprobe module
Remove DCCP probe module since jprobe has been deprecated.
That function is now replaced by dccp/dccp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:30 -05:00
Masami Hiramatsu
ee549be6f0 net: dccp: Add DCCP sendmsg trace event
Add DCCP sendmsg trace event (dccp/dccp_probe) for
replacing dccpprobe. User can trace this event via
ftrace or perftools.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:30 -05:00
Masami Hiramatsu
fa4475f792 net: sctp: Remove debug SCTP probe module
Remove SCTP probe module since jprobe has been deprecated.
That function is now replaced by sctp/sctp_probe and
sctp/sctp_probe_path trace-events.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:29 -05:00
Masami Hiramatsu
103d750c88 net: sctp: Add SCTP ACK tracking trace event
Add SCTP ACK tracking trace event to trace the changes of SCTP
association state in response to incoming packets.
It is used for debugging SCTP congestion control algorithms,
and will replace sctp_probe module.

Note that this event a bit tricky. Since this consists of 2
events (sctp_probe and sctp_probe_path) so you have to enable
both events as below.

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/sctp/sctp_probe/enable
  # echo 1 > events/sctp/sctp_probe_path/enable

Or, you can enable all the events under sctp.

  # echo 1 > events/sctp/enable

Since sctp_probe_path event is always invoked from sctp_probe
event, you can not see any output if you only enable
sctp_probe_path.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:29 -05:00
Masami Hiramatsu
6987990c3e net: tcp: Remove TCP probe module
Remove TCP probe module since jprobe has been deprecated.
That function is now replaced by tcp/tcp_probe trace-event.
You can use it via ftrace or perftools.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:29 -05:00
Masami Hiramatsu
c3fde1bd28 net: tcp: Add trace events for TCP congestion window tracing
This adds an event to trace TCP stat variables with
slightly intrusive trace-event. This uses ftrace/perf
event log buffer to trace those state, no needs to
prepare own ring-buffer, nor custom user apps.

User can use ftrace to trace this event as below;

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/tcp/tcp_probe/enable
  (run workloads)
  # cat trace

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 14:27:29 -05:00
Kristian Evensen
bbb6189df4 inet_diag: Add equal-operator for ports
inet_diag currently provides less/greater than or equal operators for
comparing ports when filtering sockets. An equal comparison can be
performed by combining the two existing operators, or a user can for
example request a port range and then do the final filtering in
userspace. However, these approaches both have drawbacks. Implementing
equal using LE/GE causes the size and complexity of a filter to grow
quickly as the number of ports increase, while it on busy machines would
be great if the kernel only returns information about relevant sockets.

This patch introduces source and destination port equal operators.
INET_DIAG_BC_S_EQ is used to match a source port, INET_DIAG_BC_D_EQ a
destination port, and usage is the same as for the existing port
operators.  I.e., the port to match is stored in the no-member of the
next inet_diag_bc_op-struct in the filter.

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:54:04 -05:00
Julia Lawall
e0b10844d9 openvswitch: drop unneeded newline
OVS_NLERR prints a newline at the end of the message string, so the
message string does not need to include a newline explicitly.  Done
using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:49:53 -05:00
Julia Lawall
62262ffd95 net: dccp: drop unneeded newline
DCCP_CRIT prints some other text and then a newline after the message
string, so the message string does not need to include a newline
explicitly.  Done using Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:49:32 -05:00
Wei Yongjun
9540d97761 net: sched: fix skb leak in dev_requeue_skb()
When dev_requeue_skb() is called with bulked skb list, only the
first skb of the list will be requeued to qdisc layer, and leak
the others without free them.

TCP is broken due to skb leak since no free skb will be considered
as still in the host queue and never be retransmitted. This happend
when dev_requeue_skb() called from qdisc_restart().
  qdisc_restart
  |-- dequeue_skb
  |-- sch_direct_xmit()
      |-- dev_requeue_skb() <-- skb may bluked

Fix dev_requeue_skb() to requeue the full bluked list. Also change
to use __skb_queue_tail() in __dev_requeue_skb() to avoid skb out
of order.

Fixes: a53851e2c3 ("net: sched: explicit locking in gso_cpu fallback")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:48:29 -05:00
Roi Dayan
3bb23421a5 net/sched: Fix update of lastuse in act modules implementing stats_update
We need to update lastuse to to the most updated value between what
is already set and the new value.
If HW matching fails, i.e. because of an issue, the stats are not updated
but it could be that software did match and updated lastuse.

Fixes: 5712bf9c5c ("net/sched: act_mirred: Use passed lastuse argument")
Fixes: 9fea47d93b ("net/sched: act_gact: Update statistics when offloaded to hardware")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:27:52 -05:00
Nogah Frankel
44edf2f897 net: sched: Move offload check till after dump call
Move the check of the offload state to after the qdisc dump action was
called, so the qdisc could update it if it was changed.

Fixes: 7a4fa29106 ("net: sched: Add TCA_HW_OFFLOAD")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:17:45 -05:00
Nogah Frankel
8234af2db3 net_sch: red: Fix the new offload indication
Update the offload flag, TCQ_F_OFFLOADED, in each dump call (and ignore
the offloading function return value in relation to this flag).
This is done because a qdisc is being initialized, and therefore offloaded
before being grafted. Since the ability of the driver to offload the qdisc
depends on its location, a qdisc can be offloaded and un-offloaded by graft
calls, that doesn't effect the qdisc itself.

Fixes: 428a68af3a ("net: sched: Move to new offload indication in RED"
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 13:17:45 -05:00
Xin Long
2fa771be95 ip6_tunnel: allow ip6gre dev mtu to be set below 1280
Commit 582442d6d5 ("ipv6: Allow the MTU of ipip6 tunnel to be set
below 1280") fixed a mtu setting issue. It works for ipip6 tunnel.

But ip6gre dev updates the mtu also with ip6_tnl_change_mtu. Since
the inner packet over ip6gre can be ipv4 and it's mtu should also
be allowed to set below 1280, the same issue also exists on ip6gre.

This patch is to fix it by simply changing to check if parms.proto
is IPPROTO_IPV6 in ip6_tnl_change_mtu instead, to make ip6gre to
go to 'else' branch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:36:14 -05:00
Eli Cooper
23263ec86a ip6_tunnel: disable dst caching if tunnel is dual-stack
When an ip6_tunnel is in mode 'any', where the transport layer
protocol can be either 4 or 41, dst_cache must be disabled.

This is because xfrm policies might apply to only one of the two
protocols. Caching dst would cause xfrm policies for one protocol
incorrectly used for the other.

Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:31:12 -05:00
David S. Miller
55a5ec9b77 Revert "net: core: dev_get_valid_name is now the same as dev_alloc_name_ns"
This reverts commit 87c320e515.

Changing the error return code in some situations turns out to
be harmful in practice.  In particular Michael Ellerman reports
that DHCP fails on his powerpc machines, and this revert gets
things working again.

Johannes Berg agrees that this revert is the best course of
action for now.

Fixes: 029b6d1405 ("Revert "net: core: maybe return -EEXIST in __dev_alloc_name"")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 11:50:12 -05:00
Greg Kroah-Hartman
87ad3722bf Merge 4.15-rc6 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 15:02:04 +01:00
Sabrina Dubroca
2f10a61cee xfrm: fix rcu usage in xfrm_get_type_offload
request_module can sleep, thus we cannot hold rcu_read_lock() while
calling it. The function also jumps back and takes rcu_read_lock()
again (in xfrm_state_get_afinfo()), resulting in an imbalance.

This codepath is triggered whenever a new offloaded state is created.

Fixes: ffdb5211da ("xfrm: Auto-load xfrm offload modules")
Reported-by: syzbot+ca425f44816d749e8eb49755567a75ee48cf4a30@syzkaller.appspotmail.com
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-31 16:29:24 +01:00
Eric Biggers
4e765b4972 af_key: fix buffer overread in parse_exthdrs()
If a message sent to a PF_KEY socket ended with an incomplete extension
header (fewer than 4 bytes remaining), then parse_exthdrs() read past
the end of the message, into uninitialized memory.  Fix it by returning
-EINVAL in this case.

Reproducer:

	#include <linux/pfkeyv2.h>
	#include <sys/socket.h>
	#include <unistd.h>

	int main()
	{
		int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
		char buf[17] = { 0 };
		struct sadb_msg *msg = (void *)buf;

		msg->sadb_msg_version = PF_KEY_V2;
		msg->sadb_msg_type = SADB_DELETE;
		msg->sadb_msg_len = 2;

		write(sock, buf, 17);
	}

Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30 09:52:08 +01:00
Eric Biggers
06b335cb51 af_key: fix buffer overread in verify_address_len()
If a message sent to a PF_KEY socket ended with one of the extensions
that takes a 'struct sadb_address' but there were not enough bytes
remaining in the message for the ->sa_family member of the 'struct
sockaddr' which is supposed to follow, then verify_address_len() read
past the end of the message, into uninitialized memory.  Fix it by
returning -EINVAL in this case.

This bug was found using syzkaller with KMSAN.

Reproducer:

	#include <linux/pfkeyv2.h>
	#include <sys/socket.h>
	#include <unistd.h>

	int main()
	{
		int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
		char buf[24] = { 0 };
		struct sadb_msg *msg = (void *)buf;
		struct sadb_address *addr = (void *)(msg + 1);

		msg->sadb_msg_version = PF_KEY_V2;
		msg->sadb_msg_type = SADB_DELETE;
		msg->sadb_msg_len = 3;
		addr->sadb_address_len = 1;
		addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;

		write(sock, buf, 24);
	}

Reported-by: Alexander Potapenko <glider@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30 09:52:07 +01:00
Florian Westphal
862591bf4f xfrm: skip policies marked as dead while rehashing
syzkaller triggered following KASAN splat:

BUG: KASAN: slab-out-of-bounds in xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
read of size 2 at addr ffff8801c8e92fe4 by task kworker/1:1/23 [..]
Workqueue: events xfrm_hash_rebuild [..]
 __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:428
 xfrm_hash_rebuild+0xdbe/0xf00 net/xfrm/xfrm_policy.c:618
 process_one_work+0xbbf/0x1b10 kernel/workqueue.c:2112
 worker_thread+0x223/0x1990 kernel/workqueue.c:2246 [..]

The reproducer triggers:
1016                 if (error) {
1017                         list_move_tail(&walk->walk.all, &x->all);
1018                         goto out;
1019                 }

in xfrm_policy_walk() via pfkey (it sets tiny rcv space, dump
callback returns -ENOBUFS).

In this case, *walk is located the pfkey socket struct, so this socket
becomes visible in the global policy list.

It looks like this is intentional -- phony walker has walk.dead set to 1
and all other places skip such "policies".

Ccing original authors of the two commits that seem to expose this
issue (first patch missed ->dead check, second patch adds pfkey
sockets to policies dumper list).

Fixes: 880a6fab8f ("xfrm: configure policy hash table thresholds by netlink")
Fixes: 12a169e7d8 ("ipsec: Put dumpers on the dump list")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Timo Teras <timo.teras@iki.fi>
Cc: Christophe Gouault <christophe.gouault@6wind.com>
Reported-by: syzbot <bot+c028095236fcb6f4348811565b75084c754dc729@syzkaller.appspotmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30 09:18:47 +01:00
Herbert Xu
257a4b018d xfrm: Forbid state updates from changing encap type
Currently we allow state updates to competely replace the contents
of x->encap.  This is bad because on the user side ESP only sets up
header lengths depending on encap_type once when the state is first
created.  This could result in the header lengths getting out of
sync with the actual state configuration.

In practice key managers will never do a state update to change the
encapsulation type.  Only the port numbers need to be changed as the
peer NAT entry is updated.

Therefore this patch adds a check in xfrm_state_update to forbid
any changes to the encap_type.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-12-30 09:18:47 +01:00
David S. Miller
6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Tom Herbert
d66fa9ec53 strparser: Call sock_owned_by_user_nocheck
strparser wants to check socket ownership without producing any
warnings. As indicated by the comment in the code, it is permissible
for owned_by_user to return true.

Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-and-tested-by: <syzbot+c91c53af67f9ebe599a337d2e70950366153b295@syzkaller.appspotmail.com>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:28:22 -05:00
Willem de Bruijn
f72c4ac695 skbuff: in skb_copy_ubufs unclone before releasing zerocopy
skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.

Commit b90ddd5687 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.

But I forgot an edge case where such an skb arrives that is cloned.

The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.

But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).

Still, it is non-obvious that no path exists. And it is fragile to
rely on this.

Fixes: b90ddd5687 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:26:22 -05:00
Jiri Pirko
8ec6957403 net: sched: don't set extack message in case the qdisc will be created
If the qdisc is not found here, it is going to be created. Therefore,
this is not an error path. Remove the extack message set and don't
confuse user with error message in case the qdisc was created
successfully.

Fixes: 0921559811 ("net: sched: sch_api: handle generic qdisc errors")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:18:31 -05:00
Parthasarathy Bhuvaragan
517d7c79bd tipc: fix hanging poll() for stream sockets
In commit 42b531de17 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with condtional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.

This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.

In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.

Fixes: 42b531de17 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:15:26 -05:00
David S. Miller
fcffe2edbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Fix incorrect state pruning related to recognition of zero initialized
   stack slots, where stacksafe exploration would mistakenly return a
   positive pruning verdict too early ignoring other slots, from Gianluca.

2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
   in maximum call depth check, and rework maximum stack depth tracking
   logic to fix a bypass of the total stack size check reported by Jann.
   Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
   Addition of various test cases to BPF selftests, from Alexei.

3) Addition of a BPF selftest to test_verifier that is related to BPF to
   BPF calls which demonstrates a late caller stack size increase and
   thus out of bounds access. Fixed above in 2). Test case from Jann.

4) Addition of correlating BPF helper calls, BPF to BPF calls as well
   as BPF maps to bpftool xlated dump in order to allow for better
   BPF program introspection and debugging, from Daniel.

5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
   to get it actually to work for subprogs, from Daniel.

6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
   of build errors for libbpf on sparc64, from David.

7) Allow narrower context access for BPF dev cgroup typed programs in
   order to adapt to LLVM code generation. Also adjust memlock rlimit
   in the test_dev_cgroup BPF selftest, from Yonghong.

8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
   relies on netdevsim device being available, from Jakub.

9) Reduce scope of xdp_do_generic_redirect_map() to being static,
   from Xiongwei.

10) Minor cleanups and spelling fixes in BPF verifier, from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 20:40:32 -05:00
Willem de Bruijn
8ddab50839 tcp: do not allocate linear memory for zerocopy skbs
Zerocopy payload is now always stored in frags, and space for headers
is reversed, so this memory is unused.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 16:44:14 -05:00
Willem de Bruijn
02583adeff tcp: place all zerocopy payload in frags
This avoids an unnecessary copy of 1-2KB and improves tso_fragment,
which has to fall back to tcp_fragment if skb->len != skb_data_len.

It also avoids a surprising inconsistency in notifications:
Zerocopy packets sent over loopback have their frags copied, so set
SO_EE_CODE_ZEROCOPY_COPIED in the notification. But this currently
does not happen for small packets, because when all data fits in the
linear fragment, data is not copied in skb_orphan_frags_rx.

Reported-by: Tom Deseyn <tom.deseyn@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 16:44:13 -05:00
Willem de Bruijn
111856c758 tcp: push full zerocopy packets
Skbs that reach MAX_SKB_FRAGS cannot be extended further. Do the
same for zerocopy frags as non-zerocopy frags and set the PSH bit.
This improves GRO assembly.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 16:44:13 -05:00
Willem de Bruijn
bf5c25d608 skbuff: in skb_segment, call zerocopy functions once per nskb
This is a net-next follow-up to commit 268b790679 ("skbuff: orphan
frags before zerocopy clone"), which fixed a bug in net, but added a
call to skb_zerocopy_clone at each frag to do so.

When segmenting skbs with user frags, either the user frags must be
replaced with private copies and uarg released, or the uarg must have
its refcount increased for each new skb.

skb_orphan_frags does the first, except for cases that can handle
reference counting. skb_zerocopy_clone then does the second.

Call these once per nskb, instead of once per frag.

That is, in the common case. With a frag list, also refresh when the
origin skb (frag_skb) changes.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 16:44:13 -05:00
Tonghao Zhang
8cb38a6024 sctp: Replace use of sockets_allocated with specified macro.
The patch(180d8cd942) replaces all uses of struct sock fields'
memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem
to accessor macros. But the sockets_allocated field of sctp sock is
not replaced at all. Then replace it now for unifying the code.

Fixes: 180d8cd942 ("foundations of per-cgroup memory pressure controlling.")
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Tonghao Zhang <zhangtonghao@didichuxing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 13:47:52 -05:00
Sowmini Varadhan
66261da169 rds: tcp: cleanup if kmem_cache_alloc fails in rds_tcp_conn_alloc()
If kmem_cache_alloc() fails in the middle of the for() loop,
cleanup anything that might have been allocated so far.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 13:37:27 -05:00
Sowmini Varadhan
b319109396 rds: tcp: initialize t_tcp_detached to false
Commit f10b4cff98 ("rds: tcp: atomically purge entries from
rds_tcp_conn_list during netns delete") adds the field t_tcp_detached,
but this needs to be initialized explicitly to false.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 13:37:27 -05:00
Sowmini Varadhan
7ae0c649c4 rds; Reset rs->rs_bound_addr in rds_add_bound() failure path
If the rds_sock is not added to the bind_hash_table, we must
reset rs_bound_addr so that rds_remove_bound will not trip on
this rds_sock.

rds_add_bound() does a rds_sock_put() in this failure path, so
failing to reset rs_bound_addr will result in a socket refcount
bug, and will trigger a WARN_ON with the stack shown below when
the application subsequently tries to close the PF_RDS socket.

     WARNING: CPU: 20 PID: 19499 at net/rds/af_rds.c:496 \
		rds_sock_destruct+0x15/0x30 [rds]
       :
     __sk_destruct+0x21/0x190
     rds_remove_bound.part.13+0xb6/0x140 [rds]
     rds_release+0x71/0x120 [rds]
     sock_release+0x1a/0x70
     sock_close+0xe/0x20
     __fput+0xd5/0x210
     task_work_run+0x82/0xa0
     do_exit+0x2ce/0xb30
     ? syscall_trace_enter+0x1cc/0x2b0
     do_group_exit+0x39/0xa0
     SyS_exit_group+0x10/0x10
     do_syscall_64+0x61/0x1a0

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 13:37:27 -05:00
Lorenzo Bianconi
f15bc54eee l2tp: add peer_offset parameter
Introduce peer_offset parameter in order to add the capability
to specify two different values for payload offset on tx/rx side.
If just offset is provided by userspace use it for rx side as well
in order to maintain compatibility with older l2tp versions

Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 12:11:51 -05:00
Hangbin Liu
820da53575 l2tp: fix missing print session offset info
Report offset parameter in L2TP_CMD_SESSION_GET command if
it has been configured by userspace

Fixes: 309795f4be ("l2tp: Add netlink control API for L2TP")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 12:11:51 -05:00
David S. Miller
9f30e5c5c2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-12-22

1) Separate ESP handling from segmentation for GRO packets.
   This unifies the IPsec GSO and non GSO codepath.

2) Add asynchronous callbacks for xfrm on layer 2. This
   adds the necessary infrastructure to core networking.

3) Allow to use the layer2 IPsec GSO codepath for software
   crypto, all infrastructure is there now.

4) Also allow IPsec GSO with software crypto for local sockets.

5) Don't require synchronous crypto fallback on IPsec offloading,
   it is not needed anymore.

6) Check for xdo_dev_state_free and only call it if implemented.
   From Shannon Nelson.

7) Check for the required add and delete functions when a driver
   registers xdo_dev_ops. From Shannon Nelson.

8) Define xfrmdev_ops only with offload config.
   From Shannon Nelson.

9) Update the xfrm stats documentation.
   From Shannon Nelson.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 11:15:14 -05:00
David S. Miller
65bbbf6c20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2017-12-22

1) Check for valid id proto in validate_tmpl(), otherwise
   we may trigger a warning in xfrm_state_fini().
   From Cong Wang.

2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute.
   From Michal Kubecek.

3) Verify the state is valid when encap_type < 0,
   otherwise we may crash on IPsec GRO .
   From Aviv Heller.

4) Fix stack-out-of-bounds read on socket policy lookup.
   We access the flowi of the wrong address family in the
   IPv4 mapped IPv6 case, fix this by catching address
   family missmatches before we do the lookup.

5) fix xfrm_do_migrate() with AEAD to copy the geniv
   field too. Otherwise the state is not fully initialized
   and migration fails. From Antony Antony.

6) Fix stack-out-of-bounds with misconfigured transport
   mode policies. Our policy template validation is not
   strict enough. It is possible to configure policies
   with transport mode template where the address family
   of the template does not match the selectors address
   family. Fix this by refusing such a configuration,
   address family can not change on transport mode.

7) Fix a policy reference leak when reusing pcpu xdst
   entry. From Florian Westphal.

8) Reinject transport-mode packets through tasklet,
   otherwise it is possible to reate a recursion
   loop. From Herbert Xu.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 10:58:23 -05:00
Tommi Rantala
642a8439dd tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
Calling tipc_mon_delete() before the monitor has been created will oops.
This can happen in tipc_enable_bearer() error path if tipc_disc_create()
fails.

[   48.589074] BUG: unable to handle kernel paging request at 0000000000001008
[   48.590266] IP: tipc_mon_delete+0xea/0x270 [tipc]
[   48.591223] PGD 1e60c5067 P4D 1e60c5067 PUD 1eb0cf067 PMD 0
[   48.592230] Oops: 0000 [#1] SMP KASAN
[   48.595610] CPU: 5 PID: 1199 Comm: tipc Tainted: G    B            4.15.0-rc4-pc64-dirty #5
[   48.597176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[   48.598489] RIP: 0010:tipc_mon_delete+0xea/0x270 [tipc]
[   48.599347] RSP: 0018:ffff8801d827f668 EFLAGS: 00010282
[   48.600705] RAX: ffff8801ee813f00 RBX: 0000000000000204 RCX: 0000000000000000
[   48.602183] RDX: 1ffffffff1de6a75 RSI: 0000000000000297 RDI: 0000000000000297
[   48.604373] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff1dd1533
[   48.605607] R10: ffffffff8eafbb05 R11: fffffbfff1dd1534 R12: 0000000000000050
[   48.607082] R13: dead000000000200 R14: ffffffff8e73f310 R15: 0000000000001020
[   48.608228] FS:  00007fc686484800(0000) GS:ffff8801f5540000(0000) knlGS:0000000000000000
[   48.610189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   48.611459] CR2: 0000000000001008 CR3: 00000001dda70002 CR4: 00000000003606e0
[   48.612759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   48.613831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   48.615038] Call Trace:
[   48.615635]  tipc_enable_bearer+0x415/0x5e0 [tipc]
[   48.620623]  tipc_nl_bearer_enable+0x1ab/0x200 [tipc]
[   48.625118]  genl_family_rcv_msg+0x36b/0x570
[   48.631233]  genl_rcv_msg+0x5a/0xa0
[   48.631867]  netlink_rcv_skb+0x1cc/0x220
[   48.636373]  genl_rcv+0x24/0x40
[   48.637306]  netlink_unicast+0x29c/0x350
[   48.639664]  netlink_sendmsg+0x439/0x590
[   48.642014]  SYSC_sendto+0x199/0x250
[   48.649912]  do_syscall_64+0xfd/0x2c0
[   48.650651]  entry_SYSCALL64_slow_path+0x25/0x25
[   48.651843] RIP: 0033:0x7fc6859848e3
[   48.652539] RSP: 002b:00007ffd25dff938 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[   48.654003] RAX: ffffffffffffffda RBX: 00007ffd25dff990 RCX: 00007fc6859848e3
[   48.655303] RDX: 0000000000000054 RSI: 00007ffd25dff990 RDI: 0000000000000003
[   48.656512] RBP: 00007ffd25dff980 R08: 00007fc685c35fc0 R09: 000000000000000c
[   48.657697] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000d13010
[   48.658840] R13: 00007ffd25e009c0 R14: 0000000000000000 R15: 0000000000000000
[   48.662972] RIP: tipc_mon_delete+0xea/0x270 [tipc] RSP: ffff8801d827f668
[   48.664073] CR2: 0000000000001008
[   48.664576] ---[ end trace e811818d54d5ce88 ]---

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 10:55:00 -05:00