Commit Graph

61760 Commits

Author SHA1 Message Date
Dan Carpenter
aeadd93f2b net: sched: act_ipt: check for underflow in __tcf_ipt_init()
If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
return -EINVAL.  But we don't check whether it's smaller than
sizeof(struct xt_entry_target) and that could lead to an out of bounds
read.

Fixes: 7ba699c604 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:34:14 -07:00
David S. Miller
2240c12d7d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2018-10-01

1) Make xfrmi_get_link_net() static to silence a sparse warning.
   From Wei Yongjun.

2) Remove a unused esph pointer definition in esp_input().
   From Haishuang Yan.

3) Allow the NIC driver to quietly refuse xfrm offload
   in case it does not support it, the SA is created
   without offload in this case.
   From Shannon Nelson.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:31:17 -07:00
David S. Miller
ee0b6f4834 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2018-10-01

1) Validate address prefix lengths in the xfrm selector,
   otherwise we may hit undefined behaviour in the
   address matching functions if the prefix is too
   big for the given address family.

2) Fix skb leak on local message size errors.
   From Thadeu Lima de Souza Cascardo.

3) We currently reset the transport header back to the network
   header after a transport mode transformation is applied. This
   leads to an incorrect transport header when multiple transport
   mode transformations are applied. Reset the transport header
   only after all transformations are already applied to fix this.
   From Sowmini Varadhan.

4) We only support one offloaded xfrm, so reset crypto_done after
   the first transformation in xfrm_input(). Otherwise we may call
   the wrong input method for subsequent transformations.
   From Sowmini Varadhan.

5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
   skb_dst_force does not really force a dst refcount anymore, it might
   clear it instead. xfrm code did not expect this, add a check to not
   dereference skb_dst() if it was cleared by skb_dst_force.

6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds
   read in xfrm_state_find. From Sean Tranchetti.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 22:29:25 -07:00
Yuchung Cheng
041a14d267 tcp: start receiver buffer autotuning sooner
Previously receiver buffer auto-tuning starts after receiving
one advertised window amount of data. After the initial receiver
buffer was raised by patch a337531b94 ("tcp: up initial rmem to
128KB and SYN rwin to around 64KB"), the reciver buffer may take
too long to start raising. To address this issue, this patch lowers
the initial bytes expected to receive roughly the expected sender's
initial window.

Fixes: a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:45:26 -07:00
Eric Dumazet
1ad98e9d1b tcp/dccp: fix lockdep issue when SYN is backlogged
In normal SYN processing, packets are handled without listener
lock and in RCU protected ingress path.

But syzkaller is known to be able to trick us and SYN
packets might be processed in process context, after being
queued into socket backlog.

In commit 06f877d613 ("tcp/dccp: fix other lockdep splats
accessing ireq_opt") I made a very stupid fix, that happened
to work mostly because of the regular path being RCU protected.

Really the thing protecting ireq->ireq_opt is RCU read lock,
and the pseudo request refcnt is not relevant.

This patch extends what I did in commit 449809a66c ("tcp/dccp:
block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
pair in the paths that might be taken when processing SYN from
socket backlog (thus possibly in process context)

Fixes: 06f877d613 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:42:13 -07:00
David S. Miller
c8424ddd97 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) Skip ip_sabotage_in() for packet making into the VRF driver,
   otherwise packets are dropped, from David Ahern.

2) Clang compilation warning uncovering typo in the
   nft_validate_register_store() call from nft_osf, from Stefan Agner.

3) Double sizeof netlink message length calculations in ctnetlink,
   from zhong jiang.

4) Missing rb_erase() on batch full in rbtree garbage collector,
   from Taehee Yoo.

5) Calm down compilation warning in nf_hook(), from Florian Westphal.

6) Missing check for non-null sk in xt_socket before validating
   netns procedence, from Flavio Leitner.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-01 15:41:01 -07:00
Roman Gushchin
8bad74f984 bpf: extend cgroup bpf core to allow multiple cgroup storage types
In order to introduce per-cpu cgroup storage, let's generalize
bpf cgroup core to support multiple cgroup storage types.
Potentially, per-node cgroup storage can be added later.

This commit is mostly a formal change that replaces
cgroup_storage pointer with a array of cgroup_storage pointers.
It doesn't actually introduce a new storage type,
it will be done later.

Each bpf program is now able to have one cgroup storage of each type.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-01 16:18:32 +02:00
Yu Zhao
1db5852945 cfg80211: fix use-after-free in reg_process_hint()
reg_process_hint_country_ie() can free regulatory_request and return
REG_REQ_ALREADY_SET. We shouldn't use regulatory_request after it's
called. KASAN error was observed when this happens.

BUG: KASAN: use-after-free in reg_process_hint+0x839/0x8aa [cfg80211]
Read of size 4 at addr ffff8800c430d434 by task kworker/1:3/89
<snipped>
Workqueue: events reg_todo [cfg80211]
Call Trace:
 dump_stack+0xc1/0x10c
 ? _atomic_dec_and_lock+0x1ad/0x1ad
 ? _raw_spin_lock_irqsave+0xa0/0xd2
 print_address_description+0x86/0x26f
 ? reg_process_hint+0x839/0x8aa [cfg80211]
 kasan_report+0x241/0x29b
 reg_process_hint+0x839/0x8aa [cfg80211]
 reg_todo+0x204/0x5b9 [cfg80211]
 process_one_work+0x55f/0x8d0
 ? worker_detach_from_pool+0x1b5/0x1b5
 ? _raw_spin_unlock_irq+0x65/0xdd
 ? _raw_spin_unlock_irqrestore+0xf3/0xf3
 worker_thread+0x5dd/0x841
 ? kthread_parkme+0x1d/0x1d
 kthread+0x270/0x285
 ? pr_cont_work+0xe3/0xe3
 ? rcu_read_unlock_sched_notrace+0xca/0xca
 ret_from_fork+0x22/0x40

Allocated by task 2718:
 set_track+0x63/0xfa
 __kmalloc+0x119/0x1ac
 regulatory_hint_country_ie+0x38/0x329 [cfg80211]
 __cfg80211_connect_result+0x854/0xadd [cfg80211]
 cfg80211_rx_assoc_resp+0x3bc/0x4f0 [cfg80211]
smsc95xx v1.0.6
 ieee80211_sta_rx_queued_mgmt+0x1803/0x7ed5 [mac80211]
 ieee80211_iface_work+0x411/0x696 [mac80211]
 process_one_work+0x55f/0x8d0
 worker_thread+0x5dd/0x841
 kthread+0x270/0x285
 ret_from_fork+0x22/0x40

Freed by task 89:
 set_track+0x63/0xfa
 kasan_slab_free+0x6a/0x87
 kfree+0xdc/0x470
 reg_process_hint+0x31e/0x8aa [cfg80211]
 reg_todo+0x204/0x5b9 [cfg80211]
 process_one_work+0x55f/0x8d0
 worker_thread+0x5dd/0x841
 kthread+0x270/0x285
 ret_from_fork+0x22/0x40
<snipped>

Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:14:03 +02:00
Felix Fietkau
211710ca74 mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys
key->sta is only valid after ieee80211_key_link, which is called later
in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is
never set when management frame protection is enabled.

Fixes: e548c49e6d ("mac80211: add key flag for management keys")
Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:13:48 +02:00
Stefan Seyfried
848e616e66 cfg80211: fix wext-compat memory leak
cfg80211_wext_giwrate and sinfo.pertid might allocate sinfo.pertid via
rdev_get_station(), but never release it. Fix that.

Fixes: 8689c051a2 ("cfg80211: dynamically allocate per-tid stats for station info")
Signed-off-by: Stefan Seyfried <seife+kernel@b1-systems.com>
[johannes: fix error path, use cfg80211_sinfo_release_content(), add Fixes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-01 09:11:36 +02:00
Trond Myklebust
571ed1fd23 SUNRPC: Replace krb5_seq_lock with a lockless scheme
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:18 -04:00
Trond Myklebust
0c1c19f46e SUNRPC: Lockless lookup of RPCSEC_GSS mechanisms
Use RCU protected lookups for discovering the supported mechanisms.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:17 -04:00
Trond Myklebust
4e4c3bef44 SUNRPC: Remove rpc_authflavor_lock in favour of RCU locking
Module removal is RCU safe by design, so we really have no need to
lock the auth_flavors[] array. Substitute a lockless scheme to
add/remove entries in the array, and then use rcu.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:17 -04:00
Trond Myklebust
ec846469ba SUNRPC: Unexport xdr_partial_copy_from_skb()
It is no longer used outside of net/sunrpc/socklib.c

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
4f54614975 SUNRPC: Clean up xs_udp_data_receive()
Simplify the retry logic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
550aebfe1c SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
c50b8ee02f SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()
In preparation for sharing with AF_LOCAL.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
277e4ab7d5 SUNRPC: Simplify TCP receive code by switching to using iterators
Most of this code should also be reusable with other socket types.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
9d96acbc7f SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()
Add a bvec array to struct xdr_buf, and have the client allocate it
when we need to receive data into pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
431f6eb357 SUNRPC: Add a label for RPC calls that require allocation on receive
If the RPC call relies on the receive call allocating pages as buffers,
then let's label it so that we
a) Don't leak memory by allocating pages for requests that do not expect
   this behaviour
b) Can optimise for the common case where calls do not require allocation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
79c99152a3 SUNRPC: Convert the xprt->sending queue back to an ordinary wait queue
We no longer need priority semantics on the xprt->sending queue, because
the order in which tasks are sent is now dictated by their position in
the send queue.
Note that the backlog queue remains a priority queue, meaning that
slot resources are still managed in order of task priority.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
f42f7c2830 SUNRPC: Fix priority queue fairness
Fix up the priority queue to not batch by owner, but by queue, so that
we allow '1 << priority' elements to be dequeued before switching to
the next priority queue.
The owner field is still used to wake up requests in round robin order
by owner to avoid single processes hogging the RPC layer by loading the
queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
95f7691daa SUNRPC: Convert xprt receive queue to use an rbtree
If the server is slow, we can find ourselves with quite a lot of entries
on the receive queue. Converting the search from an O(n) to O(log(n))
can make a significant difference, particularly since we have to hold
a number of locks while searching.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
bd79bc579c SUNRPC: Don't take transport->lock unnecessarily when taking XPRT_LOCK
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
adfa71446d SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:16 -04:00
Trond Myklebust
c544577dad SUNRPC: Clean up transport write space handling
Treat socket write space handling in the same way we now treat transport
congestion: by denying the XPRT_LOCK until the transport signals that it
has free buffer space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
36bd7de949 SUNRPC: Turn off throttling of RPC slots for TCP sockets
The theory was that we would need to grab the socket lock anyway, so we
might as well use it to gate the allocation of RPC slots for a TCP
socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
f05d54ecf6 SUNRPC: Allow soft RPC calls to time out when waiting for the XPRT_LOCK
This no longer causes them to lose their place in the transmission queue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
89f90fe1ad SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue
Rather than forcing each and every RPC task to grab the socket write
lock in order to send itself, we allow whichever task is holding the
write lock to attempt to drain the entire transmit queue.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
86aeee0eb6 SUNRPC: Enqueue swapper tagged RPCs at the head of the transmit queue
Avoid memory starvation by giving RPCs that are tagged with the
RPC_TASK_SWAPPER flag the highest priority.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
75891f502f SUNRPC: Support for congestion control when queuing is enabled
Both RDMA and UDP transports require the request to get a "congestion control"
credit before they can be transmitted. Right now, this is done when
the request locks the socket. We'd like it to happen when a request attempts
to be transmitted for the first time.
In order to support retransmission of requests that already hold such
credits, we also want to ensure that they get queued first, so that we
don't deadlock with requests that have yet to obtain a credit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
918f3c1fe8 SUNRPC: Improve latency for interactive tasks
One of the intentions with the priority queues was to ensure that no
single process can hog the transport. The field task->tk_owner therefore
identifies the RPC call's origin, and is intended to allow the RPC layer
to organise queues for fairness.
This commit therefore modifies the transmit queue to group requests
by task->tk_owner, and ensures that we round robin among those groups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
dcbbeda836 SUNRPC: Move RPC retransmission stat counter to xprt_transmit()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
5f2f6bd987 SUNRPC: Simplify xprt_prepare_transmit()
Remove the checks for whether or not we need to transmit, and whether
or not a reply has been received. Those are already handled in
call_transmit() itself.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
04b3b88fbf SUNRPC: Don't reset the request 'bytes_sent' counter when releasing XPRT_LOCK
If the request is still on the queue, this will be incorrect behaviour.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
50f484e298 SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()
When we shift to using the transmit queue, then the task that holds the
write lock will not necessarily be the same as the one being transmitted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
902c58872e SUNRPC: Fix up the back channel transmit
Fix up the back channel code to recognise that it has already been
transmitted, so does not need to be called again.
Also ensure that we set req->rq_task.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
762e4e67b3 SUNRPC: Refactor RPC call encoding
Move the call encoding so that it occurs before the transport connection
etc.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
944b042921 SUNRPC: Add a transmission queue for RPC requests
Add the queue that will enforce the ordering of RPC task transmission.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:15 -04:00
Trond Myklebust
ef3f54347f SUNRPC: Distinguish between the slot allocation list and receive queue
When storing a struct rpc_rqst on the slot allocation list, we currently
use the same field 'rq_list' as we use to store the request on the
receive queue. Since the structure is never on both lists at the same
time, this is OK.
However, for clarity, let's make that a union with different names for
the different lists so that we can more easily distinguish between
the two states.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
78b576ced2 SUNRPC: Minor cleanup for call_transmit()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
7f3a1d1e18 SUNRPC: Refactor xprt_transmit() to remove wait for reply code
Allow the caller in clnt.c to call into the code to wait for a reply
after calling xprt_transmit(). Again, the reason is that the backchannel
code does not need this functionality.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
edc81dcd5b SUNRPC: Refactor xprt_transmit() to remove the reply queue code
Separate out the action of adding a request to the reply queue so that the
backchannel code can simply skip calling it altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
75c84151a9 SUNRPC: Rename xprt->recv_lock to xprt->queue_lock
We will use the same lock to protect both the transmit and receive queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
ec37a58fba SUNRPC: Don't wake queued RPC calls multiple times in xprt_transmit
Rather than waking up the entire queue of RPC messages a second time,
just wake up the task that was put to sleep.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
5ce970393b SUNRPC: Test whether the task is queued before grabbing the queue spinlocks
When asked to wake up an RPC task, it makes sense to test whether or not
the task is still queued.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
359c48c04a SUNRPC: Add a helper to wake up a sleeping rpc_task and set its status
Add a helper that will wake up a task that is sleeping on a specific
queue, and will set the value of task->tk_status. This is mainly
intended for use by the transport layer to notify the task of an
error condition.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
cf9946cd61 SUNRPC: Refactor the transport request pinning
We are going to need to pin for both send and receive.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
4cd34e7c2e SUNRPC: Simplify dealing with aborted partially transmitted messages
If the previous message was only partially transmitted, we need to close
the socket in order to avoid corruption of the message stream. To do so,
we currently hijack the unlocking of the socket in order to schedule
the close.
Now that we track the message offset in the socket state, we can move
that kind of checking out of the socket lock code, which is needed to
allow messages to remain queued after dropping the socket lock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00
Trond Myklebust
6c7a64e5a4 SUNRPC: Add socket transmit queue offset tracking
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30 15:35:14 -04:00