At the end of rxrpc_recvmsg(), if a call is found, the call is put and then
a trace line is emitted referencing that call in a couple of places - but
the call may have been deallocated by the time those traces happen.
Fix this by stashing the call debug_id in a variable and passing that to
the tracepoint rather than the call pointer.
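The pattern is essentially the following sketch (the put and trace helper
arguments are illustrative rather than the exact rxrpc signatures):

        unsigned int call_debug_id = call->debug_id;

        /* The call may be freed by this put... */
        rxrpc_put_call(call, rxrpc_call_put);
        /* ...so the trace line only ever uses the stashed debug ID. */
        trace_rxrpc_recvmsg(call_debug_id, rxrpc_recvmsg_return, 0);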
Fixes: 849979051c ("rxrpc: Add a tracepoint to follow what recvmsg does")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter sayeth[1]:
The patch 5e6ef4f101: "rxrpc: Make the I/O thread take over the
call and local processor work" from Jan 23, 2020, leads to the
following Smatch static checker warning:
net/rxrpc/io_thread.c:283 rxrpc_input_packet()
warn: bool is not less than zero.
Fix this (for now) by changing rxrpc_new_incoming_call() to return an int
(0 or an error code) rather than a bool. Note that the actual return value
of rxrpc_input_packet() is currently ignored. I have a separate patch to
clean that up.
Fixes: 5e6ef4f101 ("rxrpc: Make the I/O thread take over the call and local processor work")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006123.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter sayeth[1]:
The patch 75bfdbf2fc: "rxrpc: Implement an in-kernel rxperf server
for testing purposes" from Nov 3, 2022, leads to the following Smatch
static checker warning:
net/rxrpc/rxperf.c:337 rxperf_deliver_to_call()
error: uninitialized symbol 'ret'.
Fix this by initialising ret to 0. The value is only used for tracing
purposes in the rxperf server.
Fixes: 75bfdbf2fc ("rxrpc: Implement an in-kernel rxperf server for testing purposes")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2022-December/006124.html [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
The rxrpc I/O thread checks to see if there's any work it needs to do, and
if not, checks kthread_should_stop() before scheduling, and if it should
stop, breaks out of the loop and tries to clean up and exit.
This can, however, race with socket destruction, wherein outstanding calls
are aborted and released from the socket and then the socket unuses the
local endpoint, causing kthread_stop() to be issued. The abort is deferred
to the I/O thread and the event can be issued between the I/O thread
checking if there's any work to be done (such as processing call aborts)
and the stop being seen.
This results in the I/O thread stopping the processing of events whilst
call cleanup events are still outstanding, leaving connections and other
objects lying around uncleaned, which can trigger assertions, e.g.:
rxrpc: AF_RXRPC: Leaked client conn 00000000e8009865 {2}
------------[ cut here ]------------
kernel BUG at net/rxrpc/conn_client.c:64!
Fix this by retrieving the kthread_should_stop() indication, then checking
to see if there's more work to do, and going back round the loop if there
is, and breaking out of the loop only if there wasn't.
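In outline, the loop becomes something like the following sketch; the
work-pending test and the helper name are illustrative, not the exact rxrpc
code:

        for (;;) {
                bool should_stop;

                set_current_state(TASK_INTERRUPTIBLE);
                should_stop = kthread_should_stop();

                /* Look for work *after* sampling the stop flag so that a
                 * final abort/cleanup event queued just before
                 * kthread_stop() is still processed before we exit.
                 */
                if (!skb_queue_empty(&local->rx_queue)) {
                        __set_current_state(TASK_RUNNING);
                        rxrpc_do_queued_work(local); /* hypothetical helper */
                        continue;
                }

                if (should_stop)
                        break;
                schedule();
        }
        __set_current_state(TASK_RUNNING);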
This was triggered by a syzbot test that produced some other symptom[1].
Fixes: a275da62e8 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the switched parameters on rxrpc_alloc_peer() and rxrpc_get_peer().
The ref argument and the why argument got mixed.
Fixes: 47c810a798 ("rxrpc: trace: Don't use __builtin_return_address for rxrpc_peer tracing")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that rxrpc_put_local() may call kthread_stop(), it can't be called
under spinlock as it might sleep. This can cause a problem in the peer
keepalive code in rxrpc as it tries to avoid dropping the peer_hash_lock
from the point where it needs to re-add peer->keepalive_link until it goes
round the loop again in rxrpc_peer_keepalive_dispatch().
Fix this by just dropping the lock when we don't need it and accepting that
we'll have to take it again. This code is only called about every 20s for
each peer, so not very often.
This allows rxrpc_put_peer_unlocked() to be removed also.
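The change amounts to something like this sketch (the put helper's argument
list is simplified; this replaces rxrpc_put_peer_unlocked()):

        spin_unlock(&rxnet->peer_hash_lock);
        rxrpc_put_peer(peer);   /* may now sleep in kthread_stop() */
        spin_lock(&rxnet->peer_hash_lock);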
If triggered, this bug produces an oops like the following, as reproduced
by a syzbot reproducer for a different oops[1]:
BUG: sleeping function called from invalid context at kernel/sched/completion.c:101
...
RCU nest depth: 0, expected: 0
3 locks held by kworker/u9:0/50:
#0: ffff88810e74a138 ((wq_completion)krxrpcd){+.+.}-{0:0}, at: process_one_work+0x294/0x636
#1: ffff8881013a7e20 ((work_completion)(&rxnet->peer_keepalive_work)){+.+.}-{0:0}, at: process_one_work+0x294/0x636
#2: ffff88817d366390 (&rxnet->peer_hash_lock){+.+.}-{2:2}, at: rxrpc_peer_keepalive_dispatch+0x2bd/0x35f
...
Call Trace:
<TASK>
dump_stack_lvl+0x4c/0x5f
__might_resched+0x2cf/0x2f2
__wait_for_common+0x87/0x1e8
kthread_stop+0x14d/0x255
rxrpc_peer_keepalive_dispatch+0x333/0x35f
rxrpc_peer_keepalive_worker+0x2e9/0x449
process_one_work+0x3c1/0x636
worker_thread+0x25f/0x359
kthread+0x1a6/0x1b5
ret_from_fork+0x1f/0x30
Fixes: a275da62e8 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/0000000000002b4a9f05ef2b616f@google.com/ [1]
Signed-off-by: David S. Miller <davem@davemloft.net>
When starting a kthread, the __kthread_create_on_node() function, as called
from kthread_run(), waits for a completion to indicate that the task_struct
(or failure state) of the new kernel thread is available before continuing.
This does not wait, however, for the thread function to be invoked and,
indeed, will skip it if kthread_stop() gets called before it gets there.
If this happens, though, kthread_run() will have returned successfully,
indicating that the thread was started and returning the task_struct
pointer. The actual error indication is returned by kthread_stop().
Note that this is ambiguous, as the caller cannot tell whether the -EINTR
error code came from kthread() or from the thread function.
This was encountered in the new rxrpc I/O thread, where if the system is
being pounded hard by, say, syzbot, the check of KTHREAD_SHOULD_STOP can be
delayed long enough for kthread_stop() to get called when rxrpc releases a
socket - and this causes an oops because the I/O thread function doesn't
get started and thus doesn't remove the rxrpc_local struct from the
local_endpoints list.
Fix this by using a completion to wait for the thread to actually enter
rxrpc_io_thread(). This makes sure the thread can't be prematurely
stopped and makes sure the relied-upon cleanup is done.
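A minimal sketch of the pattern, assuming a completion embedded in the
rxrpc_local struct (the field name here is illustrative):

        struct task_struct *io_thread;

        init_completion(&local->io_thread_ready);
        io_thread = kthread_run(rxrpc_io_thread, local, "krxrpcio");
        if (IS_ERR(io_thread))
                return PTR_ERR(io_thread);

        /* Don't publish the thread - and so don't allow kthread_stop() -
         * until rxrpc_io_thread() has definitely started running.
         */
        wait_for_completion(&local->io_thread_ready);
        local->io_thread = io_thread;

with rxrpc_io_thread() calling complete(&local->io_thread_ready) as the
first thing it does in its thread function.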
Fixes: a275da62e8 ("rxrpc: Create a per-local endpoint receive queue and I/O thread")
Reported-by: syzbot+3538a6a72efa8b059c38@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/000000000000229f1505ef2b6159@google.com/
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the propagation of the security settings from sendmsg to the rxrpc_call
struct.
Fixes: f3441d4125 ("rxrpc: Copy client call parameters into rxrpc_call earlier")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning. Fix it to do this.
Note that this still doesn't get rid of the checker warning:
../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit
I think the interplay between the socket lock and the call's user_mutex may
be too complicated for the checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex held if successful but unconditionally drops the socket
lock.
Fixes: e754eba685 ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
For ACKs generated inside the I/O thread, transmit the ACK at the point of
generation. Where the ACK is generated outside of the I/O thread, it's
offloaded to the I/O thread to transmit it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Fold __rxrpc_unuse_local() into rxrpc_unuse_local() as the latter is now
the only user of the former.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
When we've gone for >1RTT without transmitting a packet, we should reduce
the ssthresh and cut the cwnd by half (as suggested in RFC2861 sec 3.1).
However, we may receive ACK packets in a batch and the first of these may
cut the cwnd, preventing further transmission, and each subsequent one cuts
the cwnd yet further, reducing it to the floor and killing performance.
Fix this by moving the cwnd reset to after doing the transmission and
resetting the base time such that we don't cut the cwnd by half again for
at least another RTT.
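Schematically, with illustrative field names rather than the actual rxrpc
members:

        /* After the transmission pass: apply the reduction once and reset
         * the base time so the window isn't halved again for at least
         * another RTT.
         */
        if (ktime_after(now, ktime_add_us(call->tx_last_sent, rtt_us))) {
                call->cong_ssthresh = max_t(unsigned int, call->cong_cwnd / 2, 2);
                call->cong_cwnd = call->cong_ssthresh;
                call->tx_last_sent = now;
        }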
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Add a tracepoint to log when a cwnd reset occurs due to lack of
transmission on a call.
Add stat counters to count transmission underflows (ie. when we have tx
window space, but sendmsg doesn't manage to keep up), cwnd resets and
transmission failures.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
None of the spinlocks in rxrpc need a _bh annotation now as the RCU
callback routines no longer take spinlocks and the bulk of the packet
wrangling code is now run in the I/O thread, not softirq context.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Move the functions from the call->processor and local->processor work items
into the domain of the I/O thread.
The call event processor, now called from the I/O thread, then takes over
the job of cranking the call state machine, processing incoming packets and
transmitting DATA, ACK and ABORT packets. In a future patch,
rxrpc_send_ACK() will transmit the ACK on the spot rather than queuing it
for later transmission.
The call event processor becomes purely received-skb driven. It only
transmits things in response to events. We use "pokes" to queue a dummy
skb to make it do things like start/resume transmitting data. Timer expiry
also results in pokes.
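For example, a poke can be as simple as queueing a zero-length marker skb
for the I/O thread; all of the names below are illustrative, not the actual
implementation:

static void poke_call(struct rxrpc_call *call)
{
        struct sk_buff *skb = alloc_skb(0, GFP_ATOMIC);

        if (!skb)
                return;
        skb->mark = RXRPC_SKB_MARK_CALL_POKE;    /* hypothetical marker */
        rxrpc_skb(skb)->poke_target = call;      /* hypothetical field */
        skb_queue_tail(&call->local->rx_queue, skb);
        wake_up_process(call->local->io_thread); /* assumed field */
}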
The connection event processor becomes similar, though crypto events, such
as dealing with CHALLENGE and RESPONSE packets, are offloaded to a work item
to avoid doing crypto in the I/O thread.
The local event processor is removed and VERSION response packets are
generated directly from the packet parser. Similarly, ABORTs generated in
response to protocol errors will be transmitted immediately rather than
being pushed onto a queue for later transmission.
Changes:
========
ver #2)
- Fix a couple of introduced lock context imbalances.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Extract the peer address from an incoming packet earlier, at the beginning
of rxrpc_input_packet() and thence pass a pointer to it to various
functions that use it as part of the lookup rather than doing it on several
separate paths.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Shrink the region of rxrpc_input_packet() that is covered by the RCU read
lock so that it only covers the connection and call lookup. This means
that the bits now outside of that can call sleepable functions such as
kmalloc and sendmsg.
Also take a ref on the conn or call we're going to use before we drop the
RCU read lock.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
A received skbuff needs a ref when it gets put on a call data queue or conn
packet queue, and rxrpc_input_packet() and co. jump through a lot of hoops
to avoid double-dropping the skbuff ref so that we can avoid getting a ref
when we queue the packet.
Change this so that the skbuff ref is unconditionally dropped by the caller
of rxrpc_input_packet(). An additional ref is then taken on the packet if
it is pushed onto a queue.
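In outline (the function and field names here are simplified placeholders
for the real rxrpc entry points):

        /* Queueing takes its own ref on the skb... */
        skb_get(skb);
        skb_queue_tail(&call->recvmsg_queue, skb);

        /* ...so the caller of the input function always drops its own ref,
         * whether or not the packet was queued anywhere:
         */
        rxrpc_input_packet(local, skb);
        kfree_skb(skb);         /* real code uses a traced free wrapper */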
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove the RCU requirements from the peer's list of error targets so that
the error distributor can call sleeping functions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Move DATA transmission into the call processor work item. In a future
patch, this will be called from the I/O thread rather than being its own
work item.
This will allow DATA transmission to be driven directly by incoming ACKs,
pokes and timers as those are processed.
The Tx queue is also split: The queue of packets prepared by sendmsg is now
placed in call->tx_sendmsg and the packet dispatcher decants the packets
into call->tx_buffer as space becomes available in the transmission
window. This allows sendmsg to run ahead of the available space to try and
prevent an underflow in transmission.
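The decanting step then looks roughly like this sketch (the window test is
simplified and rxrpc_transmit_one() stands for whatever actually sends the
packet):

        while (!list_empty(&call->tx_sendmsg) &&
               call->tx_top - call->tx_bottom < call->cong_cwnd) {
                struct rxrpc_txbuf *txb =
                        list_first_entry(&call->tx_sendmsg,
                                         struct rxrpc_txbuf, call_link);

                /* Move the prepared packet into the in-flight queue now
                 * that there is window space for it, then transmit it.
                 */
                list_move_tail(&txb->call_link, &call->tx_buffer);
                rxrpc_transmit_one(call, txb);
        }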
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Copy client call parameters into rxrpc_call earlier so that the call struct can be
used to convey them to the connection code - which can then be offloaded to
the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Provide a means by which an event notification can be sent to a call such
that the I/O thread can process it rather than it being done in a separate
workqueue. This will allow a lot of locking to be removed.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Don't use sk->sk_receive_queue.lock to guard socket state changes as the
socket mutex is sufficient.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove call->input_lock as it was only necessary to serialise access to the
state stored in the rxrpc_call struct by simultaneous softirq handlers
presenting received packets. They now dump the packets into a queue and a
single process-context handler processes them.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Move the processing of error packets into the local endpoint I/O thread,
leaving the handover from UDP to merely transfer them into the local
endpoint queue.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Split the packet input handler to make the softirq side just dump the
received packet into the local endpoint receive queue and then call the
remainder of the input function from the I/O thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Create a per-local receive queue to which, in a future patch, all incoming
packets will be directed and an I/O thread that will process those packets
and perform all transmission of packets.
Destruction of the local endpoint is also moved from the local processor
work item (which will be absorbed) to the thread.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Split the code that handles packet reception in softirq mode as a prelude
to moving all the packet processing beyond routing to the appropriate call
and setting up of a new call out into process context.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Currently, rxrpc gives the connection's work item a ref on the connection
when it queues it - and this is called from the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.
This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
Fix this by the following means:
(1) Don't give a ref to the work item.
(2) Simplify handling of service connections by adding a separate active
count so that the refcount isn't also used for this.
(3) Connection destruction for both client and service connections can
then be cleaned up by putting rxrpc_put_connection() out of line and
making a tidy progression through the destruction code (offloaded to a
workqueue if put from softirq or processor function context). The RCU
part of the cleanup then only deals with the freeing at the end.
(4) Make rxrpc_queue_conn() return immediately if it sees the active count
is -1 rather than queuing the connection (sketched below).
(5) Make sure that the cleanup routine waits for the work item to
complete.
(6) Stash the rxrpc_net pointer in the conn struct so that the rcu free
routine can use it, even if the local endpoint has been freed.
Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.
Note the connection work item is mostly going to go away with the main
event work being transferred to the I/O thread, so the wait in (5) will
become obsolete.
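Point (4) amounts to something like this sketch (argument lists are
simplified):

void rxrpc_queue_conn(struct rxrpc_connection *conn)
{
        /* An active count of -1 means the connection is being torn down:
         * don't requeue and don't take a ref for the work item.
         */
        if (atomic_read(&conn->active) == -1)
                return;
        queue_work(rxrpc_workqueue, &conn->processor);
}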
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Currently, rxrpc gives the call timer a ref on the call when it starts it
and this is passed along to the workqueue by the timer expiration function.
The problem comes when queue_work() fails (ie. the work item is already
queued): the timer routine must put the ref - but this may cause the
cleanup code to run.
This has the unfortunate effect that the cleanup code may then be run in
softirq context - which means that any spinlocks it might need to touch
have to be guarded to disable softirqs (ie. they need a "_bh" suffix).
Fix this by:
(1) Don't give a ref to the timer.
(2) Make the expiration function not do anything if the refcount is 0
(sketched below). Note that this is more of an optimisation.
(3) Make sure that the cleanup routine waits for the timer to complete.
However, this has the consequence that the timer cannot give a ref to the
work item. Therefore the following fixes are also necessary:
(4) Don't give a ref to the work item.
(5) Make the work item return asap if it sees the ref count is 0.
(6) Make sure that the cleanup routine waits for the work item to
complete.
Unfortunately, neither the timer nor the work item can simply get around
the problem by just using refcount_inc_not_zero() as the waits would still
have to be done, and there would still be the possibility of having to put
the ref in the expiration function.
Note the call work item is going to go away with the work being transferred
to the I/O thread, so the wait in (6) will become obsolete.
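Point (2) is essentially the following sketch (the queueing helper's
argument list is simplified):

static void rxrpc_call_timer_expired(struct timer_list *t)
{
        struct rxrpc_call *call = from_timer(call, t, timer);

        /* If the call is already being destroyed, do nothing; the cleanup
         * path will del_timer_sync() and wait for us, per (3).
         */
        if (!refcount_read(&call->ref))
                return;

        rxrpc_queue_call(call); /* no ref handed to the work item */
}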
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the sk_buff tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Add a tracepoint for the rxrpc_bundle refcounting.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_call tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_conn tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_peer tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
In rxrpc tracing, use enums to generate lists of points of interest rather
than __builtin_return_address() for the rxrpc_local tracepoint.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Extract the abort code from a received rx ABORT packet much earlier and in a
single place and harmonise the responses to malformed ABORT packets.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove the rxrpc_conn_parameters struct from the rxrpc_connection and
rxrpc_bundle structs and emplace the members directly. These are going to
get filled in from the rxrpc_call struct in future.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove the _net() and knet() debugging macros in favour of tracepoints.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove the kproto() and _proto() debugging macros in favour of using
tracepoints for this.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
We should not now see duplicate packets in the recvmsg_queue. At one
point, jumbo packets that overlapped with already queued data would be
added to the queue and dealt with in recvmsg rather than in the softirq
input code, but now jumbo packets are split/cloned before being processed
by the input code and the subpackets can be discarded individually.
So remove the recvmsg-side code for handling this.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
When retransmitting a packet, rxrpc_resend() shouldn't be attaching a ref
to the call to the txbuf as that pins the call and prevents the call from
clearing the packet buffer.
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: d57a3a1516 ("rxrpc: Save last ACK's SACK table rather than marking txbufs")
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Implement an in-kernel rxperf server to allow kernel-based rxrpc services
to be tested directly, unlike with AFS where they're accessed by the
fileserver when the latter decides it wants to.
This is implemented as a module that, if loaded, opens UDP port 7009
(afs3-rmtsys) and listens on it for incoming calls. Calls can be generated
using the rxperf command shipped with OpenAFS, for example.
Changes
=======
ver #2)
- Use min_t() instead of min().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: Jakub Kicinski <kuba@kernel.org>
Fix the following checker warning:
../net/rxrpc/key.c:692:9: error: subtraction of different types can't work (different address spaces)
The checker is wrong in this case, but cast the pointers to unsigned long to
avoid the warning.
Whilst we're at it, reduce the assertions to WARN_ON() and return an error.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Merge tag 'rxrpc-next-20221116' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Fix oops and missing config conditionals
The patches that were pulled into net-next previously[1] had some issues
that this patchset fixes:
(1) Fix missing IPV6 config conditionals.
(2) Fix an oops caused by calling udpv6_sendmsg() directly on an AF_INET
socket.
(3) Fix the validation of network addresses on entry to socket functions
so that we don't allow an AF_INET6 address if we've selected an
AF_INET transport socket.
Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
After rxrpc_unbundle_conn() has removed a connection from a bundle, it
checks to see if there are any conns with available channels and, if not,
removes and attempts to destroy the bundle.
Whilst it does check, after grabbing client_bundles_lock, that there are no
connections attached, this races with rxrpc_look_up_bundle() retrieving the
bundle without yet attaching a connection - the connection being attached
later.
There is therefore a window in which the bundle can get destroyed before we
manage to attach a new connection to it.
Fix this by adding an "active" counter to struct rxrpc_bundle:
(1) rxrpc_connect_call() obtains an active count by prepping/looking up a
bundle and ditches it before returning.
(2) If, during rxrpc_connect_call(), a connection is added to the bundle,
this obtains an active count, which is held until the connection is
discarded.
(3) rxrpc_deactivate_bundle() is created to drop an active count on a
bundle and destroy it when the active count reaches 0. The active
count is checked whilst holding client_bundles_lock to prevent a race with
rxrpc_look_up_bundle() (sketched below).
(4) rxrpc_unbundle_conn() then calls rxrpc_deactivate_bundle().
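Point (3) boils down to something like the following sketch (field names
are simplified and tracing/special cases are omitted):

static void rxrpc_deactivate_bundle(struct rxrpc_bundle *bundle)
{
        struct rxrpc_local *local = bundle->local;
        bool need_put = false;

        /* Only remove the bundle if the last active user goes away whilst
         * we hold the lock that rxrpc_look_up_bundle() searches under.
         */
        if (atomic_dec_and_lock(&bundle->active,
                                &local->client_bundles_lock)) {
                rb_erase(&bundle->local_node, &local->client_bundles);
                need_put = true;
                spin_unlock(&local->client_bundles_lock);
        }
        if (need_put)
                rxrpc_put_bundle(bundle);
}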
Fixes: 245500d853 ("rxrpc: Rewrite the client connection manager")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-15975
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: zdi-disclosures@trendmicro.com
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The "pkt" was supposed to have been deleted in a previous patch. It
leads to an uninitialized variable bug.
Fixes: 72f0c6fb05 ("rxrpc: Allocate ACK records at proposal and queue for transmission")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error handling for the case where skb_copy_bits() fails was accidentally
deleted, so the rxkad_decrypt_ticket() function is not called.
Fixes: 5d7edbc923 ("rxrpc: Get rid of the Rx ring")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix network address validation on entry to uapi functions such as connect()
for AF_RXRPC. The check for address compatibility with the transport
socket isn't correct and allows an AF_INET6 address to be given to an
AF_INET socket, resulting in an oops now that rxrpc is calling
udp_sendmsg() directly.
Sample program:
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/rxrpc.h>
static unsigned char ctrl[256] =
"\x18\x00\x00\x00\x00\x00\x00\x00\x10\x01\x00\x00\x01";
int main(void)
{
        struct sockaddr_rxrpc srx = {
                .srx_family                 = AF_RXRPC,
                .transport_type             = SOCK_DGRAM,
                .transport_len              = 28,
                .transport.sin6.sin6_family = AF_INET6,
        };
        struct mmsghdr vec = {
                .msg_hdr.msg_control        = ctrl,
                .msg_hdr.msg_controllen     = 0x18,
        };
        int s;

        s = socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
        if (s < 0) {
                perror("socket");
                exit(1);
        }
        if (connect(s, (struct sockaddr *)&srx, sizeof(srx)) < 0) {
                perror("connect");
                exit(1);
        }
        if (sendmmsg(s, &vec, 1, MSG_NOSIGNAL | MSG_MORE) < 0) {
                perror("sendmmsg");
                exit(1);
        }
        return 0;
}
If working properly, connect() should fail with EAFNOSUPPORT.
Fixes: ed472b0c87 ("rxrpc: Call udp_sendmsg() directly")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
If rxrpc sees an IPv6 address, it assumes it can call udpv6_sendmsg() on it
- even if it got it on an IPv4 socket. Fix do_udp_sendmsg() to give an
error in such a case.
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
...
RIP: 0010:ipv6_addr_v4mapped include/net/ipv6.h:749 [inline]
RIP: 0010:udpv6_sendmsg+0xd0a/0x2c70 net/ipv6/udp.c:1361
...
Call Trace:
do_udp_sendmsg net/rxrpc/output.c:27 [inline]
do_udp_sendmsg net/rxrpc/output.c:21 [inline]
rxrpc_send_abort_packet+0x73b/0x860 net/rxrpc/output.c:367
rxrpc_release_calls_on_socket+0x211/0x300 net/rxrpc/call_object.c:595
rxrpc_release_sock net/rxrpc/af_rxrpc.c:886 [inline]
rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:917
__sock_release+0xcd/0x280 net/socket.c:650
sock_close+0x18/0x20 net/socket.c:1365
__fput+0x27c/0xa90 fs/file_table.c:320
task_work_run+0x16b/0x270 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0xb35/0x2a20 kernel/exit.c:820
do_group_exit+0xd0/0x2a0 kernel/exit.c:950
__do_sys_exit_group kernel/exit.c:961 [inline]
__se_sys_exit_group kernel/exit.c:959 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:959
Fixes: ed472b0c87 ("rxrpc: Call udp_sendmsg() directly")
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Fix rxrpc_encap_err_rcv() to make the call to ipv6_icmp_error conditional
on IPV6 support being enabled.
Fixes: b6c66c4324 ("rxrpc: Use the core ICMP/ICMP6 parsers")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
In the rxkad security class, allocate the skcipher used to do packet
encryption and decryption rather than allocating one up front and reusing
it for each packet. Reusing the skcipher precludes doing crypto in
parallel.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
rxrpc has a problem in its congestion management in that it saves the
congestion window size (cwnd) from one call to another, but if this is 0 at
the time it is saved, then the next call may not actually manage to ever
transmit anything.
To this end:
(1) Don't save cwnd between calls, but rather reset back down to the
initial cwnd and re-enter slow-start if data transmission is idle for
more than an RTT.
(2) Preserve ssthresh instead, as that is a handy estimate of pipe
capacity. Knowing roughly when to stop slow start and enter
congestion avoidance can reduce the tendency to overshoot and drop
larger amounts of packets when probing.
In future, cwnd growth also needs to be constrained when the window isn't
being filled due to being application limited.
Reported-by: Simon Wilkinson <sxw@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
The Rx/Tx ring is no longer used, so remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Improve the tracking of which packets need to be transmitted by saving the
last ACK packet that we receive that has a populated soft-ACK table rather
than marking packets. Then we can step through the soft-ACK table and look
at the packets we've transmitted beyond that to determine which packets we
might want to retransmit.
We also look at the highest serial number that has been acked to try and
guess which of the packets we've transmitted the peer is likely to have seen. If
necessary, we send a ping to retrieve that number.
One downside that might be a problem is that we can't then compare the
previous acked/unacked state so easily in rxrpc_input_soft_acks() - which
is a potential problem for the slow-start algorithm.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
call->lock is no longer necessary, so remove it.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Change the way the Tx queueing works to make the following ends easier to
achieve:
(1) The filling of packets, the encryption of packets and the transmission
of packets can be handled in parallel by separate threads, rather than
rxrpc_sendmsg() allocating, filling, encrypting and transmitting each
packet before moving onto the next one.
(2) Get rid of the fixed-size ring which sets a hard limit on the number
of packets that can be retained in the ring. This allows the number
of packets to increase without having to allocate a very large ring or
having variable-sized rings.
[Note: the downside of this is that it's then less efficient to locate
a packet for retransmission as we then have to step through a list and
examine each buffer in the list.]
(3) Allow the filler/encrypter to run ahead of the transmission window.
(4) Make it easier to do zero copy UDP from the packet buffers.
(5) Make it easier to do zero copy from userspace to the packet buffers -
and thence to UDP (only for unauthenticated connections).
To that end, the following changes are made:
(1) Use the new rxrpc_txbuf struct instead of sk_buff for keeping packets
to be transmitted in. This allows them to be placed on multiple
queues simultaneously. An sk_buff isn't really necessary as it's
never passed on to lower-level networking code.
(2) Keep the transmissible packets in a linked list on the call struct
rather than in a ring. As a consequence, the annotation buffer isn't
used either; rather a flag is set on the packet to indicate ackedness.
(3) Use the RXRPC_CALL_TX_LAST flag to indicate that the last packet to be
transmitted has been queued. Add RXRPC_CALL_TX_ALL_ACKED to indicate
that all packets up to and including the last got hard acked.
(4) Wire headers are now stored in the txbuf rather than being concocted
on the stack and they're stored immediately before the data, thereby
allowing zerocopy of a single span.
(5) Don't bother with instant-resend on transmission failure; rather,
leave it for a timer or an ACK packet to trigger.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Get rid of the Rx ring and replace it with a pair of queues instead. One
queue gets the packets that are in-sequence and are ready for processing by
recvmsg(); the other queue gets the out-of-sequence packets for addition to
the first queue as the holes get filled.
The annotation ring is removed and replaced with a SACK table. The SACK
table has the bits set that correspond exactly to the sequence number of
the packet being acked. The SACK ring is copied when an ACK packet is
being assembled and rotated so that the first ACK is in byte 0.
Flow control handling is altered so that packets that are moved to the
in-sequence queue are hard-ACK'd even before they're consumed - and then
the Rx window size in the ACK packet (rsize) is shrunk down to compensate
(even going to 0 if the window is full).
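Schematically, with one table entry per sequence number and illustrative
names (RXRPC_SACK_SIZE standing in for the table length):

        /* On receipt of DATA packet seq: */
        call->ackr_sack_table[seq % RXRPC_SACK_SIZE] = 1;

        /* When assembling an ACK, copy the table rotated so that the entry
         * for the start of the receive window lands in byte 0:
         */
        base = window_start % RXRPC_SACK_SIZE;
        memcpy(sackp, call->ackr_sack_table + base, RXRPC_SACK_SIZE - base);
        memcpy(sackp + RXRPC_SACK_SIZE - base, call->ackr_sack_table, base);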
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Split up received jumbo packets into separate skbuffs by cloning the
original skbuff for each subpacket and setting the offset and length of the
data in that subpacket in the skbuff's private data. The subpackets are
then placed on the recvmsg queue separately. The security class then gets
to revise the offset and length to remove its metadata.
If we fail to clone a packet, we just drop it and let the peer resend it.
The original packet gets used for the final subpacket.
This should make it easier to handle parallel decryption of the subpackets.
It also simplifies the handling of lost or misordered packets in the
queuing/buffering loop as the possibility of overlapping jumbo packets no
longer needs to be considered.
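Per subpacket, the split amounts to roughly the following (the skb private
field names are illustrative):

        struct sk_buff *jskb = skb_clone(skb, GFP_ATOMIC);
        struct rxrpc_skb_priv *jsp;

        if (!jskb)
                return false;   /* drop the packet; the peer will resend */

        jsp = rxrpc_skb(jskb);
        jsp->offset = offset;   /* where this subpacket's data starts */
        jsp->len    = len;      /* and how long it is */
        __skb_queue_tail(&call->recvmsg_queue, jskb);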
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Split the rxrpc_recvmsg tracepoint so that the tracepoints that are about
data packet processing (and which have extra pieces of information) are
separate from the tracepoint that shows the general flow of recvmsg().
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Clean up the rxrpc_propose_ACK() function. If deferred PING ACK proposal
is split out, it's only really needed for deferred DELAY ACKs. All other
ACKs, bar the terminal IDLE ACK, are sent immediately. The deferred IDLE ACK
submission can be handled by conversion of a DELAY ACK into an IDLE ACK if
there's nothing to be SACK'd.
Also, because there's a delay between an ACK being generated and being
transmitted, it's possible that other ACKs of the same type will be
generated during that interval. Apart from the ACK time and the serial
number responded to, most of the ACK body, including window and SACK
parameters, is not filled out till the point of transmission - so we can
avoid generating a new ACK if there's one pending that will cover the SACK
data we need to convey.
Therefore, don't propose a new DELAY or IDLE ACK for a call if there's one
already pending.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Allocate rxrpc_txbuf records for ACKs and put them onto a queue for the
transmitter thread to dispatch.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Define a struct, rxrpc_txbuf, to carry data to be transmitted instead of a
socket buffer so that it can be placed onto multiple queues at once. This
also allows the data buffer to be in the same allocation as the internal
data.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove call->tx_phase as it's only ever set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove the flags from the rxrpc_skb tracepoint as we're no longer going to
be using this for the transmission buffers and so marking which are
transmission buffers isn't going to be necessary.
Note that this also removes the rxrpc skb flag that indicates if this is a
transmission buffer and so the count is not updated for the moment.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove a bunch of unnecessary header inclusions.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Call udp_sendmsg() and udpv6_sendmsg() directly rather than calling
kernel_sendmsg() as the latter assumes we want a kvec-class iterator.
However, zerocopy explicitly doesn't work with such an iterator.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Make rxrpc_encap_err_rcv() pass the ICMP/ICMP6 skbuff to ip_icmp_error() or
ipv6_icmp_error() as appropriate to do the parsing rather than trying to do
it in rxrpc.
This pushes an error report onto the UDP socket's error queue and calls
->sk_error_report() from which point rxrpc can pick it up.
It would be preferable to steal the packet directly from ip*_icmp_error()
rather than letting it get queued, but this is probably good enough.
Also note that __udp4_lib_err() calls sk_error_report() twice in some
cases.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Change the udp encap_err_rcv signature to match ip_icmp_error() and
ipv6_icmp_error() so that those can be used from the called function and
export them.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
ack.bufferSize should be set to 0 when generating an ack.
Fixes: 8d94aa381d ("rxrpc: Calls shouldn't hold socket refs")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Record stats for why the REQUEST-ACK flag is being set.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Record statistics about the different types of ACKs that have been
transmitted and received and the number of ACKs that have been filled out
and transmitted or that have been skipped.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Add a procfile, /proc/net/rxrpc/stats, to display some statistics about
what rxrpc has been doing. Writing a blank line to the stats file will
clear the increment-only counters. Allocated resource counters don't get
cleared.
Add some counters to count various things about DATA packets, including the
number created, transmitted and retransmitted and the number received, the
number of ACK-request markings and the number of jumbo packets received.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Keep track of the highest DATA serial number that has been acked by the
peer for future purposes.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Split the tracepoint for call timer-set to separate out the call
timer-expiration event.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Add a tracepoint to log why the request-ack flag is set on an outgoing DATA
packet, allowing debugging as to why.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Remove rxrpc_get_reply_time() as that is no longer used now that the call
issue time is used instead of the reply time.
Signed-off-by: David Howells <dhowells@redhat.com>
If the local processor work item for the rxrpc local endpoint gets requeued
by an event (such as an incoming packet) between it getting scheduled for
destruction and the UDP socket being closed, the rxrpc_local_destroyer()
function can get run twice. The second time it can hang because it can end
up waiting for cleanup events that will never happen.
Signed-off-by: David Howells <dhowells@redhat.com>
rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements,
but if that isn't sufficient for the number of fragments in the socket
buffer, we try to allocate an sglist large enough to hold all the
fragments.
However, for large packets with a lot of fragments, this isn't sufficient
and we need at least one additional fragment.
The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then
getting returned to userspace. Most of the time, this isn't a problem as
rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued
together; occasionally, however, the server will ignore the reported limit
and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags
being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that
seems to occur before the fragments counted by ->nr_frags and we hit the
end of the sglist too early.
Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is
recursive up to 24 deep. I'm not sure if I need to take account of that
too - or if there's an easy way of counting those frags too.
Fix this by counting an extra frag and allocating a larger sglist based on
that.
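The sizing then becomes roughly the following (a sketch with error handling
trimmed; _sg is the on-stack array mentioned above):

        int nsg = skb_shinfo(skb)->nr_frags + 1;

        if (nsg <= ARRAY_SIZE(_sg)) {
                sg = _sg;
        } else {
                sg = kmalloc_array(nsg, sizeof(*sg), GFP_NOIO);
                if (!sg)
                        return -ENOMEM;
        }
        sg_init_table(sg, nsg);
        ret = skb_to_sgvec(skb, sg, offset, len);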
Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
it to siphon off UDP packets early in the handling of received UDP packets
thereby avoiding the packet going through the UDP receive queue, it doesn't
get ICMP packets through the UDP ->sk_error_report() callback. In fact, it
doesn't appear that there's any usable option for getting hold of ICMP
packets.
Fix this by adding a new UDP encap hook to distribute error messages for
UDP tunnels. If the hook is set, then the tunnel driver will be able to
see ICMP packets. The hook provides the offset into the packet of the UDP
header of the original packet that caused the notification.
An alternative would be to call the ->error_handler() hook - but that
requires that the skbuff be cloned (as ip_icmp_error() or ipv6_icmp_error()
do, though that isn't really necessary or desirable in rxrpc's case as we want
to parse them there and then, not queue them).
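On the tunnel-driver side, the hook then gets wired up through the UDP
tunnel config along these lines (a sketch; the error-receive member is the
one this patch adds):

        struct udp_tunnel_sock_cfg tuncfg = {
                .encap_type     = UDP_ENCAP_RXRPC,
                .encap_rcv      = rxrpc_encap_rcv,
                .encap_err_rcv  = rxrpc_encap_err_rcv,
                .sk_user_data   = local,
        };

        setup_udp_tunnel_sock(net, local->socket, &tuncfg);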
Changes
=======
ver #3)
- Fixed an uninitialised variable.
ver #2)
- Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
Fix three bugs in the rxrpc's sendmsg implementation:
(1) rxrpc_new_client_call() should release the socket lock when returning
an error from rxrpc_get_call_slot().
(2) rxrpc_wait_for_tx_window_intr() will return without the call mutex
held in the event that we're interrupted by a signal whilst waiting
for tx space on the socket or relocking the call mutex afterwards.
Fix this by: (a) moving the unlock/lock of the call mutex up to
rxrpc_send_data() such that the lock is not held around all of
rxrpc_wait_for_tx_window*() and (b) indicating to higher callers
whether we return with the lock dropped. Note that this means
recvmsg() will not block on this call whilst we're waiting.
(3) After dropping and regaining the call mutex, rxrpc_send_data() needs
to go and recheck the state of the tx_pending buffer and the
tx_total_len check in case we raced with another sendmsg() on the same
call.
Thinking on this some more, it might make sense to have different locks for
sendmsg() and recvmsg(). There's probably no need to make recvmsg() wait
for sendmsg(). It does mean that recvmsg() can return MSG_EOR indicating
that a call is dead before a sendmsg() to that call returns - but that can
currently happen anyway.
Without fix (2), something like the following can be induced:
WARNING: bad unlock balance detected!
5.16.0-rc6-syzkaller #0 Not tainted
-------------------------------------
syz-executor011/3597 is trying to release lock (&call->user_mutex) at:
[<ffffffff885163a3>] rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
but there are no more locks to release!
other info that might help us debug this:
no locks held by syz-executor011/3597.
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_unlock_imbalance_bug include/trace/events/lock.h:58 [inline]
__lock_release kernel/locking/lockdep.c:5306 [inline]
lock_release.cold+0x49/0x4e kernel/locking/lockdep.c:5657
__mutex_unlock_slowpath+0x99/0x5e0 kernel/locking/mutex.c:900
rxrpc_do_sendmsg+0xc13/0x1350 net/rxrpc/sendmsg.c:748
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:561
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
[Thanks to Hawkins Jiawei and Khalid Masum for their attempts to fix this]
Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Reported-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: syzbot+7f0483225d0c94cb3441@syzkaller.appspotmail.com
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Khalid Masum <khalid.masum.92@gmail.com>
cc: Dan Carpenter <dan.carpenter@oracle.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/166135894583.600315.7170979436768124075.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Delete an extra space and tab in a blank line; there is no functional change.
Reported-by: Hacash Robot <hacashRobot@santino.com>
Signed-off-by: William Dean <williamsukatube@gmail.com>
Link: https://lore.kernel.org/r/20220723073222.2961602-1-williamsukatube@163.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When building with Clang we encounter this warning:
| net/rxrpc/rxkad.c:434:33: error: format specifies type 'unsigned short'
| but the argument has type 'u32' (aka 'unsigned int') [-Werror,-Wformat]
| _leave(" = %d [set %hx]", ret, y);
y is a u32 but the format specifier is `%hx`. Going from unsigned int to
short int results in a loss of data. This is surely not intended
behavior. If it is intended, the warning should be suppressed through
other means.
This patch should get us closer to the goal of enabling the -Wformat
flag for Clang builds.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20220707182052.769989-1-justinstitt@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Fix the decision on when to generate an IDLE ACK by keeping a count of the
number of packets we've received, but not yet soft-ACK'd, and the number of
packets we've processed, but not yet hard-ACK'd, rather than trying to keep
track of which DATA sequence numbers correspond to those points.
We then generate an ACK when either counter exceeds 2. The counters are
both cleared when we transcribe the information into any sort of ACK packet
for transmission. IDLE and DELAY ACKs are skipped if both counters are 0
(ie. no change).
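In outline (the counter names follow the description above and the proposal
helper is a placeholder):

        /* On queueing a DATA packet for recvmsg: */
        if (atomic_inc_return(&call->ackr_nr_unacked) > 2)
                propose_idle_or_delay_ack(call);        /* placeholder */

        /* Whenever any sort of ACK is filled out for transmission, both
         * counters are cleared:
         */
        atomic_set(&call->ackr_nr_unacked, 0);
        atomic_set(&call->ackr_nr_consumed, 0);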
Fixes: 805b21b929 ("rxrpc: Send an ACK after every few DATA packets we receive")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
The previousPacket field in the rx ACK packet should never go backwards -
it's now the highest DATA sequence number received, not the last one
received (it used to be used for out of sequence detection).
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK
accounting through variables shared between the two. call->acks_* members
refer to ACKs received in the Tx phase and call->ackr_* members to ACKs
sent/to be sent during the Rx phase.
Fixes: 1a2391c30c ("rxrpc: Fix detection of out of order acks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
rxrpc has a timer to trigger resending of unacked data packets in a call.
This is not cancelled when a client call switches to the receive phase on
the basis that most calls don't last long enough for it to ever expire.
However, if it *does* expire after we've started to receive the reply, we
shouldn't then go into trying to retransmit or pinging the server to find
out if an ack got lost.
Fix this by skipping the resend code if we're into receiving the reply to a
client call.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_RXRPC's listen() handler lets you set the backlog up to 32 (if you bump
up the sysctl), but whilst the preallocation circular buffers have 32 slots
in them, one of them has to be a dead slot because we're using CIRC_CNT().
This means that listen(rxrpc_sock, 32) will cause an oops when the socket
is closed because rxrpc_service_prealloc_one() allocated one too many calls
and rxrpc_discard_prealloc() won't then be able to get rid of them because
it'll think the ring is empty. rxrpc_release_calls_on_socket() then tries
to abort them, but oopses because call->peer isn't yet set.
Fix this by setting the maximum backlog to RXRPC_BACKLOG_MAX - 1 to match
the ring capacity.
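This is the usual circular-buffer capacity rule: with CIRC_CNT() and
CIRC_SPACE() over head and tail indices, a ring of N slots can only ever
hold N - 1 entries, e.g.:

        /* A ring of RXRPC_BACKLOG_MAX slots is full once it contains
         * RXRPC_BACKLOG_MAX - 1 preallocated calls:
         */
        if (CIRC_SPACE(head, tail, RXRPC_BACKLOG_MAX) == 0)
                return -ENOBUFS;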
BUG: kernel NULL pointer dereference, address: 0000000000000086
...
RIP: 0010:rxrpc_send_abort_packet+0x73/0x240 [rxrpc]
Call Trace:
<TASK>
? __wake_up_common_lock+0x7a/0x90
? rxrpc_notify_socket+0x8e/0x140 [rxrpc]
? rxrpc_abort_call+0x4c/0x60 [rxrpc]
rxrpc_release_calls_on_socket+0x107/0x1a0 [rxrpc]
rxrpc_release+0xc9/0x1c0 [rxrpc]
__sock_release+0x37/0xa0
sock_close+0x11/0x20
__fput+0x89/0x240
task_work_run+0x59/0x90
do_exit+0x319/0xaa0
Fixes: 00e907127e ("rxrpc: Preallocate peers, conns and calls for incoming service requests")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lists.infradead.org/pipermail/linux-afs/2022-March/005079.html
Signed-off-by: David S. Miller <davem@davemloft.net>
If a client's address changes, say if it is NAT'd, this can disrupt an in
progress operation. For most operations, this is not much of a problem,
but StoreData can be different as some servers modify the target file as
the data comes in, so if a store request is disrupted, the file can get
corrupted on the server.
The problem is that the server doesn't recognise packets that come after
the change of address as belonging to the original client and will bounce
them, either by sending an OUT_OF_SEQUENCE ACK to the apparent new call if
the packet number falls within the initial sequence number window of a call
or by sending an EXCEEDS_WINDOW ACK if it falls outside and then aborting
it. In both cases, firstPacket will be 1 and previousPacket will be 0 in
the ACK information.
Fix this by the following means:
(1) If a client call receives an EXCEEDS_WINDOW ACK with firstPacket as 1
and previousPacket as 0, assume this indicates that the server saw the
incoming packets as coming from a different peer and thus as belonging
to a different call. Fail the call with error -ENETRESET (see the
sketch after this list).
(2) Also fail the call if a similar OUT_OF_SEQUENCE ACK occurs and the
first packet has been hard-ACK'd. If it hasn't been hard-ACK'd, the
ACK packet will cause it to get retransmitted, so the call will just
be repeated.
(3) Make afs_select_fileserver() treat -ENETRESET as a straight fail of
the operation.
(4) Prioritise the error code over things like -ECONNRESET as the server
did actually respond.
(5) Make writeback treat -ENETRESET as a retryable error and make it
redirty all the pages involved in a write so that the VM will retry.
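A hedged sketch of the checks in (1) and (2); the helper and parameter
names are illustrative, but the ACK reason codes are the ones defined in
net/rxrpc/protocol.h:

#include <linux/types.h>
#include "protocol.h"		/* RXRPC_ACK_* reason codes */

static bool ack_implies_peer_address_change(u8 ack_reason,
					    u32 first_packet, u32 prev_packet,
					    bool first_packet_hard_acked)
{
	if (first_packet != 1 || prev_packet != 0)
		return false;

	/* The server saw our packets as a new call from a different
	 * (post-NAT) peer - fail the call with -ENETRESET.
	 */
	if (ack_reason == RXRPC_ACK_EXCEEDS_WINDOW)
		return true;

	/* OUT_OF_SEQUENCE only counts once seq 1 has been hard-ACK'd;
	 * before that the DATA packet will simply be retransmitted.
	 */
	return ack_reason == RXRPC_ACK_OUT_OF_SEQUENCE &&
	       first_packet_hard_acked;
}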
Note that there is still a circumstance that I can't easily deal with: if
the operation is fully received and processed by the server, but the reply
is lost due to address change. There's no way to know if the op happened.
We can examine the server, but a conflicting change could have been made by
a third party - and we can't tell the difference. In such a case, a
message like:
kAFS: vnode modified {100058:146266} b7->b8 YFS.StoreData64 (op=2646a)
will be logged to dmesg on the next op to touch the file and the client
will reset the inode state, including invalidating clean parts of the
pagecache.
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004811.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX_USER_ABORT code should really only be used to indicate that the user
of the rxrpc service (ie. userspace) implicitly caused a call to be aborted
- for instance if the AF_RXRPC socket is closed whilst the call was in
progress. (The user may also explicitly abort a call and specify the abort
code to use).
Change some of the points of generation to use other abort codes instead:
(1) Abort the call with RXGEN_SS_UNMARSHAL or RXGEN_CC_UNMARSHAL if we see
ENOMEM or EFAULT during received data delivery, and abort with
RX_CALL_DEAD in the default case (see the sketch after this list).
(2) Abort with RXGEN_SS_MARSHAL if we get ENOMEM whilst trying to send a
reply.
(3) If we stop hearing from the peer, abort with RX_CALL_DEAD if we had
previously heard from it and with RX_CALL_TIMEOUT if we hadn't.
(4) Abort with RX_CALL_DEAD if we try to disconnect a call that's not
completed successfully or been aborted.
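A minimal sketch of the mapping in (1), assuming the service side of a
call; the helper name is hypothetical, but the abort codes come from the
rxrpc ABI header (include/uapi/linux/rxrpc.h):

#include <linux/errno.h>
#include <linux/rxrpc.h>
#include <linux/types.h>

static u32 data_delivery_abort_code(int error)
{
	switch (error) {
	case -ENOMEM:
	case -EFAULT:
		return RXGEN_SS_UNMARSHAL;	/* couldn't unmarshal the received data */
	default:
		return RX_CALL_DEAD;		/* generic failure */
	}
}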
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that
was being given data was aborted remotely or otherwise failed, return an
error rather than returning the amount of data buffered for transmission.
The call (presumably) did not complete, so there's not much point
continuing with it. AF_RXRPC considers it "complete" and so will be
unwilling to do anything else with it - and won't send a notification for
it, deeming the return from sendmsg sufficient.
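A rough sketch of the return-value policy (the names are illustrative,
not the actual rxrpc_send_data() internals): once the call has failed,
hand back its error rather than the byte count that happened to be
buffered before the failure.

#include <linux/types.h>

static long sendmsg_result_sketch(bool call_failed, int call_error,
				  size_t bytes_buffered)
{
	if (call_failed)
		return call_error;	/* e.g. -ECONNABORTED after a remote abort */
	return bytes_buffered;		/* data accepted and queued for transmission */
}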
Not returning an error causes afs to incorrectly handle a StoreData
operation that gets interrupted by a change of address due to NAT
reconfiguration.
This doesn't normally affect most operations since their request parameters
tend to fit into a single UDP packet and afs_make_call() returns before the
server responds; StoreData is different as it involves transmission of a
lot of data.
This can be triggered on a client by doing something like:
dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512
at one prompt, and then changing the network address at another prompt,
e.g.:
ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0
Tracing packets on an Auristor fileserver looks something like:
192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
<ARP exchange for 192.168.6.2>
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001
The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort
code received by kafs is -5 (RX_PROTOCOL_ERROR): the rx layer sees the
condition and generates an abort first, and the unmarshal error is a
consequence of that at the application layer.
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a locking issue with the per-netns list of calls in rxrpc. The
pieces of code that add and remove a call from the list use write_lock()
and the calls procfile uses read_lock() to access it. However, the timer
callback function may trigger a removal by trying to queue a call for
processing and finding that it's already queued - at which point it has a
spare refcount that it has to do something with. If it puts the call and
this reduces the refcount to 0, the call will be removed from the list.
However, since the _bh variants of the locking functions aren't used, this
can deadlock.
================================
WARNING: inconsistent lock state
5.18.0-rc3-build4+ #10 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes:
ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b
{SOFTIRQ-ON-W} state was registered at:
...
Possible unsafe locking scenario:
CPU0
----
lock(&rxnet->call_lock);
<Interrupt>
lock(&rxnet->call_lock);
*** DEADLOCK ***
1 lock held by ksoftirqd/2/25:
#0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d
Changes
=======
ver #2)
- Changed to using list_next_rcu() rather than rcu_dereference() directly.
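A hedged sketch of one way the procfile read side can work with this
approach: walk the per-netns call list under the RCU read lock, so the
read doesn't take rxnet->call_lock at all. The struct and field names
below are illustrative.

#include <linux/printk.h>
#include <linux/rculist.h>

struct call_sketch {
	struct list_head	link;
	unsigned int		debug_id;
};

static void dump_calls_sketch(struct list_head *calls)
{
	struct call_sketch *call;

	rcu_read_lock();
	list_for_each_entry_rcu(call, calls, link)
		pr_info("call %08x\n", call->debug_id);
	rcu_read_unlock();
}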
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Move to using refcount_t rather than atomic_t for refcounts in rxrpc.
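As a tiny illustration of what the conversion buys (the object and field
names are illustrative): refcount_t warns and saturates on overflow or
underflow instead of silently wrapping the way atomic_t does.

#include <linux/refcount.h>
#include <linux/slab.h>

struct sketch_obj {
	refcount_t	usage;
};

static void sketch_get(struct sketch_obj *obj)
{
	refcount_inc(&obj->usage);		/* was atomic_inc() */
}

static void sketch_put(struct sketch_obj *obj)
{
	if (refcount_dec_and_test(&obj->usage))	/* was atomic_dec_and_test() */
		kfree(obj);
}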
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the list of in-use local UDP endpoints in the current network
namespace to be viewed in /proc.
To aid with this, the endpoint list is converted to an hlist and RCU-safe
manipulation is used so that the list can be read with only the RCU
read lock held.
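A hedged sketch of the hlist-plus-RCU pattern this describes; the struct
and field names are illustrative rather than the actual rxrpc_local
layout. Writers use the _rcu list helpers under their own lock, while the
procfile reader only takes rcu_read_lock():

#include <linux/printk.h>
#include <linux/rculist.h>

struct local_sketch {
	struct hlist_node	link;
	unsigned short		port;
};

/* Write side: add an endpoint under the per-netns endpoint lock. */
static void sketch_add_local(struct hlist_head *head, struct local_sketch *local)
{
	hlist_add_head_rcu(&local->link, head);
}

/* Read side, e.g. the procfile: no lock other than RCU is needed. */
static void sketch_show_locals(struct hlist_head *head)
{
	struct local_sketch *local;

	rcu_read_lock();
	hlist_for_each_entry_rcu(local, head, link)
		pr_info("udp port %u\n", local->port);
	rcu_read_unlock();
}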
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
AF_RXRPC doesn't currently enable IPv6 UDP Tx checksums on the transport
socket it opens and the checksums in the packets it generates end up 0.
It probably should also enable IPv6 UDP Rx checksums and IPv4 UDP
checksums. The latter only seem to be applied if the socket family is
AF_INET and don't seem to apply if it's AF_INET6. IPv4 packets from an
IPv6 socket seem to have checksums anyway.
What seems to have happened is that the inet_inv_convert_csum() call didn't
get converted to the appropriate udp_port_cfg parameters - and
udp_sock_create() disables checksums unless explicitly told not to.
Fix this by enabling the three udp_port_cfg checksum options.
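A hedged sketch of the shape of the fix (the real rxrpc_open_socket()
fills in other udp_port_cfg fields and does more setup besides): ask
udp_sock_create() for checksums explicitly, since its default is to leave
them off.

#include <linux/socket.h>
#include <net/udp_tunnel.h>

static int open_transport_sketch(struct net *net, struct socket **sockp)
{
	struct udp_port_cfg udp_conf = {
		.family			= AF_INET6,
		.use_udp_checksums	= true,	/* IPv4 UDP checksums */
		.use_udp6_tx_checksums	= true,	/* IPv6 Tx checksums */
		.use_udp6_rx_checksums	= true,	/* IPv6 Rx checksums */
	};

	return udp_sock_create(net, &udp_conf, sockp);
}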
Fixes: 1a9b86c9fd ("rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: Vadim Fedorenko <vfedorenko@novek.ru>
cc: David S. Miller <davem@davemloft.net>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>