linux/net
Alex Elder e02493c07c libceph: requeue only sent requests when kicking
The osd expects incoming requests for a given object from a given
client to arrive in order, with the tid for each request being
greater than the tid for requests that have already arrived.  This
patch fixes two places the osd client might not maintain that
ordering.

For the osd client, the connection fault method is osd_reset().
That function calls __reset_osd() to close and re-open the
connection, then calls __kick_osd_requests() to cause all
outstanding requests for the affected osd to be re-sent after
the connection has been re-established.

When an osd is reset, any in-flight messages will need to be
re-sent.  An osd client maintains distinct lists for unsent and
in-flight messages.  Meanwhile, an osd maintains a single list of
all its requests (both sent and un-sent).  (Each message is linked
into two lists--one for the osd client and one list for the osd.)

To process an osd "kick" operation, the request list for the *osd*
is traversed, and each request is moved off whichever osd *client*
list it was on (unsent or sent) and placed onto the osd client's
unsent list.  (It remains where it is on the osd's request list.)

When that is done, osd_reset() calls __send_queued() to cause each
of the osd client's unsent messages to be sent.

OK, with that background...

As the osd request list is traversed each request is prepended to
the osd client's unsent list in the order they're seen.  The effect
of this is to reverse the order of these requests as they are put
(back) onto the unsent list.

Instead, build up a list of only the requests for an osd that have
already been sent (by checking their r_sent flag values).  Once an
unsent request is found, stop examining requests and prepend the
requests that need re-sending to the osd client's unsent list.

Preserve the original order of requests in the process (previously
re-queued requests were reversed in this process).  Because they
have already been sent, they will have lower tids than any request
already present on the unsent list.

Just below that, traverse the linger list in forward order as
before, but add them to the *tail* of the list rather than the head.
These requests get re-registered, and in the process are give a new
(higher) tid, so the should go at the end.

This partially resolves:
    http://tracker.ceph.com/issues/4392

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-off-by: Sage Weil <sage@inktank.com>
2013-05-01 21:17:18 -07:00
..
9p Revert parts of "hlist: drop the node parameter from iterators" 2013-03-08 15:05:34 -08:00
802 net/802/mrp: fix possible race condition when calling mrp_pdu_queue() 2013-04-12 15:10:48 -04:00
8021q 8021q: fix a potential use-after-free 2013-03-24 17:27:28 -04:00
appletalk hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
atm atm: update msg_namelen in vcc_recvmsg() 2013-04-07 16:28:00 -04:00
ax25 ax25: fix info leak via msg_name in ax25_recvmsg() 2013-04-07 16:28:00 -04:00
batman-adv batman-adv: make is_my_mac() check for the current mesh only 2013-04-17 22:31:22 +02:00
bluetooth Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg() 2013-04-07 16:28:01 -04:00
bridge bridge: make user modified path cost sticky 2013-04-15 14:03:44 -04:00
caif caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg() 2013-04-07 16:28:01 -04:00
can can: gw: use kmem_cache_free() instead of kfree() 2013-04-09 09:58:44 +02:00
ceph libceph: requeue only sent requests when kicking 2013-05-01 21:17:18 -07:00
core net: rate-limit warn-bad-offload splats. 2013-04-19 17:57:49 -04:00
dcb dcbnl: fix various netlink info leaks 2013-03-10 05:19:26 -04:00
dccp Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
decnet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
dns_resolver Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2012-12-16 15:40:50 -08:00
dsa dsa: make dsa_switch_setup check for valid port names 2013-01-21 15:40:12 -05:00
ethernet net: split eth_mac_addr for better error handling 2013-01-21 14:07:44 -05:00
ieee802154 6lowpan: Fix endianness issue in is_addr_link_local(). 2013-03-10 16:49:35 -04:00
ipv4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2013-04-19 14:24:47 -04:00
ipv6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2013-04-19 14:24:47 -04:00
ipx hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
irda irda: small read past the end of array in debug code 2013-04-19 17:32:31 -04:00
iucv af_iucv: fix recvmsg by replacing skb_pull() function 2013-04-08 17:16:57 -04:00
key Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2013-03-27 14:07:04 -04:00
l2tp l2tp: fix info leak in l2tp_ip6_recvmsg() 2013-04-07 16:28:01 -04:00
lapb net/lapb: remove depends on CONFIG_EXPERIMENTAL 2013-01-11 11:40:01 -08:00
llc llc: Fix missing msg_namelen update in llc_ui_recvmsg() 2013-04-07 16:28:01 -04:00
mac80211 Merge branch 'for-john' of git://x-git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-12 13:20:51 -04:00
mac802154 Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
netfilter netfilter: ipset: bitmap:ip,mac: fix listing with timeout 2013-04-18 23:40:41 +02:00
netlabel netlabel: fix build problems when CONFIG_IPV6=n 2013-03-08 11:33:51 -05:00
netlink genetlink: trigger BUG_ON if a group name is too long 2013-03-20 12:05:51 -04:00
netrom netrom: fix invalid use of sizeof in nr_recvmsg() 2013-04-08 22:49:23 -04:00
nfc NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg() 2013-04-07 16:28:02 -04:00
openvswitch openvswitch: correct an invalid BUG_ON 2013-03-27 09:07:41 -07:00
packet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
phonet hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
rds net/rds: zero last byte for strncpy 2013-03-08 00:35:44 -05:00
rfkill rfkill: don't use [delayed_]work_pending() 2012-12-28 13:40:16 -08:00
rose rose: fix info leak via msg_name in rose_recvmsg() 2013-04-07 16:28:02 -04:00
rxrpc Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
sched pkt_sched: fix error return code in fw_change_attrs() 2013-04-19 17:34:53 -04:00
sctp sctp: don't break the loop while meeting the active_path so as to find the matched transport 2013-03-13 10:09:55 -04:00
sunrpc NFS client bugfixes for Linux 3.9 2013-04-10 09:00:51 -07:00
tipc tipc: fix info leaks via msg_name in recv_msg/recv_stream 2013-04-07 16:28:02 -04:00
unix af_unix: If we don't care about credentials coallesce all messages 2013-04-05 00:49:13 -04:00
vmw_vsock VSOCK: Fix missing msg_namelen update in vsock_stream_recvmsg() 2013-04-07 16:28:02 -04:00
wimax
wireless Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-04-03 14:19:48 -04:00
x25 hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2013-03-27 14:07:04 -04:00
compat.c
Kconfig Driver core patches for 3.9-rc1 2013-02-21 12:05:51 -08:00
Makefile VSOCK: Introduce VM Sockets 2013-02-10 19:41:08 -05:00
nonet.c
socket.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2013-02-26 20:16:07 -08:00
sysctl_net.c user_ns: get rid of duplicate code in net_ctl_permissions 2012-11-18 20:32:45 -05:00