linux/net/ceph
Ilya Dryomov 7eb71e0351 libceph: fix double __remove_osd() problem
It turns out it's possible to get __remove_osd() called twice on the
same OSD.  That doesn't sit well with rb_erase() - depending on the
shape of the tree we can get a NULL dereference, a soft lockup or
a random crash at some point in the future as we end up touching freed
memory.  One scenario that I was able to reproduce is as follows:

            <osd3 is idle, on the osd lru list>
<con reset - osd3>
con_fault_finish()
  osd_reset()
                              <osdmap - osd3 down>
                              ceph_osdc_handle_map()
                                <takes map_sem>
                                kick_requests()
                                  <takes request_mutex>
                                  reset_changed_osds()
                                    __reset_osd()
                                      __remove_osd()
                                  <releases request_mutex>
                                <releases map_sem>
    <takes map_sem>
    <takes request_mutex>
    __kick_osd_requests()
      __reset_osd()
        __remove_osd() <-- !!!

A case can be made that osd refcounting is imperfect and reworking it
would be a proper resolution, but for now Sage and I decided to fix
this by adding a safe guard around __remove_osd().

Fixes: http://tracker.ceph.com/issues/8087

Cc: Sage Weil <sage@redhat.com>
Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e: libceph: assert both regular and lingering lists in __remove_osd()
Cc: stable@vger.kernel.org # 3.9+: cc9f1f518c: libceph: change from BUG to WARN for __remove_osd() asserts
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
2015-02-19 14:27:50 +03:00
..
crush crush: add SET_CHOOSELEAF_VARY_R step 2014-04-04 21:07:28 -07:00
armor.c libceph: Fix base64-decoding when input ends in newline. 2011-03-15 09:14:02 -07:00
auth_none.c libceph: Fix NULL pointer dereference in auth client code 2013-07-03 15:32:55 -07:00
auth_none.h net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
auth_x_protocol.h
auth_x.c libceph: fix sparse endianness warnings 2015-01-08 20:36:57 +03:00
auth_x.h libceph: store session key in cephx authorizer 2014-12-17 20:09:50 +03:00
auth.c libceph: wrap auth methods in a mutex 2013-05-01 21:17:15 -07:00
buffer.c libceph: nuke ceph_kvfree() 2014-12-17 20:09:50 +03:00
ceph_common.c libceph: tcp_nodelay support 2015-02-19 13:31:40 +03:00
ceph_fs.c ceph: fix file mode calculation 2011-07-19 11:25:04 -07:00
ceph_hash.c net: cleanup unsigned to unsigned int 2012-04-15 12:44:40 -04:00
ceph_strings.c libceph: nuke pool op infrastructure 2015-02-19 13:31:37 +03:00
crypto.c libceph: do not crash on large auth tickets 2014-11-13 22:21:12 +03:00
crypto.h net: 8021q/bluetooth/bridge/can/ceph: Remove extern from function prototypes 2013-10-19 19:12:11 -04:00
debugfs.c libceph: nuke pool op infrastructure 2015-02-19 13:31:37 +03:00
Kconfig libceph: select CRYPTO_CBC in addition to CRYPTO_AES 2014-10-14 21:03:20 +04:00
Makefile libceph: create source file "net/ceph/snapshot.c" 2013-05-01 21:20:08 -07:00
messenger.c libceph: tcp_nodelay support 2015-02-19 13:31:40 +03:00
mon_client.c libceph: use mon_client.c/put_generic_request() more 2015-02-19 13:31:37 +03:00
msgpool.c libceph: initialize msgpool message types 2012-07-30 09:29:50 -07:00
osd_client.c libceph: fix double __remove_osd() problem 2015-02-19 14:27:50 +03:00
osdmap.c libceph: Convert pr_warning to pr_warn 2014-10-14 21:03:23 +04:00
pagelist.c libceph: reference counting pagelist 2014-10-14 12:56:48 -07:00
pagevec.c ceph_sync_read: stop poking into iov_iter guts 2014-05-06 17:39:42 -04:00
snapshot.c libceph: create source file "net/ceph/snapshot.c" 2013-05-01 21:20:08 -07:00