linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-18 00:53:40 +00:00

Author	SHA1	Message	Date
Ilya Dryomov	25f897773b	libceph: separate multiple ops with commas in debugfs output For requests with multiple ops, separate ops with commas instead of \t, which is a field separator here. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:57:03 -07:00
Ilya Dryomov	70b5bfa360	libceph: sync osd op definitions in rados.h Bring in missing osd ops and strings, use macros to eliminate multiple points of maintenance. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:57:02 -07:00
Yan, Zheng	e4339d28f6	libceph: reference counting pagelist this allow pagelist to present data that may be sent multiple times. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-10-14 12:56:48 -07:00
Ilya Dryomov	91883cd27c	libceph: don't try checking queue_work() return value queue_work() doesn't "fail to queue", it returns false if work was already on a queue, which can't happen here since we allocate event_work right before we queue it. So don't bother at all. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:25 +04:00
Joe Perches	b9a678994b	libceph: Convert pr_warning to pr_warn Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:23 +04:00
Li RongQing	589506f1e7	libceph: fix a use after free issue in osdmap_set_max_osd If the state variable is krealloced successfully, map->osd_state will be freed, once following two reallocation failed, and exit the function without resetting map->osd_state, map->osd_state become a wild pointer. fix it by resetting them after krealloc successfully. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:21 +04:00
Ilya Dryomov	dc220db03f	libceph: select CRYPTO_CBC in addition to CRYPTO_AES We want "cbc(aes)" algorithm, so select CRYPTO_CBC too, not just CRYPTO_AES. Otherwise on !CRYPTO_CBC kernels we fail rbd map/mount with libceph: error -2 building auth method x request Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-10-14 21:03:20 +04:00
Ilya Dryomov	2cc6128ab2	libceph: resend lingering requests with a new tid Both not yet registered (r_linger && list_empty(&r_linger_item)) and registered linger requests should use the new tid on resend to avoid the dup op detection logic on the OSDs, yet we were doing this only for "registered" case. Factor out and simplify the "registered" logic and use the new helper for "not registered" case as well. Fixes: http://tracker.ceph.com/issues/8806 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:19 +04:00
Ilya Dryomov	f671b581f1	libceph: abstract out ceph_osd_request enqueue logic Introduce __enqueue_request() and switch to it. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-10-14 21:03:18 +04:00
Ilya Dryomov	c27a3e4d66	libceph: do not hard code max auth ticket len We hard code cephx auth ticket buffer size to 256 bytes. This isn't enough for any moderate setups and, in case tickets themselves are not encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the buffer is allocated dynamically anyway, allocated it a bit later, at the point where we know how much is going to be needed. Fixes: http://tracker.ceph.com/issues/8979 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-09-10 20:08:36 +04:00
Ilya Dryomov	597cda3577	libceph: add process_one_ticket() helper Add a helper for processing individual cephx auth tickets. Needed for the next commit, which deals with allocating ticket buffers. (Most of the diff here is whitespace - view with git diff -b). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-09-10 20:08:35 +04:00
Sage Weil	73c3d4812b	libceph: gracefully handle large reply messages from the mon We preallocate a few of the message types we get back from the mon. If we get a larger message than we are expecting, fall back to trying to allocate a new one instead of blindly using the one we have. CC: stable@vger.kernel.org Signed-off-by: Sage Weil <sage@redhat.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-09-10 20:08:32 +04:00
Linus Torvalds	8d2d441ac4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is a lot of refactoring and hardening of the libceph and rbd code here from Ilya that fix various smaller bugs, and a few more important fixes with clone overlap. The main fix is a critical change to the request_fn handling to not sleep that was exposed by the recent mutex changes (which will also go to the 3.16 stable series). Yan Zheng has several fixes in here for CephFS fixing ACL handling, time stamps, and request resends when the MDS restarts. Finally, there are a few cleanups from Himangi Saraogi based on Coccinelle" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits) libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly rbd: remove extra newlines from rbd_warn() messages rbd: allocate img_request with GFP_NOIO instead GFP_ATOMIC rbd: rework rbd_request_fn() ceph: fix kick_requests() ceph: fix append mode write ceph: fix sizeof(struct tYpO *) typo ceph: remove redundant memset(0) rbd: take snap_id into account when reading in parent info rbd: do not read in parent info before snap context rbd: update mapping size only on refresh rbd: harden rbd_dev_refresh() and callers a bit rbd: split rbd_dev_spec_update() into two functions rbd: remove unnecessary asserts in rbd_dev_image_probe() rbd: introduce rbd_dev_header_info() rbd: show the entire chain of parent images ceph: replace comma with a semicolon rbd: use rbd_segment_name_free() instead of kfree() ceph: check zero length in ceph_sync_read() ceph: reset r_resend_mds after receiving -ESTALE ...	2014-08-13 17:43:29 -06:00
Ilya Dryomov	5f740d7e15	libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly Determining ->last_piece based on the value of ->page_offset + length is incorrect because length here is the length of the entire message. ->last_piece set to false even if page array data item length is <= PAGE_SIZE, which results in invalid length passed to ceph_tcp_{send,recv}page() and causes various asserts to fire. # cat pages-cursor-init.sh #!/bin/bash rbd create --size 10 --image-format 2 foo FOO_DEV=$(rbd map foo) dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null rbd snap create foo@snap rbd snap protect foo@snap rbd clone foo@snap bar # rbd_resize calls librbd rbd_resize(), size is in bytes ./rbd_resize bar $(((4 << 20) + 512)) rbd resize --size 10 bar BAR_DEV=$(rbd map bar) # trigger a 512-byte copyup -- 512-byte page array data item dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5 The problem exists only in ceph_msg_data_pages_cursor_init(), ceph_msg_data_pages_advance() does the right thing. The size_t cast is unnecessary. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-08-09 11:27:32 +04:00
David Howells	7c3bec0a1f	KEYS: Ceph: Use user_match() Ceph can use user_match() instead of defining its own identical function. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>	2014-07-22 21:46:30 +01:00
David Howells	efa64c0978	KEYS: Ceph: Use key preparsing Make use of key preparsing in Ceph so that quota size determination can take place prior to keyring locking when a key is being added. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Steve Dickson <steved@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com> cc: Tommi Virtanen <tommi.virtanen@dreamhost.com>	2014-07-22 21:46:23 +01:00
Ilya Dryomov	37ab77ac29	libceph: drop osd ref when canceling con work queue_con() bumps osd ref count. We should do the reverse when canceling con work. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:46 +04:00
Ilya Dryomov	2d05f082cb	libceph: nuke ceph_osdc_unregister_linger_request() Remove now unused ceph_osdc_unregister_linger_request(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:45 +04:00
Ilya Dryomov	c9f9b93ddf	libceph: introduce ceph_osdc_cancel_request() Introduce ceph_osdc_cancel_request() intended for canceling requests from the higher layers (rbd and cephfs). Because higher layers are in charge and are supposed to know what and when they are canceling, the request is not completed, only unref'ed and removed from the libceph data structures. __cancel_request() is no longer called before __unregister_request(), because __unregister_request() unconditionally revokes r_request and there is no point in trying to do it twice. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:44 +04:00
Ilya Dryomov	4f23409e0c	libceph: fix linger request check in __unregister_request() We should check if request is on the linger request list of any of the OSDs, not whether request is registered or not. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:44 +04:00
Ilya Dryomov	af59306455	libceph: unregister only registered linger requests Linger requests that have not yet been registered should not be unregistered by __unregister_linger_request(). This messes up ref count and leads to use-after-free. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:44 +04:00
Ilya Dryomov	7c6e6fc53e	libceph: assert both regular and lingering lists in __remove_osd() It is important that both regular and lingering requests lists are empty when the OSD is removed. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:43 +04:00
Ilya Dryomov	6562d661d2	libceph: harden ceph_osdc_request_release() a bit Add some WARN_ONs to alert us when we try to destroy requests that are still registered. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:43 +04:00
Ilya Dryomov	9e94af202a	libceph: move and add dout()s to ceph_osdc_request_{get,put}() Add dout()s to ceph_osdc_request_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:43 +04:00
Ilya Dryomov	0215e44bb3	libceph: move and add dout()s to ceph_msg_{get,put}() Add dout()s to ceph_msg_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:43 +04:00
Ilya Dryomov	bbf37ec3a6	libceph: add maybe_move_osd_to_lru() and switch to it Abstract out __move_osd_to_lru() logic from __unregister_request() and __unregister_linger_request(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:43 +04:00
Ilya Dryomov	1d0326b13b	libceph: rename ceph_osd_request::r_linger_osd to r_linger_osd_item So that: req->r_osd_item --> osd->o_requests list req->r_linger_osd_item --> osd->o_linger_requests list Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-07-08 15:08:42 +04:00
Linus Torvalds	6d87c225f5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "This has a mix of bug fixes and cleanups. Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT check when a second rbd image is mapped and a couple memory leaks. Zheng fixes several issues with fragmented directories and multiple MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's patches fix setting and unsetting RBD images read-only. Naturally there are several other cleanups mixed in for good measure" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits) rbd: only set disk to read-only once rbd: move calls that may sleep out of spin lock range rbd: add ioctl for rbd ceph: use truncate_pagecache() instead of truncate_inode_pages() ceph: include time stamp in every MDS request rbd: fix ida/idr memory leak rbd: use reference counts for image requests rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync() rbd: make sure we have latest osdmap on 'rbd map' libceph: add ceph_monc_wait_osdmap() libceph: mon_get_version request infrastructure libceph: recognize poolop requests in debugfs ceph: refactor readpage_nounlock() to make the logic clearer mds: check cap ID when handling cap export message ceph: remember subtree root dirfrag's auth MDS ceph: introduce ceph_fill_fragtree() ceph: handle cap import atomically ceph: pre-allocate ceph_cap struct for ceph_add_cap() ceph: update inode fields according to issued caps rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO ...	2014-06-12 23:06:23 -07:00
Linus Torvalds	f9da455b93	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...	2014-06-12 14:27:40 -07:00
Al Viro	9c1d5284c7	Merge commit '9f12600fe425bc28f0ccba034a77783c09c15af4' into for-linus Backmerge of dcache.c changes from mainline. It's that, or complete rebase... Conflicts: fs/splice.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-06-12 00:28:09 -04:00
stephen hemminger	f647944995	ceph: remove bogus extern Sparse complained about this bogus extern on definition of a function. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-06-11 15:39:19 -07:00
Ilya Dryomov	6044cde6f2	libceph: add ceph_monc_wait_osdmap() Add ceph_monc_wait_osdmap(), which will block until the osdmap with the specified epoch is received or timeout occurs. Export both of these as they are going to be needed by rbd. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:57 +08:00
Ilya Dryomov	513a8243d6	libceph: mon_get_version request infrastructure Add support for mon_get_version requests to libceph. This reuses much of the ceph_mon_generic_request infrastructure, with one exception. Older OSDs don't set mon_get_version reply hdr->tid even if the original request had a non-zero tid, which makes it impossible to lookup ceph_mon_generic_request contexts by tid in get_generic_reply() for such replies. As a workaround, we allocate a reply message on the reply path. This can probably interfere with revoke, but I don't see a better way. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:57 +08:00
Ilya Dryomov	002b36ba5e	libceph: recognize poolop requests in debugfs Recognize poolop requests in debugfs monc dump, fix prink format specifiers - tid is unsigned. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-06-06 09:29:56 +08:00
Ilya Dryomov	f140662f35	crush: decode and initialize chooseleaf_vary_r Commit `e2b149cc4b` ("crush: add chooseleaf_vary_r tunable") added the crush_map::chooseleaf_vary_r field but missed the decode part. This lead to misdirected requests caused by incorrect raw crush mapping sets. Fixes: http://tracker.ceph.com/issues/8226 Reported-and-Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-05-16 21:29:55 +04:00
Chunwei Chen	178eda29ca	libceph: fix corruption when using page_count 0 page in rbd It has been reported that using ZFSonLinux on rbd will result in memory corruption. The bug report can be found here: https://github.com/zfsonlinux/spl/issues/241 http://tracker.ceph.com/issues/7790 The reason is that ZFS will send pages with page_count 0 into rbd, which in turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with page_count 0, as it will do get_page and put_page, and erroneously free the page. This type of issue has been noted before, and handled in iscsi, drbd, etc. So, rbd should also handle this. This fix address this issue by fall back to slower sendmsg when page_count 0 detected. Cc: Sage Weil <sage@inktank.com> Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: stable@vger.kernel.org Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-05-16 21:29:26 +04:00
Al Viro	2b777c9dd9	ceph_sync_read: stop poking into iov_iter guts Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-05-06 17:39:42 -04:00
Linus Torvalds	5575eeb7b9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "First, there is a critical fix for the new primary-affinity function that went into -rc1. The second batch of patches from Zheng fix a range of problems with directory fragmentation, readdir, and a few odds and ends for cephfs" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: reserve caps for file layout/lock MDS requests ceph: avoid releasing caps that are being used ceph: clear directory's completeness when creating file libceph: fix non-default values check in apply_primary_affinity() ceph: use fpos_cmp() to compare dentry positions ceph: check directory's completeness before emitting directory entry	2014-05-05 15:17:02 -07:00
Ilya Dryomov	92b2e75158	libceph: fix non-default values check in apply_primary_affinity() osd_primary_affinity array is indexed into incorrectly when checking for non-default primary-affinity values. This nullifies the impact of the rest of the apply_primary_affinity() and results in misdirected requests. if (osds[i] != CRUSH_ITEM_NONE && osdmap->osd_primary_affinity[i] != ^^^ CEPH_OSD_DEFAULT_PRIMARY_AFFINITY) { For a pool with size 2, this always ends up checking osd0 and osd1 primary_affinity values, instead of the values that correspond to the osds in question. E.g., given a [2,3] up set and a [max,max,0,max] primary affinity vector, requests are still sent to osd2, because both osd0 and osd1 happen to have max primary_affinity values and therefore we return from apply_primary_affinity() early on the premise that all osds in the given set have max (default) values. Fix it. Fixes: http://tracker.ceph.com/issues/7954 Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>	2014-04-28 12:54:10 -07:00
David S. Miller	676d23690f	net: Fix use after free by removing length arg from sk_data_ready callbacks. Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->s_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about it's value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-04-11 16:15:36 -04:00
Linus Torvalds	240cd6a817	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "The biggest chunk is a series of patches from Ilya that add support for new Ceph osd and crush map features, including some new tunables, primary affinity, and the new encoding that is needed for erasure coding support. This brings things into parity with the server side and the looming firefly release. There is also support for allocation hints in RBD that help limit fragmentation on the server side. There is also a series of patches from Zheng fixing NFS reexport, directory fragmentation support, flock vs fnctl behavior, and some issues with clustered MDS. Finally, there are some miscellaneous fixes from Yunchuan Wen for fscache, Fabian Frederick for ACLs, and from me for fsync(dirfd) behavior" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (79 commits) ceph: skip invalid dentry during dcache readdir libceph: dump pool {read,write}_tier to debugfs libceph: output primary affinity values on osdmap updates ceph: flush cap release queue when trimming session caps ceph: don't grabs open file reference for aborted request ceph: drop extra open file reference in ceph_atomic_open() ceph: preallocate buffer for readdir reply libceph: enable PRIMARY_AFFINITY feature bit libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() libceph: add support for osd primary affinity libceph: add support for primary_temp mappings libceph: return primary from ceph_calc_pg_acting() libceph: switch ceph_calc_pg_acting() to new helpers libceph: introduce apply_temps() helper libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers libceph: ceph_can_shift_osds(pool) and pool type defines libceph: ceph_osd_{exists,is_up,is_down}(osd) definitions libceph: enable OSDMAP_ENC feature bit libceph: primary_affinity decode bits libceph: primary_affinity infrastructure ...	2014-04-07 11:09:13 -07:00
Ilya Dryomov	8a53f23fcd	libceph: dump pool {read,write}_tier to debugfs Dump pool {read,write}_tier to debugfs. While at it, fixup printk type specifiers and remove the unnecessary cast to unsigned long long. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-04-04 21:08:29 -07:00
Ilya Dryomov	f31da0f3e1	libceph: output primary affinity values on osdmap updates Similar to osd weights, output primary affinity values on incremental osdmap updates. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>	2014-04-04 21:08:28 -07:00
Ilya Dryomov	c4c1228525	libceph: redo ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() Reimplement ceph_calc_pg_primary() in terms of ceph_calc_pg_acting() and get rid of the now unused calc_pg_raw(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:19 -07:00
Ilya Dryomov	47ec1f3cc4	libceph: add support for osd primary affinity Respond to non-default primary_affinity values accordingly. (Primary affinity allows the admin to shift 'primary responsibility' away from specific osds, effectively shifting around the read side of the workload and whatever overhead is incurred by peering and writes by virtue of being the primary). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:17 -07:00
Ilya Dryomov	5e8d4d36bf	libceph: add support for primary_temp mappings Change apply_temp() to override primary in the same way pg_temp overrides osd set. primary_temp overrides pg_temp primary too. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:16 -07:00
Ilya Dryomov	8008ab1080	libceph: return primary from ceph_calc_pg_acting() In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:14 -07:00
Ilya Dryomov	ac972230e2	libceph: switch ceph_calc_pg_acting() to new helpers Switch ceph_calc_pg_acting() to new helpers: pg_to_raw_osds(), raw_to_up_osds() and apply_temps(). Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:13 -07:00
Ilya Dryomov	45966c3467	libceph: introduce apply_temps() helper apply_temp() helper for applying various temporary mappings (at this point only pg_temp mappings) to the up set, therefore transforming it into an acting set. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:11 -07:00
Ilya Dryomov	2bd93d4d7e	libceph: introduce pg_to_raw_osds() and raw_to_up_osds() helpers pg_to_raw_osds() helper for computing a raw (crush) set, which can contain non-existant and down osds. raw_to_up_osds() helper for pruning non-existant and down osds from the raw set, therefore transforming it into an up set, and determining up primary. Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>	2014-04-04 21:08:10 -07:00

1 2 3 4 5 ...

608 Commits