linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-24 13:11:40 +00:00

Author	SHA1	Message	Date
Eric Dumazet	95c9617472	net: cleanup unsigned to unsigned int Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-15 12:44:40 -04:00
Alex Elder	8d63e318c4	libceph: isolate kmap() call in write_partial_msg_pages() In write_partial_msg_pages(), every case now does an identical call to kmap(page). Instead, just call it once inside the CRC-computing block where it's needed. Move the definition of kaddr inside that block, and make it a (char *) to ensure portable pointer arithmetic. We still don't kunmap() it until after the sendpage() call, in case that also ends up needing to use the mapping. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:52 -05:00
Alex Elder	9bd1966344	libceph: rename "page_shift" variable to something sensible In write_partial_msg_pages() there is a local variable used to track the starting offset within a bio segment to use. Its name, "page_shift" defies the Linux convention of using that name for log-base-2(page size). Since it's only used in the bio case rename it "bio_offset". Use it along with the page_pos field to compute the memory offset when computing CRC's in that function. This makes the bio case match the others more closely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:52 -05:00
Alex Elder	0cdf9e6018	libceph: get rid of zero_page_address There's not a lot of benefit to zero_page_address, which basically holds a mapping of the zero page through the life of the messenger module. Even with our own mapping, the sendpage interface where it's used may need to kmap() it again. It's almost certain to be in low memory anyway. So stop treating the zero page specially in write_partial_msg_pages() and just get rid of zero_page_address entirely. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:52 -05:00
Alex Elder	e36b13cceb	libceph: only call kernel_sendpage() via helper Make ceph_tcp_sendpage() be the only place kernel_sendpage() is used, by using this helper in write_partial_msg_pages(). Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:52 -05:00
Alex Elder	31739139f3	libceph: use kernel_sendpage() for sending zeroes If a message queued for send gets revoked, zeroes are sent over the wire instead of any unsent data. This is done by constructing a message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg(). Since we are already working with a page in this case we can use the sendpage interface instead. Create a new ceph_tcp_sendpage() helper that sets up flags to match the way ceph_tcp_sendmsg() does now. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	37675b0f42	libceph: fix inverted crc option logic CRC's are computed for all messages between ceph entities. The CRC computation for the data portion of message can optionally be disabled using the "nocrc" (common) ceph option. The default is for CRC computation for the data portion to be enabled. Unfortunately, the code that implements this feature interprets the feature flag wrong, meaning that by default the CRC's have not been computed (or checked) for the data portion of messages unless the "nocrc" option was supplied. Fix this, in write_partial_msg_pages() and read_partial_message(). Also change the flag variable in write_partial_msg_pages() to be "no_datacrc" to match the usage elsewhere in the file. This fixes http://tracker.newdream.net/issues/2064 Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	84495f4961	libceph: some simple changes Nothing too big here. - define the size of the buffer used for consuming ignored incoming data using a symbolic constant - simplify the condition determining whether to unmap the page in write_partial_msg_pages(): do it for crc but not if the page is the zero page Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	f42299e6c3	libceph: small refactor in write_partial_kvec() Make a small change in the code that counts down kvecs consumed by a ceph_tcp_sendmsg() call. Same functionality, just blocked out a little differently. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	fe3ad593e2	libceph: do crc calculations outside loop Move blocks of code out of loops in read_partial_message_section() and read_partial_message(). They were only was getting called at the end of the last iteration of the loop anyway. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	a9a0c51af4	libceph: separate CRC calculation from byte swapping Calculate CRC in a separate step from rearranging the byte order of the result, to improve clarity and readability. Use offsetof() to determine the number of bytes to include in the CRC calculation. In read_partial_message(), switch which value gets byte-swapped, since the just-computed CRC is already likely to be in a register. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	bca064d236	libceph: use "do" in CRC-related Boolean variables Change the name (and type) of a few CRC-related Boolean local variables so they contain the word "do", to distingish their purpose from variables used for holding an actual CRC value. Note that in the process of doing this I identified a fairly serious logic error in write_partial_msg_pages(): the value of "do_crc" assigned appears to be the opposite of what it should be. No attempt to fix this is made here; this change preserves the erroneous behavior. The problem I found is documented here: http://tracker.newdream.net/issues/2064 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	cffaba15cd	ceph: ensure Boolean options support both senses Many ceph-related Boolean options offer the ability to both enable and disable a feature. For all those that don't offer this, add a new option so that they do. Note that ceph_show_options()--which reports mount options currently in effect--only reports the option if it is different from the default value. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	d3002b974c	libceph: a few small changes This gathers a number of very minor changes: - use %hu when formatting the a socket address's address family - null out the ceph_msgr_wq pointer after the queue has been destroyed - drop a needless cast in ceph_write_space() - add a WARN() call in ceph_state_change() in the event an unrecognized socket state is encountered - rearrange the logic in ceph_con_get() and ceph_con_put() so that: - the reference counts are only atomically read once - the values displayed via dout() calls are known to be meaningful at the time they are formatted Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	41617d0c9c	libceph: make ceph_tcp_connect() return int There is no real need for ceph_tcp_connect() to return the socket pointer it creates, since it already assigns it to con->sock, which is visible to the caller. Instead, have it return an error code, which tidies things up a bit. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:51 -05:00
Alex Elder	6173d1f02f	libceph: encapsulate some messenger cleanup code Define a helper function to perform various cleanup operations. Use it both in the exit routine and in the init routine in the event of an error. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	e0f43c9419	libceph: make ceph_msgr_wq private The messenger workqueue has no need to be public. So give it static scope. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	859eb79948	libceph: encapsulate connection kvec operations Encapsulate the operation of adding a new chunk of data to the next open slot in a ceph_connection's out_kvec array. Also add a "reset" operation to make subsequent add operations start at the beginning of the array again. Use these routines throughout, avoiding duplicate code and ensuring all calls are handled consistently. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	963be4d770	libceph: move prepare_write_banner() One of the arguments to prepare_write_connect() indicates whether it is being called immediately after a call to prepare_write_banner(). Move the prepare_write_banner() call inside prepare_write_connect(), and reinterpret (and rename) the "after_banner" argument so it indicates that prepare_write_connect() should make the call rather than should know it has already been made. This was split out from the next patch to highlight this change in logic. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
Alex Elder	ee57741c52	rbd: make ceph_parse_options() return a pointer ceph_parse_options() takes the address of a pointer as an argument and uses it to return the address of an allocated structure if successful. With this interface is not evident at call sites that the pointer is always initialized. Change the interface to return the address instead (or a pointer-coded error code) to make the validity of the returned pointer obvious. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	99f0f3b2c4	ceph: eliminate some abusive casts This fixes some spots where a type cast to (void *) was used as as a universal type hiding mechanism. Instead, properly cast the type to the intended target type. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Alex Elder	bd40614512	ceph: eliminate some needless casts This eliminates type casts in some places where they are not required. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Alex Elder	f64a93172b	ceph: kill addr_str_lock spinlock; use atomic instead A spinlock is used to protect a value used for selecting an array index for a string used for formatting a socket address for human consumption. The index is reset to 0 if it ever reaches the maximum index value. Instead, use an ever-increasing atomic variable as a sequence number, and compute the array index by masking off all but the sequence number's lowest bits. Make the number of entries in the array a power of two to allow the use of such a mask (to avoid jumps in the index value when the sequence number wraps). The length of these strings is somewhat arbitrarily set at 60 bytes. The worst-case length of a string produced is 54 bytes, for an IPv6 address that can't be shortened, e.g.: [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767 Change it so we arbitrarily use 64 bytes instead; if nothing else it will make the array of these line up better in hex dumps. Rename a few things to reinforce the distinction between the number of strings in the array and the length of individual strings. Signed-off-by: Alex Elder <elder@newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Alex Elder	a5bc3129a2	ceph: make use of "else" where appropriate Rearrange ceph_tcp_connect() a bit, making use of "else" rather than re-testing a value with consecutive "if" statements. Don't record a connection's socket pointer unless the connect operation is successful. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Alex Elder	5766651971	ceph: use a shared zero page rather than one per messenger Each messenger allocates a page to be used when writing zeroes out in the event of error or other abnormal condition. Instead, use the kernel ZERO_PAGE() for that purpose. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Xi Wang	6448669777	libceph: fix overflow check in crush_decode() The existing overflow check (n > ULONG_MAX / b) didn't work, because n = ULONG_MAX / b would both bypass the check and still overflow the allocation size a + n * b. The correct check should be (n > (ULONG_MAX - a) / b). Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:45 -05:00
Jim Schutt	182fac2689	net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer The Ceph messenger would sometimes queue multiple work items to write data to a socket when the socket buffer was full. Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the same way that net/core/stream.c:sk_stream_write_space() does, i.e., clearing it only when sufficient space is available in the socket buffer. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:45 -05:00
Linus Torvalds	6c073a7ee2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix safety of rbd_put_client() rbd: fix a memory leak in rbd_get_client() ceph: create a new session lock to avoid lock inversion ceph: fix length validation in parse_reply_info() ceph: initialize client debugfs outside of monc->mutex ceph: change "ceph.layout" xattr to be "ceph.file.layout"	2012-02-02 15:47:33 -08:00
Sage Weil	ab434b60ab	ceph: initialize client debugfs outside of monc->mutex Initializing debufs under monc->mutex introduces a lock dependency for sb->s_type->i_mutex_key, which (combined with several other dependencies) leads to an annoying lockdep warning. There's no particular reason to do the debugfs setup under this lock, so move it out. It used to be the case that our first monmap could come from the OSD; that is no longer the case with recent servers, so we will reliably set up the client entry during the initial authentication. We don't have to worry about racing with debugfs teardown by ceph_debugfs_client_cleanup() because ceph_destroy_client() calls ceph_msgr_flush() first, which will wait for the message dispatch work to complete (and the debugfs init to complete). Fixes: #1940 Signed-off-by: Sage Weil <sage@newdream.net>	2012-02-02 12:49:01 -08:00
Sage Weil	56e925b677	libceph: remove useless return value for osd_client __send_request() Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:57:03 -08:00
Sage Weil	e11b05d31f	crush: fix force for non-root TAKE Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:56:57 -08:00
Thomas Meyer	186482560f	ceph: Use kmemdup rather than duplicating its implementation Use kmemdup rather than duplicating its implementation The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. Signed-off-by: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Sage Weil <sage@newdream.net>	2012-01-10 08:56:54 -08:00
Linus Torvalds	653f42f6b6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options	2011-12-13 14:59:42 -08:00
Sage Weil	f1932fc1a6	crush: fix mapping calculation when force argument doesn't exist If the force argument isn't valid, we should continue calculating a mapping as if it weren't specified. Signed-off-by: Sage Weil <sage@newdream.net>	2011-12-12 09:09:45 -08:00
Linus Torvalds	c292fe4aae	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: Allocate larger oid buffer in request msgs ceph: initialize root dentry ceph: fix iput race when queueing inode work	2011-11-21 12:11:13 -08:00
Stratos Psomadakis	224736d911	libceph: Allocate larger oid buffer in request msgs ceph_osd_request struct allocates a 40-byte buffer for object names. RBD image names can be up to 96 chars long (100 with the .rbd suffix), which results in the object name for the image being truncated, and a subsequent map failure. Increase the oid buffer in request messages, in order to avoid the truncation. Signed-off-by: Stratos Psomadakis <psomas@grnet.gr> Signed-off-by: Sage Weil <sage@newdream.net>	2011-11-11 09:50:19 -08:00
Paul Gortmaker	bc3b2d7fb9	net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:30 -04:00
Sage Weil	38d6453ca3	libceph: force resend of osd requests if we skip an osdmap If we skip over one or more map epochs, we need to resend all osd requests because it is possible they remapped to other servers and then back. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:17 -07:00
Noah Watkins	ee3b56f265	ceph: use kernel DNS resolver Change ceph_parse_ips to take either names given as IP addresses or standard hostnames (e.g. localhost). The DNS lookup is done using the dns_resolver facility similar to its use in AFS, NFS, and CIFS. This patch defines CONFIG_CEPH_LIB_USE_DNS_RESOLVER that controls if this feature is on or off. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Noah Watkins	49d9224c04	ceph: fix ceph_monc_init memory leak failure clean up does not consider ceph_auth_init. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Sage Weil	f0ed1b7cef	libceph: warn on msg allocation failures Any non-masked msg allocation failure should generate a warning and stack trace to the console. All of these need to eventually be replaced by safe preallocation or msgpools. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:16 -07:00
Sage Weil	b61c27636f	libceph: don't complain on msgpool alloc failures The pool allocation failures are masked by the pool; there is no need to spam the console about them. (That's the whole point of having the pool in the first place.) Mark msg allocations whose failure is safely handled as such. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Sage Weil	f6a2f5be07	libceph: always preallocate mon connection Allocate the mon connection on init. We already reuse it across reconnects. Remove now unnecessary (and incomplete) NULL checks. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Sage Weil	6ab00d465a	libceph: create messenger with client This simplifies the init/shutdown paths, and makes client->msgr available during the rest of the setup process. Signed-off-by: Sage Weil <sage@newdream.net>	2011-10-25 16:10:15 -07:00
Linus Torvalds	92bb062fe3	Merge branch 'for-linus' of git://github.com/NewDreamNetwork/ceph-client * 'for-linus' of git://github.com/NewDreamNetwork/ceph-client: libceph: fix pg_temp mapping update libceph: fix pg_temp mapping calculation libceph: fix linger request requeuing libceph: fix parse options memory leak libceph: initialize ack_stamp to avoid unnecessary connection reset	2011-09-29 19:58:58 -07:00
Sage Weil	8adc8b3d78	libceph: fix pg_temp mapping update The incremental map updates have a record for each pg_temp mapping that is to be add/updated (len > 0) or removed (len == 0). The old code was written as if the updates were a complete enumeration; that was just wrong. Update the code to remove 0-length entries and drop the rbtree traversal. This avoids misdirected (and hung) requests that manifest as server errors like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-28 10:13:35 -07:00
Sage Weil	782e182e91	libceph: fix pg_temp mapping calculation We need to apply the modulo pg_num calculation before looking up a pgid in the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes (some) misdirected requests that result in messages like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 on the server and stall make the client block without getting a reply (at least until the pg_temp mapping goes way, but that can take a long long time). Reorder calc_pg_raw() a bit to make more sense. Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-28 10:13:31 -07:00
Sage Weil	935b639a04	libceph: fix linger request requeuing The r_req_lru_item list node moves between several lists, and that cycle is not directly related (and does not begin) with __register_request(). Initialize it in the request constructor, not __register_request(). This fixes later badness (below) when OSDs restart underneath an rbd mount. Crashes we've seen due to this include: [ 213.974288] kernel BUG at net/ceph/messenger.c:2193! and [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph] Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 11:13:17 -07:00
Noah Watkins	1cad78932a	libceph: fix parse options memory leak ceph_destroy_options does not free opt->mon_addr that is allocated in ceph_parse_options. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 09:19:53 -07:00
Jim Schutt	c0d5f9db1c	libceph: initialize ack_stamp to avoid unnecessary connection reset Commit `4cf9d54463` recorded when an outgoing ceph message was ACKed, in order to avoid unnecessary connection resets when an OSD is busy. However, ack_stamp is uninitialized, so there is a window between when the message is sent and when it is ACKed in which handle_timeout() interprets the unitialized value as an expired timeout, and resets the connection unnecessarily. Close the window by initializing ack_stamp. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>	2011-09-16 09:16:22 -07:00

1 2 3

120 Commits