linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 13:11:45 +00:00

Author	SHA1	Message	Date
Asai Thambi S P	2df7aa96e7	mtip32xx: Set custom timeouts for PIO commands This change sets custom timeouts depending on PIO command. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Asai Thambi S P	6bb688c048	mtip32xx: fix clearing an incorrect register in mtip_init_port Fix clearing an incorrect register in mtip_init_port Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-31 08:36:55 +02:00
Konrad Rzeszutek Wilk	8c9ce606a6	xen/blkback: Copy id field when doing BLKIF_DISCARD. We weren't copying the id field so when we sent the response back to the frontend (especially with a 64-bit host and 32-bit guest), we ended up using a random value. This led to the frontend crashing as it would try to pass to __blk_end_request_all a NULL 'struct request' (b/c it would use the 'id' to find the proper 'struct request' in its shadow array) and end up crashing: BUG: unable to handle kernel NULL pointer dereference at 000000e4 IP: [<c0646d4c>] __blk_end_request_all+0xc/0x40 .. snip.. EIP is at __blk_end_request_all+0xc/0x40 .. snip.. [<ed95db72>] blkif_interrupt+0x172/0x330 [xen_blkfront] This fixes the bug by passing in the proper id for the response. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=824641 CC: stable@kernel.org Tested-by: William Dauchy <wdauchy@gmail.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-30 17:20:04 -04:00
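A minimal sketch of why the backend must echo the request `id` back. The structures (`struct req`, `struct resp`, `shadow[]`, `RING_SIZE`) are hypothetical simplifications, not the real Xen `blkif` ring layouts; the only point is that the frontend trusts `resp.id` to find its pending request. The bounds check in `frontend_lookup` is an addition for the sketch, not something the commit describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 32   /* hypothetical ring size */

/* Hypothetical, simplified ring entries; the real Xen structures differ. */
struct req  { uint64_t id; int op; };
struct resp { uint64_t id; int status; };

/* Frontend-side shadow array: slot index == request id. */
struct pending { int in_use; };
static struct pending shadow[RING_SIZE];

/* Backend side: build the response for a (discard) request.
 * The fix in spirit: always copy the request's id into the response. */
static struct resp backend_complete(const struct req *r, int status)
{
    struct resp out;
    out.id = r->id;      /* omitting this leaves out.id holding garbage */
    out.status = status;
    return out;
}

/* Frontend side: find the pending request named by the response id.
 * Returns NULL for an out-of-range or unused slot (sketch-only safety check). */
static struct pending *frontend_lookup(const struct resp *rsp)
{
    if (rsp->id >= RING_SIZE || !shadow[rsp->id].in_use)
        return NULL;
    return &shadow[rsp->id];
}
```

With a random `id`, the real frontend had no such check and ended up passing a NULL request to `__blk_end_request_all`.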
Linus Torvalds	af56e0aa35	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "There are some updates and cleanups to the CRUSH placement code, a bug fix with incremental maps, several cleanups and fixes from Josh Durgin in the RBD block device code, a series of cleanups and bug fixes from Alex Elder in the messenger code, and some miscellaneous bounds checking and gfp cleanups/fixes." Fix up trivial conflicts in net/ceph/{messenger.c,osdmap.c} due to the networking people preferring "unsigned int" over just "unsigned". * git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (45 commits) libceph: fix pg_temp updates libceph: avoid unregistering osd request when not registered ceph: add auth buf in prepare_write_connect() ceph: rename prepare_connect_authorizer() ceph: return pointer from prepare_connect_authorizer() ceph: use info returned by get_authorizer ceph: have get_authorizer methods return pointers ceph: ensure auth ops are defined before use ceph: messenger: reduce args to create_authorizer ceph: define ceph_auth_handshake type ceph: messenger: check return from get_authorizer ceph: messenger: rework prepare_connect_authorizer() ceph: messenger: check prepare_write_connect() result ceph: don't set WRITE_PENDING too early ceph: drop msgr argument from prepare_write_connect() ceph: messenger: send banner in process_connect() ceph: messenger: reset connection kvec caller libceph: don't reset kvec in prepare_write_banner() ceph: ignore preferred_osd field ceph: fully initialize new layout ...	2012-05-30 11:17:19 -07:00
Linus Torvalds	a70f35af4e	Merge branch 'for-3.5/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the driver related changes for 3.5. It contains: - The floppy changes from Jiri. Jiri is now also marked as the maintainer of floppy.c, I shall be publicly branding his forehead with red hot iron at the next opportune moment. - A batch of drbd updates and fixes from the linbit crew, as well as fixes from others. - Two small fixes for xen-blkfront courtesy of Jan." * 'for-3.5/drivers' of git://git.kernel.dk/linux-block: (70 commits) floppy: take over maintainership floppy: remove floppy-specific O_EXCL handling floppy: convert to delayed work and single-thread wq xen-blkfront: module exit handling adjustments xen-blkfront: properly name all devices drbd: grammar fix in log message drbd: check MODULE for THIS_MODULE drbd: Restore the request restart logic drbd: introduce a bio_set to allocate housekeeping bios from drbd: remove unused define drbd: bm_page_async_io: properly initialize page->private drbd: use the newly introduced page pool for bitmap IO drbd: add page pool to be used for meta data IO drbd: allow bitmap to change during writeout from resync_finished drbd: fix race between drbdadm invalidate/verify and finishing resync drbd: fix resend/resubmit of frozen IO drbd: Ensure that data_size is not 0 before using data_size-1 as index drbd: Delay/reject other state changes while establishing a connection drbd: move put_ldev from __req_mod() to the endio callback drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE ...	2012-05-30 09:05:47 -07:00
Linus Torvalds	99262a3daf	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPuv35AAoJENkgDmzRrbjxUx4P/0uc+0oNnZv11vYQsqHuhURa zMlsVdlXGVkvPqQiLY0QkrK5LcO6KiSnSk8vEnOYFIPjL4wNqL/4RRRLnTAJwmE+ lsrL9DblI8Ira/EZRv7d2L12QrP+F2ZGKOZr67uVxSaxH71fUqtiJ0jqA/I8AYH7 /V7+DgdIB1DD28Ya/JEFEUi41F08A6MU10hpaQWy9kXv09gCc9apgvH7/S3s9DaQ G640YWkoKZAx/OFBb8XFvpu9LqZcVl02Nl8goMZOKnMctC4iU3km7HeVjfwCgLjO AdA5spLMhDkS/xrpI0mSQ/wT0k0+sSYW5vEdW9N4XLZza0NgH9GfU4RtEuK85Slj 7bPviZOcpjtt0sGi4wXCaVjZyHROX6tyRvTMUAIj3D0oJglb5T9D3MCvQnadILb0 I0+7gk3d9rHqkO6CmjNaZG9IwR9NpFkbuolcFQuEaZoUMoKd2pYNQyxpbFGl+jCl 7ViFHAy+fydNqDoETKincld4A43KWxOV7jyEJd7hloKcCixsqI7ZdPS7X8amec72 a0hfNgMJzarZkTgo61Hair/d+vKGRJPgEdF1Yq76SDhYKD1TeWeDjmboctsiMjqe f5M4C6IdNJj9cDIlCxMk+3bX250oy7KG77v7Ux0/7nvtSWVa3yEMowD57hnn1But 0gNC8bjXDHRsho90rDRN =Kj9v -----END PGP SIGNATURE----- Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus Pull virtio updates from Rusty Russell. * tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: virtio: fix typo in comment virtio-mmio: Devices parameter parsing virtio_blk: Drop unused request tracking list virtio-blk: Fix hot-unplug race in remove method virtio: Use ida to allocate virtio index virtio: balloon: separate out common code between remove and freeze functions virtio: balloon: drop restore_common() 9p: disconnect channel when PCI device is removed virtio: update documentation to v0.9.5 of spec	2012-05-21 20:20:23 -07:00
Asias He	f65ca1dc6a	virtio_blk: Drop unused request tracking list Benchmark shows small performance improvement on fusion io device. Before: seq-read : io=1,024MB, bw=19,982KB/s, iops=39,964, runt= 52475msec seq-write: io=1,024MB, bw=20,321KB/s, iops=40,641, runt= 51601msec rnd-read : io=1,024MB, bw=15,404KB/s, iops=30,808, runt= 68070msec rnd-write: io=1,024MB, bw=14,776KB/s, iops=29,552, runt= 70963msec After: seq-read : io=1,024MB, bw=20,343KB/s, iops=40,685, runt= 51546msec seq-write: io=1,024MB, bw=20,803KB/s, iops=41,606, runt= 50404msec rnd-read : io=1,024MB, bw=16,221KB/s, iops=32,442, runt= 64642msec rnd-write: io=1,024MB, bw=15,199KB/s, iops=30,397, runt= 68991msec Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:14 +09:30
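The "small performance improvement" above can be quantified with simple arithmetic on the quoted IOPS figures. This sketch only recomputes relative change from the numbers in the commit message; it measures nothing itself.

```c
#include <assert.h>

/* IOPS figures copied from the before/after benchmark in the commit message. */
struct result { const char *name; double before_iops; double after_iops; };

static const struct result results[] = {
    { "seq-read",  39964.0, 40685.0 },
    { "seq-write", 40641.0, 41606.0 },
    { "rnd-read",  30808.0, 32442.0 },
    { "rnd-write", 29552.0, 30397.0 },
};

/* Relative change in percent: (after - before) / before * 100. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Working it through gives roughly +1.8% (seq-read), +2.4% (seq-write), +5.3% (rnd-read) and +2.9% (rnd-write), consistent with the commit calling the gain small.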
Asias He	b79d866c8b	virtio-blk: Fix hot-unplug race in remove method If we reset the virtio-blk device before the requests already dispatched to the virtio-blk driver from the block layer are finished, we will be stuck in blk_cleanup_queue() and the remove will fail. blk_cleanup_queue() calls blk_drain_queue() to drain all requests queued before DEAD marking. However it will never succeed if the device is already stopped. We'll have q->in_flight[] > 0, so the drain will not finish. How to reproduce the race: 1. hot-plug a virtio-blk device 2. keep reading/writing the device in guest 3. hot-unplug while the device is busy serving I/O Test: ~1000 rounds of hot-plug/hot-unplug test passed with this patch. Changes in v3: - Drop blk_abort_queue and blk_abort_request - Use __blk_end_request_all to complete request dispatched to driver Changes in v2: - Drop req_in_flight - Use virtqueue_detach_unused_buf to get request dispatched to driver Signed-off-by: Asias He <asias@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-05-22 12:16:13 +09:30
David S. Miller	17eea0df5f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-20 21:53:04 -04:00
Linus Torvalds	14e931a264	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block layer fixes from Jens Axboe: "A few small, but important fixes. Most of them are marked for stable as well - Fix failure to release a semaphore on error path in mtip32xx. - Fix crashable condition in bio_get_nr_vecs(). - Don't mark end-of-disk buffers as mapped, limit it to i_size. - Fix for build problem with CONFIG_BLOCK=n on arm at least. - Fix for a buffer overflow on UUID partition printing. - Trivial removal of unused variables in dac960." * 'for-linus' of git://git.kernel.dk/linux-block: block: fix buffer overflow when printing partition UUIDs Fix blkdev.h build errors when BLOCK=n bio allocation failure due to bio_get_nr_vecs() block: don't mark buffers beyond end of disk as mapped mtip32xx: release the semaphore on an error path dac960: Remove unused variables from DAC960_CreateProcEntries()	2012-05-19 10:12:17 -07:00
Jens Axboe	4fd1ffaa12	Merge branch 'for-jens' of git://git.drbd.org/linux-drbd into for-3.5/drivers Philipp writes: These are the updates we have in the drbd-8.3 tree. They are intended for your "for-3.5/drivers" branch. These changes include one new feature: * Allow detach from frozen backing devices with the new --force option; configurable timeout for backing devices by the new disk-timeout option And a huge number of bug fixes: * Fixed a write ordering problem on SyncTarget nodes for a write to a block that gets resynced at the same time. The bug can only be triggered with a device that has a firmware that actually reorders writes to the same block * Fixed a race between disconnect and receive_state that could cause an IO lockup * Fixed resend/resubmit for requests with disk or network timeout * Make sure that hard state changes do not disturb the connection establishing process (e.g. detach due to an IO error). When the bug was triggered it caused a retry in the connect process * Postpone soft state changes so as not to disturb the connection establishing process (e.g. becoming primary). When the bug was triggered it could cause both nodes to go into SyncSource state * Fixed a refcount leak that could cause failures when trying to unload protocol family modules that were used by DRBD * Dedicated page pool for meta data IOs * Deny normal detach (as opposed to --forced) if the user tries to detach from the last UpToDate disk in the resource * Fixed a possible protocol error that could be caused by "unusual" BIOs. * Enforce the disk-timeout option also on meta-data IO operations * Implemented stable bitmap pages when we do a full write out of the bitmap * Fixed a rare compatibility issue with DRBD versions older than 8.3.7 when negotiating the bio_size * Fixed a rare race condition where an empty resync could stall if pause/unpause events happen in parallel * Made the re-establishing of connections quicker, if it got a broken pipe once. Previously a bug in the code caused it to waste the first successfully established connection after a broken pipe event. PS: I am postponing the drbd-8.4 for mainline for one or two kernel development cycles more (the ~400 patch set).	2012-05-18 16:20:06 +02:00
Jens Axboe	13828dec45	Merge branch 'stable/for-jens-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.5/drivers Konrad writes: Please git pull the following branch: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.5 in your for-3.5/drivers branch. The changes in it are rather simple - cleaning up some code and adding a proper mechanism to unload without leaking memory.	2012-05-18 16:17:41 +02:00
Jiri Kosina	bfa10b8c98	floppy: remove floppy-specific O_EXCL handling Block layer now handles O_EXCL in a generic way for block devices. The semantics is however different for floppy and all other block devices, as floppy driver contains its own O_EXCL handling. The semantics for all-but-floppy bdevs is "there can be at most one O_EXCL open of this file", while for floppy bdev the semantics is "if someone has the bdev open with O_EXCL, no one else can open it". There is an actual userspace-observable change in behavior because of this since commit `e525fd89d3` ("block: make blkdev_get/put() handle exclusive access") -- on kernels containing this commit, mount of /dev/fd0 causes the fd0 block device to be claimed with _EXCL, preventing subsequent open(/dev/fd0). Bring things back into shape, i.e. make it possible, analogously to other block devices, to mount the floppy and open() it afterwards -- remove the floppy-specific handling and let the generic bdev code O_EXCL handling take over. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:11 +02:00
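A toy model of the two O_EXCL rules contrasted above, treating a mount as an exclusive claim as the commit describes. `struct dev_state`, `generic_open()` and `floppy_open()` are hypothetical and model only the stated semantics, not the kernel's bdev code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device open state; not the kernel's data structures. */
struct dev_state { int opens; bool excl_held; };

/* Generic rule: at most one O_EXCL opener; plain opens always succeed. */
static bool generic_open(struct dev_state *d, bool excl)
{
    if (excl && d->excl_held)
        return false;
    if (excl)
        d->excl_held = true;
    d->opens++;
    return true;
}

/* Old floppy rule: while an O_EXCL holder exists nobody else may open,
 * and an O_EXCL open fails if the device is already open. */
static bool floppy_open(struct dev_state *d, bool excl)
{
    if (d->excl_held || (excl && d->opens > 0))
        return false;
    if (excl)
        d->excl_held = true;
    d->opens++;
    return true;
}
```

Under the generic rule, "mount, then open(/dev/fd0)" succeeds; under the old floppy rule the second open is refused, which is the regression the commit removes.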
Jiri Kosina	070ad7e793	floppy: convert to delayed work and single-thread wq There are several races in floppy driver between bottom half (scheduled_work) and timers (fd_timeout, fd_timer). Due to slowness of the actual floppy devices, those races are never (at least to my knowledge) triggered on a bare floppy metal. However on virtualized (emulated) floppy drives, which are of course orders of magnitude faster than the real ones, these races trigger reliably. They usually exhibit themselves as NULL pointer dereferences during DMA setup, such as BUG: unable to handle kernel NULL pointer dereference at 0000000a [ ... snip ... ] EIP: 0060:[<c02053d5>] EFLAGS: 00010293 CPU: 0 EAX: ffffe000 EBX: 0000000a ECX: 00000000 EDX: 0000000a ESI: c05d2718 EDI: 00000000 EBP: 00000000 ESP: f540fe44 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 0, ti=f540e000 task=c082d5a0 task.ti=c0826000) Stack: ffffe000 00001ffc 00000000 00000000 00000000 c05d2718 c0708b40 f540fe80 c020470f c05d2718 c0708b40 00000000 f540fe80 0000000a f540fee4 00000000 c0708b40 f540fee4 00000000 00000000 c020526b 00000000 c05d2718 c0708b40 Call Trace: [<c020470f>] dump_trace+0xaf/0x110 [<c020526b>] show_trace_log_lvl+0x4b/0x60 [<c0205298>] show_trace+0x18/0x20 [<c05c5811>] dump_stack+0x6d/0x72 [<c0248527>] warn_slowpath_common+0x77/0xb0 [<c02485f3>] warn_slowpath_fmt+0x33/0x40 [<f7ec593c>] setup_DMA+0x14c/0x210 [floppy] [<f7ecaa95>] setup_rw_floppy+0x105/0x190 [floppy] [<c0256d08>] run_timer_softirq+0x168/0x2a0 [<c024e762>] __do_softirq+0xc2/0x1c0 [<c02042ed>] do_softirq+0x7d/0xb0 [<f54d8a00>] 0xf54d89ff but other instances can be easily seen as well. This can be observed at least under VMWare, VirtualBox and KVM. This patch converts all the timers and bottom halves to be processed in a single workqueue. This approach has been already discussed back in 2010 if I remember correctly, and Acked by Linus [1], but it then never made it to the tree. This all is based on original idea and code of Stephen Hemminger. I have ported Stephen's original code to the current state of the floppy driver, and performed quite some testing (on real hardware), which didn't reveal any issues (this includes not only writing and reading data, but also formatting (unfortunately I didn't find any Double-Density disks any more)). Ability to handle errors properly (supplying known bad floppies) has also been verified. [1] http://kerneltrap.org/mailarchive/linux-kernel/2010/6/11/4582092 Based-on-patch-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-18 15:19:10 +02:00
David S. Miller	028940342a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2012-05-16 22:17:37 -04:00
Josh Durgin	263c6ca007	rbd: rename __rbd_update_snaps to __rbd_refresh_header This function rereads the entire header and handles any changes in it, not just changes in snapshots. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:09 -05:00
Josh Durgin	3591538fb2	rbd: fix snapshot size type Snapshot sizes should be the same type as regular image sizes. This only affects their displayed size in sysfs, not the reported size of an actual block device. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:13:03 -05:00
Josh Durgin	b06e6a6be7	rbd: remove conditional snapid parameters The snapid parameters passed to rbd_do_op() and rbd_req_sync_op() are now always either a valid snapid or an explicit CEPH_NOSNAP. [elder@dreamhost.com: Rephrased the description] Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:58 -05:00
Josh Durgin	77dfe99fe3	rbd: store snapshot id instead of index When a device was open at a snapshot, and snapshots were deleted or added, data from the wrong snapshot could be read. Instead of assuming the snap context is constant, store the actual snap id when the device is initialized, and rely on the OSDs to signal an error if we try reading from a snapshot that was deleted. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:52 -05:00
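A small sketch of why a stored index goes stale while a stored id does not. `struct snap` and `find_by_id()` are hypothetical; the real driver keeps the id and lets the OSDs report an error for a deleted snapshot, whereas this sketch simply returns NULL in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot list entry: ids are stable, array positions are not. */
struct snap { uint64_t id; const char *name; };

/* Find a snapshot by its stable id; NULL if it no longer exists. */
static const struct snap *find_by_id(const struct snap *snaps, size_t n, uint64_t id)
{
    for (size_t i = 0; i < n; i++)
        if (snaps[i].id == id)
            return &snaps[i];
    return NULL;
}
```

If the device was opened at position 1 ("b") and snapshot "a" is later deleted, position 1 now refers to "c", so reads would hit the wrong snapshot; looking up by id still finds "b".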
Josh Durgin	403f24d3d5	rbd: protect read of snapshot sequence number This is updated whenever a snapshot is added or deleted, and the snapc pointer is changed with every refresh of the header. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>	2012-05-14 12:12:46 -05:00
Xi Wang	50f7c4c967	rbd: fix integer overflow in rbd_header_from_disk() ondisk->snap_count is read from disk via rbd_req_sync_read() and thus needs validation. Otherwise, a bogus `snap_count' could overflow the kmalloc() size, leading to memory corruption. Also use `u32' consistently for `snap_count'. [elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX] Signed-off-by: Xi Wang <xi.wang@gmail.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:41 -05:00
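A sketch of the kind of validation the commit adds: reject an untrusted `snap_count` before it feeds an allocation size. The header layout (`struct snap_hdr`, one `u64` per snapshot) and the helper `snap_alloc_size()` are assumptions for illustration; only the use of `UINT_MAX` as the bound comes from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: a fixed part followed by one u64 per snapshot. */
struct snap_hdr { uint64_t seq; uint32_t num_snaps; };

/* Compute the allocation size for snap_count entries, refusing any count
 * whose total would exceed UINT_MAX (i.e. would wrap a 32-bit size). */
static bool snap_alloc_size(uint32_t snap_count, size_t *out)
{
    if (snap_count > (UINT_MAX - sizeof(struct snap_hdr)) / sizeof(uint64_t))
        return false;
    *out = sizeof(struct snap_hdr) + (size_t)snap_count * sizeof(uint64_t);
    return true;
}
```

Dividing the limit instead of multiplying the count keeps the check itself from overflowing.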
Dan Carpenter	f8ad495a8a	rbd: use gfp_flags parameter in rbd_header_from_disk() We should use the gfp_flags that the caller specified instead of GFP_KERNEL here. There is only one caller and it uses GFP_KERNEL, so this change is just a cleanup and doesn't change how the code works. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-05-14 12:12:35 -05:00
Jan Beulich	8605067fb9	xen-blkfront: module exit handling adjustments The blkdev major must be released upon exit, or else the module can't attach to devices using the same majors upon being loaded again. Also avoid leaking the minor tracking bitmap. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:54 -04:00
Jan Beulich	e77c78c022	xen-blkfront: properly name all devices - devices beyond xvdzz didn't get proper names assigned at all - extended devices with minors not representable within the kernel's major/minor bit split spilled into foreign majors Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-05-11 16:11:52 -04:00
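To illustrate the first bullet (names past `xvdzz`), here is a sketch of a bijective base-26 disk-naming scheme (a..z, aa..zz, aaa..), the style the SCSI disk driver uses for `sdX`. Whether xen-blkfront uses exactly this scheme is an assumption; `format_disk_name()` is a hypothetical helper, and the second bullet (minor-number splitting) is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write prefix + letters for a zero-based disk index into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int format_disk_name(const char *prefix, unsigned int index, char *buf, size_t buflen)
{
    char letters[16];
    size_t n = 0;
    unsigned long long i = (unsigned long long)index + 1; /* bijective numbering is 1-based */

    while (i > 0 && n < sizeof(letters)) {
        i--;                                   /* map this digit to 0..25 */
        letters[n++] = (char)('a' + i % 26);
        i /= 26;
    }
    size_t plen = strlen(prefix);
    if (plen + n + 1 > buflen)
        return -1;
    memcpy(buf, prefix, plen);
    for (size_t k = 0; k < n; k++)             /* letters were produced least-significant first */
        buf[plen + k] = letters[n - 1 - k];
    buf[plen + n] = '\0';
    return 0;
}
```

Index 701 maps to `xvdzz` and index 702 to `xvdaaa`, so a two-letter limit would leave every index from 702 upward unnamed.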
Asai Thambi S P	a09ba13eef	mtip32xx: release the semaphore on an error path Release the semaphore in an error path in mtip_hw_get_scatterlist(). This fixes the smatch warning inconsistent returns. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Jesper Juhl	d88a440edd	dac960: Remove unused variables from DAC960_CreateProcEntries() The variables 'StatusProcEntry' and 'UserCommandProcEntry' are assigned to once and then never used. This patch gets rid of the variables. While I was there I also fixed the indentation of the function to use tabs rather than spaces for the lines that did not already do so. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-05-11 16:42:14 +02:00
Eric W. Biederman	38bf195398	connector/userns: replace netlink uses of cap_raised() with capable() In 2009 Philipp Reisner noticed that a few users of netlink connector interface needed a capability check and added the idiom cap_raised(nsp->eff_cap, CAP_SYS_ADMIN) to a few of them, on the premise that netlink was asynchronous. In 2011 Patrick McHardy noticed we were being silly because netlink is synchronous and removed eff_cap from the netlink_skb_params and changed the idiom to cap_raised(current_cap(), CAP_SYS_ADMIN). Looking at those spots with a fresh eye we should be calling capable(CAP_SYS_ADMIN). The only reason I can see for not calling capable is that it once appeared we were not in the same task as the caller which would have made calling capable() impossible. In the initial user_namespace the only differences between cap_raised(current_cap(), CAP_SYS_ADMIN) and capable(CAP_SYS_ADMIN) are a few sanity checks and the fact that capable(CAP_SYS_ADMIN) sets PF_SUPERPRIV if we use the capability. Since we are going to be using root privilege setting PF_SUPERPRIV seems the right thing to do. The motivation for this patch is that in a child user namespace cap_raised(current_cap(),...) tests your capabilities with respect to that child user namespace not capabilities in the initial user namespace and thus will allow processes that should be unprivileged to use the kernel services that are only protected with cap_raised(current_cap(),..). To fix possible user_namespace issues and to just clean up the code replace cap_raised(current_cap(), CAP_SYS_ADMIN) with capable(CAP_SYS_ADMIN). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Philipp Reisner <philipp.reisner@linbit.com> Acked-by: Serge E. Hallyn <serge.hallyn@canonical.com> Acked-by: Andrew G. Morgan <morgan@kernel.org> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <james.l.morris@oracle.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-10 23:21:39 -04:00
Lars Ellenberg	92b4ca291f	drbd: grammar fix in log message Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:56 +02:00
Cong Wang	bc4854bc91	drbd: check MODULE for THIS_MODULE THIS_MODULE is NULL only when drbd is compiled as built-in, so the #ifdef CONFIG_MODULES should be #ifdef MODULE instead. This fixes the warning: drivers/block/drbd/drbd_main.c: In function ‘drbd_buildtag’: drivers/block/drbd/drbd_main.c:4187:24: warning: the comparison will always evaluate as ‘true’ for the address of ‘__this_module’ will never be NULL [-Waddress] Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-10 12:00:54 +02:00
Philipp Reisner	f6d0a8dbfd	drbd: Restore the request restart logic It got lost with the commit `5a7bbad27a` "block: remove support for bio remapping from ->make_request" Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 17:20:59 +02:00
Lars Ellenberg	9476f39d66	drbd: introduce a bio_set to allocate housekeeping bios from Don't rely on availability of bios from the global fs_bio_set, we should use our own bio_set for meta data IO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:07 +02:00
Lars Ellenberg	3c2f7a856f	drbd: remove unused define Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:06 +02:00
Arne Redlich	0c7db27920	drbd: bm_page_async_io: properly initialize page->private If bm_page_async_io is advised to use a new page for I/O (BM_AIO_COPY_PAGES is set), it will get it from a mempool. Once the mempool has to dip into its reserves the page is not reinitialized, i.e. page->private contains garbage, which will lead to various problems once the I/O completes (dereferences of NULL pointers, the submitting thread getting stuck in D-state, ...). Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 15:17:04 +02:00
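A toy model of the failure mode: an element handed out from a pool's reserve still carries whatever its previous user left in it, so per-I/O state must be initialized on every allocation. `struct fake_page`, `pool_alloc()` and `get_io_page()` are hypothetical stand-ins, not the kernel's `struct page` or `mempool_alloc()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the field that matters here. */
struct fake_page { unsigned long private_data; };

/* Toy pool with a single reserve element holding a stale value. */
static struct fake_page reserve = { 0xdeadbeefUL };

static struct fake_page *pool_alloc(void)
{
    return &reserve;   /* reserve elements are handed out as-is, not zeroed */
}

/* The fix in spirit: always initialize the per-I/O field after allocating. */
static struct fake_page *get_io_page(unsigned long io_ctx)
{
    struct fake_page *p = pool_alloc();
    p->private_data = io_ctx;
    return p;
}
```

Reading `private_data` straight from `pool_alloc()` yields the stale value, which is the kind of garbage the commit says led to NULL dereferences and stuck threads at I/O completion.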
Lars Ellenberg	4d95a10f97	drbd: use the newly introduced page pool for bitmap IO Conflicts: drbd/drbd_bitmap.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:03 +02:00
Lars Ellenberg	4281808fb3	drbd: add page pool to be used for meta data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:02 +02:00
Lars Ellenberg	0e8488ade2	drbd: allow bitmap to change during writeout from resync_finished Symptom: messages similar to "FIXME asender in bm_change_bits_to, bitmap locked for 'write from resync_finished' by worker" If a resync or verify is finished (or aborted), a full bitmap writeout is triggered. If we have ongoing local IO, the bitmap may still change during that writeout: pending and not yet processed acks may cause bits to be cleared, while new writes may cause bits to be set. To fix this, introduce the drbd_bm_write_copy_pages() variant. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:17:00 +02:00
Lars Ellenberg	a574daf5d7	drbd: fix race between drbdadm invalidate/verify and finishing resync When a resync or online verify is finished or aborted, drbd does a bulk write-out of changed bitmap pages. If in that very moment a new verify or resync is triggered, this can race: ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending? and similar. This can be observed with e.g. tight invalidate loops in test scripts, and probably has no real-life implication. Still, that race can be solved by first quiescing the device, before starting a new resync or verify. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:59 +02:00
Lars Ellenberg	ba280c092e	drbd: fix resend/resubmit of frozen IO DRBD can freeze IO, due to fencing policy (fencing resource-and-stonith), or because we lost access to data (on-no-data-accessible suspend-io). Resuming from there (re-connect, or re-attach, or explicit admin intervention) should "just work". Unfortunately, if the re-attach/re-connect did not happen within the timeout, since the commit drbd: Implemented real timeout checking for request processing time if so configured, the request_timer_fn() would timeout and detach/disconnect virtually immediately. This change tracks the most recent attach and connect, and does not timeout within <configured timeout interval> after attach/connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:58 +02:00
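A sketch of the rule the commit describes: do not time out within one configured timeout interval after the most recent attach or connect. The field names, the seconds-based clock and `request_timed_out()` are hypothetical; DRBD's actual implementation uses its own state and jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timestamps in seconds; not DRBD's actual fields. */
struct dev_times { unsigned long last_attach; unsigned long last_connect; };

/* A request times out only if it is older than the timeout AND at least
 * one full timeout interval has passed since the latest attach/connect. */
static bool request_timed_out(const struct dev_times *t, unsigned long now,
                              unsigned long req_start, unsigned long timeout)
{
    unsigned long last = t->last_attach > t->last_connect ? t->last_attach
                                                          : t->last_connect;
    if (now - last < timeout)
        return false;          /* grace period right after (re-)attach/connect */
    return now - req_start >= timeout;
}
```

A request frozen since t=10 and re-attached at t=100 is not failed at t=105 with a 30 s timeout, even though it is 95 s old; without the grace period it would be failed immediately on resume.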
Philipp Reisner	5de738272e	drbd: Ensure that data_size is not 0 before using data_size-1 as index This could be exploited by a peer which runs modified code. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:56 +02:00
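Why `data_size == 0` is dangerous: with an unsigned size, `data_size - 1` wraps to `SIZE_MAX`, so indexing with it writes far outside the buffer. `terminate_peer_string()` is a hypothetical helper showing the check before such an index, with an extra upper bound against the buffer length added for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NUL-terminate a peer-supplied string of data_size bytes in buf.
 * data_size is untrusted: reject 0 (data_size - 1 would wrap to SIZE_MAX)
 * and anything larger than the buffer. */
static int terminate_peer_string(char *buf, size_t buflen, size_t data_size)
{
    if (data_size == 0 || data_size > buflen)
        return -1;
    buf[data_size - 1] = '\0';
    return 0;
}
```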
Philipp Reisner	197296ffed	drbd: Delay/reject other state changes while establishing a connection Changes to the role and disk state should be delayed or rejected while we establish a connection. This is necessary, since the peer will base its resync decision on the UUIDs and the state we sent in the drbd_connect() function. The most prominent example for this race is becoming primary after sending state and UUIDs and before the state changes to C_WF_CONNECTION. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:55 +02:00
Lars Ellenberg	46385c84ac	drbd: move put_ldev from __req_mod() to the endio callback One invocation in the endio handler is good enough, we don't need to mention it for each of the different ways it calls __req_mod(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:51 +02:00
Lars Ellenberg	d64957c9a9	drbd: fix WRITE_ACKED_BY_PEER_AND_SIS to not set RQ_NET_DONE Just because this request happened during a resync does not mean it may pretend to have been barrier-acked. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:50 +02:00
Lars Ellenberg	41c4a0035b	drbd: fix READ_RETRY_REMOTE_CANCELED to not complete if device is suspended READ_RETRY_REMOTE_CANCELED needs to be grouped with the other _CANCELED cases, not with CONNECTION_LOST_WHILE_PENDING, as that would complete (fail) the bio even if the device became suspended. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:48 +02:00
Lars Ellenberg	6d49e101fd	drbd: make OOS_HANDED_TO_NETWORK its own case OOS_HANDED_TO_NETWORK should not be grouped with the various _CANCELED/_FAILED cases. Also, not only clear the RQ_NET_QUEUED flag, but also mark it RQ_NET_DONE, so it can be distinguished from a local-only request even after that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:47 +02:00
Lars Ellenberg	c088b2d904	drbd: don't pretend that barrier_nr == 0 was special We used to have a barrier implementation where barrier_nr 0 was reserved. That is long gone. Just use the full sequence space. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:46 +02:00
Lars Ellenberg	7ffcaa7194	drbd: remove unused static helper function Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:44 +02:00
Lars Ellenberg	a5d214f621	drbd: remove some very outdated comments Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:43 +02:00
Lars Ellenberg	1abc2af205	drbd: missing wakeup after drbd_rs_del_all Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:42 +02:00
Lars Ellenberg	671a74e749	drbd: remove now unused seq_num member from struct drbd_request Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:40 +02:00
Lars Ellenberg	001a88687a	drbd: fix potential data corruption and protocol error We assumed only bios with bi_idx == 0 would end up in drbd_make_request(). That is wrong. At least device mapper, in __clone_and_map(), may submit clones only covering a partial bio, but sharing the original bvec, by adjusting bi_idx and relevant other bio members of the clone. We used __bio_for_each_segment() in various places, even though that is documented as * drivers should not use the __ version unless they _really_ want to * run through the entire bio and not just pending pieces Impact: we would send the full bio bvec, even for the clone with bi_idx > 0, which will cause data corruption on the peer (because we submit wrong data at the clone offset), and will cause a DRBD protocol error, disconnect/reconnect and resync (thus fixing the corruption), because the next package header would be expected right in the middle of the sent data, causing DRBD magic mismatch. Fix: drop the assert, and use bio_for_each_segment() instead of the __ version. Conflicts: drbd/drbd_tracing.c Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:39 +02:00
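A simplified model of the difference between walking a bio's segments from index 0 and from its own `bi_idx`. `struct mini_bio` and both helpers are hypothetical and only mirror the behavior of `__bio_for_each_segment` started at 0 versus `bio_for_each_segment`; they are not the kernel macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified bio: a shared vector plus a start index. */
struct vec { size_t len; };
struct mini_bio { const struct vec *io_vec; unsigned short vcnt; unsigned short idx; };

/* Walks the whole shared vector from 0 (the old, buggy behavior). */
static size_t bytes_from_start(const struct mini_bio *b)
{
    size_t total = 0;
    for (unsigned short i = 0; i < b->vcnt; i++)
        total += b->io_vec[i].len;
    return total;
}

/* Walks only the clone's own segments, starting at its idx. */
static size_t bytes_from_idx(const struct mini_bio *b)
{
    size_t total = 0;
    for (unsigned short i = b->idx; i < b->vcnt; i++)
        total += b->io_vec[i].len;
    return total;
}
```

For a clone sharing a 4096 + 4096 + 512 byte vector with `idx == 1`, the first walk sends 8704 bytes instead of 4608; the peer then finds payload where it expects the next packet header.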
Philipp Reisner	b6a370ba07	drbd: Fix a potential write ordering issue on SyncTarget nodes If a SyncTarget node gets a P_RS_DATA_REPLY before a P_DATA packet for the same sector, it simply submits these two IO requests. This is possible because, on the SyncSource node, the data of the P_RS_DATA_REPLY packet was read from disk. Immediately after that a write request from upper layers came in. The disk scheduler or even the "hardware" queues on the disk drive might reorder these writes. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:38 +02:00
Philipp Reisner	fc28845bc0	drbd: Fix a potential race that could case data inconsistency When we have a write request and a state change C_WF_BITMAP_S -> C_SYNC_SOURCE at the same time, and it happens that the line remote = remote && drbd_should_do_remote(s); still sees C_WF_BITMAP_S, while the line send_oos = rw == WRITE && drbd_should_send_oos(s); already sees C_SYNC_SOURCE, then both are 0. This causes the write to not be mirrored, but marked as out-of-sync on the Sync_Source node. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:34 +02:00
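The race comes down to two decisions reading the connection state at two different moments. The toy model below shows the failure and the invariant a fix needs: both decisions must come from one consistent view of the state. It is a sketch, not the actual patch. The two-member enum and both predicates are hypothetical simplifications of `drbd_should_do_remote()` / `drbd_should_send_oos()`.

```c
#include <stdbool.h>

/* Simplified, hypothetical subset of the connection states involved. */
enum conn_state { C_WF_BITMAP_S, C_SYNC_SOURCE };

/* Hypothetical predicates: while waiting for the bitmap a write is only
 * marked out-of-sync; once C_SYNC_SOURCE is reached it is mirrored. */
static bool should_do_remote(enum conn_state s) { return s == C_SYNC_SOURCE; }
static bool should_send_oos(enum conn_state s)  { return s == C_WF_BITMAP_S; }

struct write_plan {
    bool remote;   /* mirror the write to the peer */
    bool send_oos; /* tell the peer the block is out of sync */
};

/* Racy shape: the predicates observe two separate reads of a state
 * that may change in between. */
struct write_plan plan_write_racy(enum conn_state first_read,
                                  enum conn_state second_read)
{
    struct write_plan p = { should_do_remote(first_read),
                            should_send_oos(second_read) };
    return p;
}

/* Consistent shape: both decisions come from one snapshot, so exactly
 * one of them is true in either state. */
struct write_plan plan_write(enum conn_state snapshot)
{
    return plan_write_racy(snapshot, snapshot);
}
```

Calling `plan_write_racy(C_WF_BITMAP_S, C_SYNC_SOURCE)` reproduces the "both are 0" outcome from the message.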
Lars Ellenberg	031a7c173f	drbd: add missing part_round_stats to _drbd_start_io_acct Without this, iostat frequently sees bogus svctime and >= 100% "utilization". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:33 +02:00
Lars Ellenberg	47a4f1c1bb	drbd: Fix module refcount leak in drbd_accept() drbd_accept was modelled after kernel_accept with drbd commit 53eb779 in July 2008. Only, kernel_accept was then broken, and only fixed later with kernel commit `1b08534e` in Dec 2008: net: Fix module refcount leak in kernel_accept() Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp, would soon have their reference count become negative, preventing them from being unloaded (likely), or worse, hit zero without actually being unused, allowing them to be unloaded while still in use (unlikely, but if triggered, causing a kernel crash). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:32 +02:00
Philipp Reisner	7caacb69ac	drbd: Consider the disk-timeout also for meta-data IO operations If the backing device is already frozen during attach, we failed to recognize that. The current disk-timeout code works on top of the drbd_request objects. During attach we do not allow IO and therefore never generate a drbd_request object but block before that in drbd_make_request(). This patch adds the timeout to all drbd_md_sync_page_io(). Before this patch we used to go from D_ATTACHING directly to D_DISKLESS if IO failed during attach. We can no longer do this since we have to stay in D_FAILED until all IO ops issued to the backing device returned. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:30 +02:00
Philipp Reisner	4afc433cf8	drbd: Do not send state packets while lower than C_CONNECTED cstate I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION. Sending may already work in these cstates, but the peer still expects the HandShake / ConnectionFeatures packet. Actually triggered by the Testsuite on kugel. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:29 +02:00
Lars Ellenberg	545752d5d8	drbd: fix race between disconnect and receive_state If the asender thread, or request_timer_fn(), or some other part of the code, decided to drop the connection (because of timeout or other), but the receiver just now was processing a P_STATE packet, there was a chance that receive_state() would do a hard state change "re-establishing" an already failed connection without additional handshake. Log excerpt: Remote failed to finish a request within ko-count * timeout peer( Secondary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown ) asender terminated ... peer( Unknown -> Secondary ) conn( Timeout -> Connected ) pdsk( DUnknown -> UpToDate ) peer_isp( 0 -> 1 ) ... Connection closed peer( Secondary -> Unknown ) conn( Connected -> Unconnected ) pdsk( UpToDate -> DUnknown ) peer_isp( 1 -> 0 ) receiver terminated Impact: while the connection state is erroneously "Connected", requests may be queued and even sent, which would never be acknowledged, and may have been missed by the cleanup. These requests would never be completed. The next drbd_suspend_io() will then lock up, waiting forever for these requests to complete. Fixed in several code paths: Make sure the connection state is NetworkFailure or worse before starting the cleanup in drbd_disconnect(). This should make sure the cleanup won't miss any requests. Disallow receive_state() to "upgrade" the connection state from an error state. This will make sure the "illegal" state transition won't happen. For all connection failure states, relax the safe-guard in sanitize_state() again to silently mask out those state changes (e.g. Timeout -> Connected becomes Timeout -> Timeout). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:16:01 +02:00
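The last part of the fix relaxes a safe-guard so that illegal transitions out of a failure state are masked rather than rejected. Here is a minimal sketch of that masking rule, assuming a hypothetical, reduced connection-state enum. The real DRBD enum has more members, and the real `sanitize_state()` checks more conditions.

```c
/* Simplified, hypothetical subset of connection states. C_TIMEOUT,
 * C_BROKEN_PIPE and C_NETWORK_FAILURE stand in for the "connection
 * failure states"; the real DRBD enum has more members. */
enum conn {
    C_STANDALONE,
    C_DISCONNECTING,
    C_UNCONNECTED,
    C_TIMEOUT,
    C_BROKEN_PIPE,
    C_NETWORK_FAILURE,
    C_CONNECTED,
};

static int is_failure_state(enum conn c)
{
    return c >= C_TIMEOUT && c <= C_NETWORK_FAILURE;
}

/* From a failure state only the cleanup path (to Unconnected or
 * Disconnecting) may proceed; any other requested transition is
 * silently masked, so Timeout -> Connected becomes Timeout -> Timeout. */
enum conn sanitize_conn(enum conn os, enum conn ns)
{
    if (is_failure_state(os) && ns != C_UNCONNECTED && ns != C_DISCONNECTING)
        return os;
    return ns;
}
```

Masking instead of failing the request means a late `receive_state()` simply becomes a no-op. Meanwhile the cleanup in `drbd_disconnect()` still drives the state down to Unconnected.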
Lars Ellenberg	763eb63625	drbd: fix potential spinlock deadlock drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks left to be resynced (rs_left) in the current resync extent. If it detects a mismatch, it complains, and forces a disconnect using drbd_force_state(mdev, NS(conn, C_DISCONNECTING)); Unfortunately, this may be called while holding the req_lock, and drbd_force_state() wants to acquire that lock itself. Deadlock. Don't force a disconnect, but fix up rs_left by recounting and reassigning the number of dirty blocks in that extent. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:58 +02:00
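The fix replaces "disconnect on mismatch" with "recount and repair". That idea can be sketched on a toy extent. `EXTENT_WORDS`, `struct resync_extent` and `fixup_rs_left()` are hypothetical names, and the fixed 4 x 64-bit extent size is an assumption for illustration only.

```c
#include <stdint.h>

#define EXTENT_WORDS 4   /* hypothetical: 4 x 64 bits per extent */

/* Hypothetical, simplified resync extent: each set bit is a block that
 * still needs to be resynced; rs_left caches the number of set bits. */
struct resync_extent {
    uint64_t bits[EXTENT_WORDS];
    unsigned int rs_left;
};

static unsigned int count_dirty(const struct resync_extent *ext)
{
    unsigned int n = 0;
    for (int w = 0; w < EXTENT_WORDS; w++) {
        uint64_t x = ext->bits[w];
        while (x) {
            x &= x - 1;   /* clear the lowest set bit */
            n++;
        }
    }
    return n;
}

/* On a mismatch, repair the cached counter from the bitmap instead of
 * forcing a disconnect (which would need a lock the caller may already
 * hold). Returns 1 if rs_left had to be corrected, 0 otherwise. */
int fixup_rs_left(struct resync_extent *ext)
{
    unsigned int actual = count_dirty(ext);
    if (ext->rs_left == actual)
        return 0;
    ext->rs_left = actual;
    return 1;
}
```

Recounting only touches the extent's own data. It therefore needs no lock beyond the one the caller already holds, which is what removes the deadlock.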
Philipp Reisner	e89868a092	drbd: Fixed an obvious copy-n-paste mistake This bug might have caused trouble if disk-barriers and the ahead-behind mode are enabled at the same time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:57 +02:00
Lars Ellenberg	f479ea0661	drbd: send intermediate state change results to the peer DRBD state changes schedule after_state_ch() actions to a worker thread, which decides, based on the old and new states of that change, whether to send an informational state update packet (P_STATE) to the peer. If it decides to drbd_send_state(), it would however always send the _current_ state, which, if a second state change happens before the after_state_ch() of the first ran, may "fast-forward" the peer's view about this node. In most cases that is harmless, but sometimes this can confuse DRBD, for example into not actually starting a necessary resync if you do a very tight detach/attach loop on a Connected Secondary. Fix this by always sending the "new" state of the respective state transition which scheduled this after_state_ch() work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:56 +02:00
Lars Ellenberg	a2e9138197	drbd: fix spurious meta data IO "error" When detaching, even cleanly detaching due to administrator request, we always go through D_FAILED before we become D_DISKLESS. Don't let that state change race with an in-flight meta data IO, or that one might think it actually experienced an IO error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:54 +02:00
Philipp Reisner	aaae506d54	drbd: Fixed a race condition between detach and start of resync drbd_state_lock() is only there to serialize cluster wide state changes. Testing the local disk state needs to happen while holding the global_state_lock. Otherwise you might see something like this (Oct 6 on kugel) 14:20:24 drbd0: conn( WFSyncUUID -> Connected ) disk( Inconsistent -> Failed ) 14:20:24 drbd0: helper command: /sbin/drbdadm before-resync-target minor-0 exit code 0 (0x0) 14:20:24 drbd0: conn( Connected -> SyncTarget ) disk( Failed -> Inconsistent ) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:53 +02:00
Lars Ellenberg	6a9a92f4ef	drbd: fix harmless race to not trigger an ASSERT We have one pre-allocated page to do certain synchronous meta data IO with, using it is serialized like so: drbd_md_get_buffer(); drbd_md_sync_page_io(); drbd_md_sync_page_io(); ... drbd_md_put_buffer(); In drbd_md_sync_page_io() there is an ASSERT(atomic_read(&mdev->md_io_in_use) == 1); We want to be able to timeout on unresponsive lower level devices, so we can "detach" in that case. Inside drbd_md_sync_page_io() we grab an extra reference, to not have a dangling pointer in case a delayed IO eventually does still complete, even after we "detached" already. We need to put the extra reference before we signal completion from the completion handler, or the second drbd_md_sync_page_io() above may trigger the assert (reference count still 2). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:52 +02:00
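The ordering rule, "put the extra reference before signalling completion", can be sketched with C11 atomics in userspace. `struct md_io`, `md_sync_io_submit()` and `md_io_endio()` are hypothetical stand-ins for the DRBD code, and the actual IO submission is elided.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the metadata IO buffer bookkeeping. */
struct md_io {
    atomic_int in_use;   /* like md_io_in_use: 1 while a caller holds the buffer */
    atomic_bool done;    /* like the completion the submitter waits on */
};

/* Submit path: the caller already holds the buffer (in_use == 1). */
void md_sync_io_submit(struct md_io *io)
{
    /* mirrors ASSERT(atomic_read(&mdev->md_io_in_use) == 1) */
    assert(atomic_load(&io->in_use) == 1);
    atomic_fetch_add(&io->in_use, 1);   /* extra ref for the in-flight IO */
    atomic_store(&io->done, false);
    /* ... submit the IO here; md_io_endio() runs on completion ... */
}

/* Completion handler: drop the extra reference BEFORE waking the waiter.
 * With the default sequentially consistent atomics, a thread that sees
 * done == true also sees the decrement, so an immediately following
 * md_sync_io_submit() finds in_use == 1 again. */
void md_io_endio(struct md_io *io)
{
    atomic_fetch_sub(&io->in_use, 1);   /* 1: drop the extra reference */
    atomic_store(&io->done, true);      /* 2: only then signal completion */
}
```

With the two statements in `md_io_endio()` swapped, a waiter could wake up and issue the second `drbd_md_sync_page_io()` while the count is still 2. That is the spurious assert the commit describes.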
Philipp Reisner	5ba3dac521	drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:50 +02:00
Andreas Gruenbacher	7b4e4d3126	drbd: drbd_nl_resize(): Fix missing put_ldev() on error path Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:49 +02:00
Lars Ellenberg	40424e4a24	drbd: fix "stalled" empty resync With sync-after dependencies, given "lucky" timing of pause/unpause events, the end of an empty (0 bits set) resync was sometimes not detected on the SyncTarget, leading to a "stalled" SyncSource state. Fixed this by expecting not only "Inconsistent -> UpToDate" but also "Consistent -> UpToDate" transitions for the peer disk state to end a resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:47 +02:00
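The widened end-of-resync condition is a small predicate on the peer disk state transition. This sketch assumes a hypothetical, reduced disk-state enum; the real DRBD enum contains more states.

```c
#include <stdbool.h>

/* Simplified, hypothetical subset of DRBD disk states. */
enum disk_state { D_DISKLESS, D_INCONSISTENT, D_OUTDATED, D_CONSISTENT, D_UP_TO_DATE };

/* Does this peer disk state (pdsk) transition end the resync?
 * Before the fix only Inconsistent -> UpToDate counted. An empty resync
 * with unlucky pause/unpause timing can instead show
 * Consistent -> UpToDate, which must end the resync as well. */
bool resync_finished(enum disk_state old_pdsk, enum disk_state new_pdsk)
{
    return new_pdsk == D_UP_TO_DATE &&
           (old_pdsk == D_INCONSISTENT || old_pdsk == D_CONSISTENT);
}
```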
Philipp Reisner	1e86ac48af	drbd: Bugfix for the connection behavior If we get into the C_BROKEN_PIPE cstate once, the state engine sets the thi->t_state of the receiver thread to restarting. But the while loop in drbdd_init() still establishes a new connection. The subsequent call into drbdd() then returns immediately, since thi->t_state is not RUNNING. The restart of drbd_init() then resets thi->t_state to RUNNING. I.e. after entering C_BROKEN_PIPE once, the next successfully established connection gets wasted. The two parts of the fix: * Do not cause the thread to restart if we detect the issue with the sockets while we are in C_WF_CONNECTION. * Make sure that all actions that would have set us to C_BROKEN_PIPE happen before the state change to C_WF_REPORT_PARAMS. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:46 +02:00
Philipp Reisner	80f9fd55a6	drbd: Cleanup all epoch objects upon connection loss Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:44 +02:00
Philipp Reisner	fd2491f4a4	drbd: detach must not try to abort non-local requests from drbd-8.4 Cherry picked from 8.4 Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:43 +02:00
Philipp Reisner	79f16f5dbc	drbd: Consider that the no-data-condition could be in connected state ...when the peer has inconsistent data. In that case we failed to clear the susp_nod flag when the local disk was attached again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:15:42 +02:00
Philipp Reisner	bca482e90b	drbd: Fixed current UUID generation Now, the new edition of the clause only fires if a diskless peer gets promoted. This is a fixup for "drbd: Delayed creation of current-UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:50 +02:00
Lars Ellenberg	22f46ce2ef	drbd: change some GFP_KERNEL to GFP_NOIO Bitmap IO may happen in the context of an application write, in the generic block IO path. We need to use GFP_NOIO. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:47 +02:00
Philipp Reisner	dfa8bedbfe	drbd: Implemented the disk-timeout option When the disk-timeout is active, and it expires for a single request, we consider the local disk as D_FAILED. Note: With this change, I made both timeout based state transitions HARD state transitions. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:45 +02:00
Philipp Reisner	02ee8f95fa	drbd: Force flag for the detach operation Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:38 +02:00
Philipp Reisner	5ca1de0384	drbd: Allow new IOs while the local disk in in FAILED state The last bunch of commits prepared the 'detach from tar pit' feature. With that we can stay in disk state FAILED for a long time. We need to accept new IO requests during that time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:34 +02:00
Philipp Reisner	9e58c4dad7	drbd: Bitmap IO functions can now return prematurely if the disk breaks Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 15:10:33 +02:00
Philipp Reisner	d1f3779bbe	drbd: Added a kref to bm_aio_ctx Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:37:19 +02:00
Philipp Reisner	b2057629ea	drbd: Hold a reference to ldev while doing meta-data IO Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:31:11 +02:00
Philipp Reisner	4a2fe568b5	drbd: Keep a reference to the bio until the completion handler finished Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:28:51 +02:00
Philipp Reisner	0c46442515	drbd: Implemented wait_until_done_or_disk_failure() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:26:51 +02:00
Philipp Reisner	e17117310b	drbd: Replaced md_io_mutex by an atomic: md_io_in_use The new function drbd_md_get_buffer() aborts waiting for the buffer in case the disk fails in the meantime. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:22:31 +02:00
Philipp Reisner	cc94c65015	drbd: moved md_io into mdev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:17:24 +02:00
Philipp Reisner	2b4dd36fba	drbd: Immediately allow completion of IOs, that wait for IO completions on a failed disk Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:16:04 +02:00
Philipp Reisner	6d7e32f568	drbd: Keep a reference to barrier acked requests Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:15:28 +02:00
Philipp Reisner	6809384c71	drbd: Improve compatibility with drbd's older than 8.3.7 Regression introduced with 8.3.11 commit: drbd: Take a more conservative approach when deciding max_bio_size Never ever tell an older drbd that we support more than 32KiB in a single data request (packet). Never believe an older drbd that it supports more than 32KiB in a single data request (packet). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:57 +02:00
Philipp Reisner	77e8fdfc18	drbd: Only print sanitize state's warnings, if the state change happens The reason for this change is that doing 'drbdadm invalidate' on a disconnected resource caused an "implicitly set pdsk from UpToDate to DUnknown" message, which was misleading. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:08:22 +02:00
Lars Ellenberg	07667347c8	drbd: downgraded error printk to info Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:05:25 +02:00
David Howells	5f138ce01a	DRBD: Fix comparison always false warning due to long/long long compare Fix warnings of the following nature in the drbd header: In file included from drivers/block/drbd/drbd_bitmap.c:32: drivers/block/drbd/drbd_int.h: In function 'drbd_get_syncer_progress': drivers/block/drbd/drbd_int.h:2234: warning: comparison is always false due to limited range of data where mdev->rs_total (an unsigned long) is being compared to 1ULL << 32, which is always false on a 32-bit machine. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>	2012-05-09 10:03:19 +02:00
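On 32-bit Linux, `unsigned long` is 32 bits wide, so no value of `rs_total` can exceed `1ULL << 32`, and the compiler flags the comparison. One way to express "more than 32 bits' worth" is to compare against a constant representable in `unsigned long` on every architecture, such as `UINT_MAX`. The sketch below assumes that approach and is loosely modelled on the per-mille calculation in `drbd_get_syncer_progress()`. It is not necessarily the exact patch, and `progress_permille()` is a hypothetical name.

```c
#include <limits.h>

/* Per-mille resync progress; assumes bits_left <= rs_total.
 * `rs_total > UINT_MAX` means "more than 32 bits' worth" on 64-bit
 * builds and is simply never true on 32-bit builds, without comparing
 * against a constant outside the range of unsigned long. */
unsigned int progress_permille(unsigned long rs_total, unsigned long bits_left)
{
    /* Shift both values down so that left * 1000 cannot overflow. */
    unsigned int shift = rs_total > UINT_MAX ? 16 : 10;
    unsigned long left = bits_left >> shift;
    unsigned long total = 1UL + (rs_total >> shift);   /* +1: never divide by 0 */
    return (unsigned int)(1000UL - left * 1000UL / total);
}
```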
Lars Ellenberg	7948bcdc38	drbd: spelling fix: too small It is not "to small", but "too small". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:02:22 +02:00
Lars Ellenberg	1381e9a496	drbd: cosmetic: fix accidental division instead of modulo when pretty printing For large resync rates, seq_printf_with_thousands_grouping() accidentally only produced Y,000,00Y, instead of the real numbers. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 10:01:39 +02:00
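The bug is easy to reproduce in userspace. Here is a minimal sketch modelled on the shape of `seq_printf_with_thousands_grouping()`, writing into a buffer instead of a `seq_file`; `format_thousands()` is a hypothetical name. With `v /= 1000000` in place of `v %= 1000000`, `v` would keep only the millions value Y, and the output would become the `Y,000,00Y` pattern described above.

```c
#include <stdio.h>

/* Format a non-negative rate with thousands grouping, e.g. 12,345,678. */
void format_thousands(char *buf, size_t len, long v)
{
    if (v >= 1000000) {
        long millions = v / 1000000;
        v %= 1000000;   /* the fix: modulo, not division */
        snprintf(buf, len, "%ld,%03ld,%03ld", millions, v / 1000, v % 1000);
    } else if (v >= 1000) {
        snprintf(buf, len, "%ld,%03ld", v / 1000, v % 1000);
    } else {
        snprintf(buf, len, "%ld", v);
    }
}
```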
Philipp Reisner	ebd2b0cde5	drbd: Lower log priority for an event that is definitely not an error Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>	2012-05-09 09:59:29 +02:00
Sage Weil	3469ac1aa3	ceph: drop support for preferred_osd pgs This was an ill-conceived feature that has been removed from Ceph. Do this gracefully: - reject attempts to specify a preferred_osd via the ioctl - stop exposing this information via virtual xattrs - always fill in -1 for requests, in case we talk to an older server - don't calculate preferred_osd placements/pgids Reviewed-by: Alex Elder <elder@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>	2012-05-07 15:33:36 -07:00
David S. Miller	f24001941c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Fix merge between commit `3adadc08cc` ("net ax25: Reorder ax25_exit to remove races") and commit `0ca7a4c87d` ("net ax25: Simplify and cleanup the ax25 sysctl handling") The former moved around the sysctl register/unregister calls, the latter simply removed them. With help from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-23 23:15:17 -04:00
Pavel Emelyanov	4a17fd5229	sock: Introduce named constants for sk_reuse Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-04-21 15:52:25 -04:00
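The named values this commit introduces in `include/net/sock.h` are `SK_NO_REUSE`, `SK_CAN_REUSE` and `SK_FORCE_REUSE`, keeping 0 and 1 as before. To illustrate what "forcibly reuse everyone else's port" means, the sketch pairs them with a heavily simplified, hypothetical conflict rule. The kernel's real bind-conflict logic also considers socket state and addresses.

```c
#include <stdbool.h>

/* Named values for sk->sk_reuse; 0 and 1 keep their old meaning. */
#define SK_NO_REUSE    0
#define SK_CAN_REUSE   1
#define SK_FORCE_REUSE 2

/* Hypothetical, simplified rule: a socket with SK_FORCE_REUSE takes the
 * port regardless of the current owner; otherwise both sockets must
 * allow reuse (any non-zero value, as with the old boolean usage). */
bool port_conflict(int new_reuse, int owner_reuse)
{
    if (new_reuse == SK_FORCE_REUSE)
        return false;
    return !(new_reuse != SK_NO_REUSE && owner_reuse != SK_NO_REUSE);
}
```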
Linus Torvalds	c1acb0ba33	Fixes in various components: * mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). * two tiny compile warning fixes. * proper error handling in gnttab_resume * Not using VM_PFNMAP anymore to allow backends in the same domain. Merge tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: - mechanism to work with misconfigured backends (where they are advertised but in reality don't exist). - two tiny compile warning fixes. - proper error handling in gnttab_resume - Not using VM_PFNMAP anymore to allow backends in the same domain. * tag 'stable/for-linus-3.4-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: Revert "xen/p2m: m2p_find_override: use list_for_each_entry_safe" xen/resume: Fix compile warnings. xen/xenbus: Add quirk to deal with misconfigured backends. xen/blkback: Fix warning error. xen/p2m: m2p_find_override: use list_for_each_entry_safe xen/gntdev: do not set VM_PFNMAP xen/grant-table: add error-handling code on failure of gnttab_resume	2012-04-20 11:31:00 -07:00
Konrad Rzeszutek Wilk	a71e23d992	xen/blkback: Fix warning error. drivers/block/xen-blkback/xenbus.c: In function 'xen_blkbk_discard': drivers/block/xen-blkback/xenbus.c:419:4: warning: passing argument 1 of 'dev_warn' makes pointer from integer without a cast +[enabled by default] include/linux/device.h:894:5: note: expected 'const struct device *' but argument is of type 'long int' It is unclear how that mistake made it in. It surely is wrong. Acked-by: Jens Axboe <axboe@kernel.dk> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-18 15:54:08 -04:00
Linus Torvalds	cdd5983063	virtio: fixes on top of 3.4-rc2 Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no consensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael S. Tsirkin: "Here are some virtio fixes for 3.4: a test build fix, a patch by Ren fixing naming for systems with a massive number of virtio blk devices, and balloon fixes for powerpc by David Gibson. There was some discussion about Ren's patch for virtio disc naming: some people wanted to move the legacy name mangling function to the block core. But there's no consensus on that yet, and we can always deduplicate later. Added comments in the hope that this will stop people from copying this legacy naming scheme into future drivers."
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix handling of PAGE_SIZE != 4k virtio_balloon: Fix endian bug virtio_blk: helper function to format disk names tools/virtio: fix up vhost/test module build	2012-04-16 18:34:12 -07:00
Linus Torvalds	c104f1fa1e	Merge branch 'for-3.4/drivers' of git://git.kernel.dk/linux-block Pull block driver bits from Jens Axboe: - A series of fixes for mtip32xx. Most from Asai at Micron, but also one from Greg, getting rid of the dependency on PCIE_HOTPLUG. - A few bug fixes for xen-blkfront, and blkback. - A virtio-blk fix from Vivek, making resize actually work. - Two fixes from Stephen, making larger transfers possible on cciss. This is needed for tape drive support. * 'for-3.4/drivers' of git://git.kernel.dk/linux-block: block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy mtip32xx: dump tagmap on failure mtip32xx: fix handling of commands in various scenarios mtip32xx: Shorten macro names mtip32xx: misc changes mtip32xx: Add new sysfs entry 'status' mtip32xx: make setting comp_time as common mtip32xx: Add new bitwise flag 'dd_flag' mtip32xx: fix error handling in mtip_init() virtio-blk: Call revalidate_disk() upon online disk resize xen/blkback: Make optional features be really optional. xen/blkback: Squash the discard support for 'file' and 'phy' type. mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() cciss: Fix scsi tape io with more than 255 scatter gather elements cciss: Initialize scsi host max_sectors for tape drive support xen-blkfront: make blkif_io_lock spinlock per-device xen/blkfront: don't put bdev right after getting it xen-blkfront: use bitmap_set() and bitmap_clear() xen/blkback: Enable blkback on HVM guests xen/blkback: use grant-table.c hypercall wrappers	2012-04-13 18:45:13 -07:00
Ren Mingxin	c0aa3e0916	virtio_blk: helper function to format disk names The current virtio block's naming algorithm just supports 18278 (26^3 + 26^2 + 26) disks. If there are more virtio blocks, there will be disks with the same name. Based on commit `3e1a7ff8a0`, add a function "virtblk_name_format()" for virtio block to support naming a large number of disks. Notes: - Our naming scheme is ugly. We are stuck with it for virtio but don't use it for any new driver: new drivers should name their devices PREFIX%d where the sequence number can be allocated by ida - sd_format_disk_name has exactly the same logic. Moving it to a central place was deferred over worries that this will make people keep using the legacy naming in new drivers. We kept code identical in case someone wants to deduplicate later. Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com> Acked-by: Asias He <asias@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2012-04-12 10:37:05 +03:00
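The sd-style scheme is bijective base 26: a, ..., z, aa, ..., zz, aaa, ... Below is a userspace adaptation of that algorithm, following the logic the commit describes for `virtblk_name_format()` / `sd_format_disk_name()`. The function name `disk_name_format()` is hypothetical, and it assumes `buflen > strlen(prefix)`.

```c
#include <errno.h>
#include <string.h>

/* Format "<prefix><letters>" for a zero-based index:
 * 0 -> vda, 25 -> vdz, 26 -> vdaa, 18277 -> vdzzz, 18278 -> vdaaaa.
 * Assumes buflen > strlen(prefix). Returns 0 or -EINVAL if too small. */
int disk_name_format(const char *prefix, int index, char *buf, int buflen)
{
    const int base = 'z' - 'a' + 1;
    size_t plen = strlen(prefix);
    char *begin = buf + plen;
    char *end = buf + buflen;
    char *p = end - 1;

    *p = '\0';
    do {
        if (p == begin)
            return -EINVAL;           /* no room for another letter */
        *--p = 'a' + (index % base);  /* least significant letter first */
        index = index / base - 1;     /* "- 1": bijective base 26, no zero digit */
    } while (index >= 0);

    memmove(begin, p, end - p);       /* move letters and NUL after the prefix */
    memcpy(buf, prefix, plen);
    return 0;
}
```

Index 18278 is the first one that needs four letters. That matches the 26^3 + 26^2 + 26 limit of the old three-letter scheme.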
Greg Kroah-Hartman	6363480651	block: mtip32xx: remove HOTPLUG_PCI_PCIE dependancy This removes the HOTPLUG_PCI_PCIE dependency on the driver and makes it depend on PCI. Cc: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-12 08:47:05 +02:00
Asai Thambi S P	95fea2f1d9	mtip32xx: dump tagmap on failure Dump tagmap on failure, instead of individual tags. Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	c74b0f586f	mtip32xx: fix handling of commands in various scenarios * If an ncq command times out while a non-ncq command is active, skip restarting the port * Queue(pause) ncq commands during operations spanning more than one non-ncq command - secure erase, download microcode * When a non-ncq command is active, allow incoming non-ncq commands to wait instead of failing back * Changed timeout for download microcode and smart commands * If the device is in write protect mode, fail all writes (do not send to device) * Set maximum retries to 2 Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:39 +02:00
Asai Thambi S P	8a857a880b	mtip32xx: Shorten macro names Shortened macros used to represent mtip_port->flags and dd->dd_flag Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	8182b49528	mtip32xx: misc changes * Handle the interrupt completion of polled internal commands * Do not check remove pending flag for standby command * On rebuild failure, - set corresponding bit dd_flag - do not send standby command * Free ida index in remove path Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	f65872177d	mtip32xx: Add new sysfs entry 'status' * Add support for detecting the following device status - write protect - over temp (thermal shutdown) * Add new sysfs entry 'status', possible values - online, write_protect, thermal_shutdown * Add new file 'sysfs-block-rssd' to document ABI (Reported-by: Greg Kroah-Hartman) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	dad40f16ff	mtip32xx: make setting comp_time as common Moved setting completion time into mtip_issue_ncq_command() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Asai Thambi S P	45038367c2	mtip32xx: Add new bitwise flag 'dd_flag' * Merged the following flags into one variable 'dd_flag': * drv_cleanup_done * resumeflag * Added the following flags into 'dd_flag' * remove pending * init done * Removed 'ftlrebuildflag' (similar flag is already part of mtip_port->flags) Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-09 08:35:38 +02:00
Linus Torvalds	9479f0f801	Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code. Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull xen fixes from Konrad Rzeszutek Wilk: "Two fixes for regressions: * one is a workaround that will be removed in v3.5 with proper fix in the tip/x86 tree, * the other is to fix drivers to load on PV (a previous patch made them only load in PVonHVM mode). The rest are just minor fixes in the various drivers and some cleanup in the core code." * tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success xen/pciback: fix XEN_PCI_OP_enable_msix result xen/smp: Remove unnecessary call to smp_processor_id() xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries' xen: only check xen_platform_pci_unplug if hvm	2012-04-06 17:54:53 -07:00
Igor Mammedov	e95ae5a493	xen: only check xen_platform_pci_unplug if hvm Commit b9136d207f08 ("xen: initialize platform-pci even if xen_emul_unplug=never") breaks blkfront/netfront by not loading them, because xen_platform_pci_unplug=0 and it is never set for PV guests. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-04-06 12:12:52 -04:00
Alex Elder	cd9d9f5df6	rbd: don't hold spinlock during messenger flush A recent change made changes to the rbd_client_list be protected by a spinlock. Unfortunately in rbd_put_client(), the lock is taken before possibly dropping the last reference to an rbd_client, and on the last reference that eventually calls flush_workqueue() which can sleep. The problem was flagged by a debug spinlock warning: BUG: spinlock wrong CPU on CPU#3, rbd/27814 The solution is to move the spinlock acquisition and release inside rbd_client_release(), which is the spot where it's really needed for protecting the removal of the rbd_client from the client list. Signed-off-by: Alex Elder <elder@dreamhost.com> Reviewed-by: Sage Weil <sage@newdream.net>	2012-04-05 15:43:58 -05:00
Ryosuke Saito	6d27f09a63	mtip32xx: fix error handling in mtip_init() Ensure that block device is properly unregistered, if pci_register_driver() fails. Signed-off-by: Ryosuke Saito <raitosyo@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-04-05 08:09:34 -06:00
Len Brown	f6365201d8	x86: Remove the ancient and deprecated disable_hlt() and enable_hlt() facility The X86_32-only disable_hlt/enable_hlt mechanism was used by the 32-bit floppy driver. Its effect was to replace the use of the HLT instruction inside default_idle() with cpu_relax() - essentially it turned off the use of HLT. This workaround was commented in the code as: "disable hlt during certain critical i/o operations" "This halt magic was a workaround for ancient floppy DMA wreckage. It should be safe to remove." H. Peter Anvin additionally adds: "To the best of my knowledge, no-hlt only existed because of flaky power distributions on 386/486 systems which were sold to run DOS. Since DOS did no power management of any kind, including HLT, the power draw was fairly uniform; when exposed to the much higher noise levels you got when Linux used HLT caused some of these systems to fail. They were by far in the minority even back then." Alan Cox further says: "Also for the Cyrix 5510 which tended to go castors up if a HLT occurred during a DMA cycle and on a few other boxes HLT during DMA tended to go astray. Do we care ? I doubt it. The 5510 was pretty obscure, the 5520 fixed it, the 5530 is probably the oldest still in any kind of use." So, let's finally drop this. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/n/tip-3rhk9bzf0x9rljkv488tloib@git.kernel.org [ If anyone cares then alternative instruction patching could be used to replace HLT with a one-byte NOP instruction. Much simpler. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-03-30 08:50:27 +02:00
Vivek Goyal	e9986f303d	virtio-blk: Call revalidate_disk() upon online disk resize If a virtio disk is open in guest and a disk resize operation is done, (virsh blockresize), new size is not visible to tools like "fdisk -l". This seems to be happening as we update only part->nr_sects and not bdev->bd_inode size. Call revalidate_disk() which should take care of it. I tested growing disk size of already open disk and it works for me. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-29 10:09:44 +02:00
Linus Torvalds	532bfc851a	Merge branch 'akpm' (Andrew's patch-bomb) Merge third batch of patches from Andrew Morton: - Some MM stragglers - core SMP library cleanups (on_each_cpu_mask) - Some IPI optimisations - kexec - kdump - IPMI - the radix-tree iterator work - various other misc bits. "That'll do for -rc1. I still have ~10 patches for 3.4, will send those along when they've baked a little more." * emailed from Andrew Morton <akpm@linux-foundation.org>: (35 commits) backlight: fix typo in tosa_lcd.c crc32: add help text for the algorithm select option mm: move hugepage test examples to tools/testing/selftests/vm mm: move slabinfo.c to tools/vm mm: move page-types.c from Documentation to tools/vm selftests/Makefile: make `run_tests' depend on `all' selftests: launch individual selftests from the main Makefile radix-tree: use iterators in find_get_pages* functions radix-tree: rewrite gang lookup using iterator radix-tree: introduce bit-optimized iterator fs/proc/namespaces.c: prevent crash when ns_entries[] is empty nbd: rename the nbd_device variable from lo to nbd pidns: add reboot_pid_ns() to handle the reboot syscall sysctl: use bitmap library functions ipmi: use locks on watchdog timeout set on reboot ipmi: simplify locking ipmi: fix message handling during panics ipmi: use a tasklet for handling received messages ipmi: increase KCS timeouts ipmi: decrease the IPMI message transaction time in interrupt mode ...	2012-03-28 17:19:28 -07:00
Wanlong Gao	f4507164e7	nbd: rename the nbd_device variable from lo to nbd rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name copied from loop.c. Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-28 17:14:37 -07:00
Linus Torvalds	0195c00244	Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuitous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()).
These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This is already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
Linus Torvalds	47b816ff7d	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull a few more things for powerpc by Benjamin Herrenschmidt: - Anton did some recent improvements to EPOW event reporting on pSeries (power supply failures and such). The patches are self contained enough and replace really nasty code so I felt it should still go in - I did the vio driver registration change Greg requested, I don't see the point of leaving that til the next merge window - The remaining EEH changes I said were still pending to get rid of the EEH references from the generic struct device_node - A few more iSeries removal bits - A perf bug fix on 970 * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/perf: Fix instruction address sampling on 970 and Power4 powerpc+sparc/vio: Modernize driver registration powerpc: Random little legacy iSeries removal tidy ups powerpc: Remove NO_IRQ_IGNORE powerpc/pseries: Cut down on enthusiastic use of defines in RAS code powerpc/pseries: Clean up ras_error_interrupt code powerpc/pseries: Remove RTAS_POWERMGM_EVENTS powerpc/pseries: Use rtas_get_sensor in RAS code powerpc/pseries: Parse and handle EPOW interrupts powerpc: Make function that parses RTAS error logs global powerpc/eeh: Retrieve PHB from global list powerpc/eeh: Remove eeh information from pci_dn powerpc/eeh: Remove eeh device from OF node	2012-03-28 14:41:36 -07:00
David Howells	9ffc93f203	Remove all #inclusions of asm/system.h Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\sinclude\s<asm/system[.]h>.\n!!' `grep -Irl '^#\sinclude\s<asm/system[.]h>' ` Signed-off-by: David Howells <dhowells@redhat.com>	2012-03-28 18:30:03 +01:00
Linus Torvalds	56b59b429b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates for 3.4-rc1 from Sage Weil: "Alex has been busy. There are a range of rbd and libceph cleanups, especially surrounding device setup and teardown, and a few critical fixes in that code. There are more cleanups in the messenger code, virtual xattrs, a fix for CRC calculation/checks, and lots of other miscellaneous stuff. There's a patch from Amon Ott to make inos behave a bit better on 32-bit boxes, some decode check fixes from Xi Wang, and network throttling fix from Jim Schutt, and a couple RBD fixes from Josh Durgin. No new functionality, just a lot of cleanup and bug fixing." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (65 commits) rbd: move snap_rwsem to the device, rename to header_rwsem ceph: fix three bugs, two in ceph_vxattrcb_file_layout() libceph: isolate kmap() call in write_partial_msg_pages() libceph: rename "page_shift" variable to something sensible libceph: get rid of zero_page_address libceph: only call kernel_sendpage() via helper libceph: use kernel_sendpage() for sending zeroes libceph: fix inverted crc option logic libceph: some simple changes libceph: small refactor in write_partial_kvec() libceph: do crc calculations outside loop libceph: separate CRC calculation from byte swapping libceph: use "do" in CRC-related Boolean variables ceph: ensure Boolean options support both senses libceph: a few small changes libceph: make ceph_tcp_connect() return int libceph: encapsulate some messenger cleanup code libceph: make ceph_msgr_wq private libceph: encapsulate connection kvec operations libceph: move prepare_write_banner() ...	2012-03-28 10:01:29 -07:00
Benjamin Herrenschmidt	cb52d8970e	powerpc+sparc/vio: Modernize driver registration This makes vio_register_driver() get the module owner & name at compile time like PCI drivers do, and adds a name pointer directly in struct vio_driver to avoid having to explicitly initialize the embedded struct device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David S. Miller <davem@davemloft.net>	2012-03-28 11:33:24 +11:00
Jens Axboe	6674fb79ca	Merge branch 'stable/for-jens-3.4-bugfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-3.4/drivers Konrad writes: I've two small fixes for the xen-blkback - and I think one more will show up eventually (a partial revert), but not sure when. So in the spirit of keeping the patches flowing, please git pull the following branch.	2012-03-26 09:13:14 +02:00
Linus Torvalds	e22057c859	Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull more xen updates from Konrad Rzeszutek Wilk: "One tiny feature that accidentally got lost in the initial git pull: * Add fast-EOI acking of interrupts (clear a bit instead of hypercall) And bug-fixes: * Fix CPU bring-up code missing a call to notify other subsystems. * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded. * In Xen ACPI processor driver: remove too verbose WARN messages, fix up the Kconfig dependency to be a module by default, and add dependency on CPU_FREQ. * Disable CPU frequency drivers from loading when booting under Xen (as we want the Xen ACPI processor to be used instead). * Cleanups in tmem code."
* tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/acpi: Fix Kconfig dependency on CPU_FREQ xen: initialize platform-pci even if xen_emul_unplug=never xen/smp: Fix bringup bug in AP code. xen/acpi: Remove the WARN's as they just create noise. xen/tmem: cleanup xen: support pirq_eoi_map xen/acpi-processor: Do not depend on CPU frequency scaling drivers. xen/cpufreq: Disable the cpu frequency scaling drivers from loading. provide disable_cpufreq() function to disable the API.	2012-03-24 12:20:25 -07:00
Konrad Rzeszutek Wilk	3389bb8bf7	xen/blkback: Make optional features be really optional. They were using the xenbus_dev_fatal() function which would change the state of the connection immediately. Which is not what we want when we advertise optional features. So make 'feature-discard','feature-barrier','feature-flush-cache' optional. Suggested-by: Jan Beulich <JBeulich@suse.com> [v1: Made the discard function void and static] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:36 -04:00
Konrad Rzeszutek Wilk	4dae76705f	xen/blkback: Squash the discard support for 'file' and 'phy' type. The only reason for the distinction was the special case of 'file' (which is assumed to be a loopback device): to reach inside the loopback device, find the underlying file, and call fallocate on it. Fortunately "xen-blkback: convert hole punching to discard request on loop devices" removes that use-case, and we now base the discard support on blk_queue_discard(q) and extract all appropriate parameters from the 'struct request_queue'. CC: Li Dongyang <lidongyang@novell.com> Acked-by: Jan Beulich <JBeulich@suse.com> [v1: Dropping pointless initializer and keeping blank line] [v2: Remove the kfree as it is not used anymore] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-03-24 10:04:35 -04:00
Oleg Nesterov	70834d3070	usermodehelper: use UMH_WAIT_PROC consistently A few call_usermodehelper() callers use the hardcoded constant instead of the proper UMH_WAIT_PROC, fix them. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Januszewski <spock@gentoo.org> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-23 16:58:41 -07:00
Asai Thambi S P	22be2e6e13	mtip32xx: fix incorrect value set for drv_cleanup_done, and re-initialize and start port in mtip_restart_port() This patch includes two changes: * fix incorrect value set for drv_cleanup_done * re-initialize and start port in mtip_restart_port() Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-23 12:33:03 +01:00
Stephen M. Cameron	bc67f63650	cciss: Fix scsi tape io with more than 255 scatter gather elements The total number of scatter gather elements in the CISS command used by the scsi tape code was being cast to a u8, which can hold at most 255 scatter gather elements. It should have been cast to a u16. Without this patch the command gets rejected by the controller since the total scatter gather count did not add up to the right value resulting in an i/o error. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:09 +01:00
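The u8 truncation described above is easy to reproduce in isolation. This hypothetical sketch (not the cciss source) stores the same total scatter-gather count in an 8-bit field and in a 16-bit field. The 8-bit version silently wraps modulo 256, so the controller would see a count that no longer matches the elements actually supplied.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: narrow an SG element count into a fixed-width field. */
static uint8_t sg_count_u8(unsigned int total)
{
    return (uint8_t)total;   /* wraps modulo 256: 300 becomes 44 */
}

static uint16_t sg_count_u16(unsigned int total)
{
    return (uint16_t)total;  /* holds up to 65535 elements */
}
```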
Stephen M. Cameron	395d287526	cciss: Initialize scsi host max_sectors for tape drive support The default (1024 blocks) is too small; use h->cciss_max_sectors (8192 blocks) instead. Without this change, if you try to set the block size of a tape drive above 512*1024 via "mt -f /dev/st0 setblk nnn", where nnn is greater than 524288, it won't work right. Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-03-22 21:40:08 +01:00
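Assuming 512-byte blocks, as the commit's own 512*1024 figure implies, the old and new limits work out as follows:

```latex
1024 \times 512\,\text{B} = 524\,288\,\text{B} = 512\,\text{KiB}
\qquad
8192 \times 512\,\text{B} = 4\,194\,304\,\text{B} = 4\,\text{MiB}
```

The old default therefore matches exactly the setblk threshold that the commit reports.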
Josh Durgin	c666601a93	rbd: move snap_rwsem to the device, rename to header_rwsem A new temporary header is allocated each time the header changes, but only the changed properties are copied over. We don't need a new semaphore for each header update. This addresses http://tracker.newdream.net/issues/2174 Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:52 -05:00
Alex Elder	32eec68d2f	rbd: don't drop the rbd_id too early Currently an rbd device's id is released when it is removed, but it is done before the code is run to clean up sysfs-related files (such as /sys/bus/rbd/devices/1). It's possible that an rbd is still in use after the rbd_remove() call has been made. It's essentially the same as an active inode that stays around after it has been removed--until its final close operation. This means that the id shows up as free for reuse at a time it should not be. The effect of this was seen by Jens Rehpoehler, who: - had a filesystem mounted on an rbd device - unmapped that filesystem (without unmounting) - found that the mount still worked properly - but hit a panic when he attempted to re-map a new rbd device This re-map attempt found the previously-unmapped id available. The subsequent attempt to reuse it was met with a panic while attempting to (re-)install the sysfs entry for the new mapped device. Fix this by holding off "putting" the rbd id, until the rbd_device release function is called--when the last reference is finally dropped. Note: This fixes: http://tracker.newdream.net/issues/1907 Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
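The fix follows the usual reference-counting rule: free a resource only when the last reference drops. The commit achieves this by deferring the id "put" to the rbd_device release function. Below is a minimal single-threaded sketch of the same idea. The id table, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_IDS 8

/* Hypothetical id table standing in for the driver's id bookkeeping. */
static bool id_in_use[MAX_IDS];

struct dev {
    int refs;   /* one for the mapping, plus one per open */
    int id;
};

/* Runs only when the last reference goes away; only then is the id free. */
static void dev_release(struct dev *d)
{
    id_in_use[d->id] = false;
}

static void dev_get(struct dev *d)
{
    d->refs++;
}

static void dev_put(struct dev *d)
{
    if (--d->refs == 0)
        dev_release(d);
}
```

In the scenario from the commit, the unmap drops one reference while the still-mounted filesystem holds another, so the id stays reserved until the final close.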
Alex Elder	593a9e7b34	rbd: small changes Here is another set of small code tidy-ups: - Define SECTOR_SHIFT and SECTOR_SIZE, and use these symbolic names throughout. Tell the blk_queue system our physical block size, in the (unlikely) event we want to use something other than the default. - Delete the definition of struct rbd_info, which is never used. - Move the definition of dev_to_rbd() down in its source file, just above where it gets first used, and change its name to dev_to_rbd_dev(). - Replace an open-coded operation in rbd_dev_release() to use dev_to_rbd_dev() instead. - Calculate the segment size for a given rbd_device just once in rbd_init_disk(). - Use the '%zd' conversion specifier in rbd_snap_size_show(), since the value formatted is a size_t. - Switch to the '%llu' conversion specifier in rbd_snap_id_show(), since the value formatted is unsigned. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	00f1f36ffa	rbd: do some refactoring A few blocks of code are rearranged a bit here: - In rbd_header_from_disk(): - Don't bother computing snap_count until we're sure the on-disk header starts with a good signature. - Move a few independent lines of code so they are after a check for a failed memory allocation. - Get rid of unnecessary local variable "ret". - Make a few other changes in rbd_read_header(), similar to the above--just moving things around a bit while preserving the functionality. - In rbd_rq_fn(), just assign rq in the while loop's controlling expression rather than duplicating it before and at the end of the loop body. This allows the use of "continue" rather than "goto next" in a number of spots. - Rearrange the logic in snap_by_name(). End result is the same. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	fed4c143ba	rbd: fix module sysfs setup/teardown code Once rbd_bus_type is registered, it allows an "add" operation via the /sys/bus/rbd/add bus attribute, and adding a new rbd device that way establishes a connection between the device and rbd_root_dev. But rbd_root_dev is not registered until after the rbd_bus_type registration is complete. This could (in principle anyway) result in an invalid state. Since rbd_root_dev has no tie to rbd_bus_type we can reorder these two initializations and never be faced with this scenario. In addition, unregister the device in the event the bus registration fails at module init time. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:50 -05:00
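The reordering amounts to registering a dependency before anything that can reach it, and unwinding in reverse order on failure. The sketch below uses hypothetical stand-ins for the real device and bus registration calls, plus a fault-injection flag so the error path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the root device and bus registration calls. */
static bool root_registered, bus_registered;
static bool fail_bus;   /* fault injection for the example */

static int register_root(void)
{
    root_registered = true;
    return 0;
}

static void unregister_root(void)
{
    root_registered = false;
}

static int register_bus(void)
{
    if (fail_bus)
        return -ENOMEM;
    bus_registered = true;
    return 0;
}

static int sysfs_init(void)
{
    int ret = register_root();   /* root first: an "add" may reference it */

    if (ret < 0)
        return ret;
    ret = register_bus();        /* only now expose the bus "add" attribute */
    if (ret < 0)
        unregister_root();       /* undo the earlier step on failure */
    return ret;
}
```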
Alex Elder	7ef3214af2	rbd: don't allocate mon_addrs buffer in rbd_add() The mon_addrs buffer in rbd_add is used to hold a copy of the monitor IP addresses supplied via /sys/bus/rbd/add. That is passed to rbd_get_client(), which never modifies it (nor do any of the functions it gets passed to thereafter)--the mon_addr parameter to rbd_get_client() is a pointer to constant data, so it can't be modified. Furthermore, rbd_get_client() has the length of the mon_addrs buffer and that is used to ensure nothing goes beyond its end. Based on all this, there is no reason that a buffer needs to be used to hold a copy of the mon_addrs provided via /sys/bus/rbd/add. Instead, the location within that passed-in buffer can be provided, along with the length of the "token" therein which represents the monitor IP's. A small change to rbd_add_parse_args() allows the address within the buffer to be passed back, and the length is already returned. This now means that, at least from the perspective of this interface, there is no such thing as a list of monitor addresses that is too long. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:50 -05:00
Alex Elder	5214ecc45c	rbd: have rbd_parse_args() report found mon_addrs size The argument parsing routine already computes the size of the mon_addrs buffer it extracts from the "command." Pass it to the caller so it can use it to provide the length to rbd_get_client(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
Alex Elder	81a8979378	rbd: do a few checks at build time This is a bit gratuitous, but there are a few things that can be verified at build time rather than run time, so do that. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
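Checks on constants and type layouts can move from run time to compile time. The kernel spells this BUILD_BUG_ON(); the portable C11 equivalent, used below, is _Static_assert. The constants and struct are hypothetical and only show the shape of such checks, not the ones rbd actually performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only to illustrate compile-time checks. */
#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1ULL << SECTOR_SHIFT)
#define NAME_LEN     96

struct image_header {
    char     name[NAME_LEN];
    uint64_t size;
};

/* If either condition is false, the build fails; nothing is checked at run time. */
_Static_assert(SECTOR_SIZE == 512, "SECTOR_SIZE must be 512 bytes");
_Static_assert(sizeof(((struct image_header *)0)->name) >= 64,
               "name field too small");
```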
Alex Elder	e28fff268e	rbd: don't use sscanf() in rbd_add_parse_args() Make use of a few simple helper routines to parse the arguments rather than sscanf(). This will treat both missing and too-long arguments as invalid input (rather than silently truncating the input in the too-long case). In time this can also be used by rbd_add() to use the passed-in buffer in place, rather than copying its contents into new buffers. It appears to me that the sscanf() previously used would not correctly handle a supplied snapshot--the two final "%s" conversion specifications were not separated by a space, and I'm not sure how sscanf() handles that situation. It may not be well-defined. So that may be a bug this change fixes (but I didn't verify that). The sizes of the mon_addrs and options buffers are now passed to rbd_add_parse_args(), so they can be supplied to copy_token(). Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:49 -05:00
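A tokenizer that rejects input rather than truncating it can be built from strspn()/strcspn(). This is a user-space sketch. next_token() and copy_token() here are hypothetical and need not match the driver's helpers. A return of 0 means the token is missing, and a return of at least token_size means it did not fit, so the caller can treat both cases as invalid input.

```c
#include <assert.h>
#include <string.h>

static const char spaces[] = " \f\n\r\t\v";

/* Skips leading whitespace in *buf and returns the next token's length. */
static size_t next_token(const char **buf)
{
    *buf += strspn(*buf, spaces);
    return strcspn(*buf, spaces);
}

/* Copies the next token into token[] only if it fits; always consumes it. */
static size_t copy_token(const char **buf, char *token, size_t token_size)
{
    size_t len = next_token(buf);

    if (len && len < token_size) {
        memcpy(token, *buf, len);
        token[len] = '\0';
    }
    *buf += len;
    return len;
}
```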
Alex Elder	a725f65e52	rbd: encapsulate argument parsing for rbd_add() Move the code that parses the arguments provided to rbd_add() (which are supplied via /sys/bus/rbd/add) into a separate function. Also rename the "mon_dev_name" variable in rbd_add() to be "mon_addrs". The variable represents a list of one or more comma-separated monitor IP addresses, each with an optional port number. I think "mon_addrs" captures that notion a little better. Signed-off-by: Alex Elder <elder@dreamhost.com>	2012-03-22 10:47:48 -05:00
Alex Elder	27cc25943f	rbd: simplify error handling in rbd_add() If a couple pointers are initialized to NULL then a single "out_nomem" label can be used for all of the memory allocation failure cases in rbd_add(). Also, get rid of the "irc" local variable there. There is no real need for "rc" to be type ssize_t, and it can be used in the spot "irc" was. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	60571c7d55	rbd: reduce memory used for rbd_dev fields The length of the string containing the monitor address specification(s) will never exceed the length of the string passed in to rbd_add(). The same holds true for the ceph + rbd options string. So reduce the amount of memory allocated for these to that length rather than the maximum (1024 bytes). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	d720bcb0a8	rbd: have rbd_get_client() return a rbd_client Currently rbd_get_client() returns an error code, and assigns the rbd_client field of the rbd_device structure it is passed if successful. Instead, have it return the created rbd_client structure and return a pointer-coded error if there is an error. This makes the assignment of the client pointer more obvious at the call site. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
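A pointer-coded error stores a negative errno in the topmost addresses, which no valid object occupies. The kernel provides ERR_PTR(), IS_ERR() and PTR_ERR() in <linux/err.h> for this. Below is a simplified user-space re-creation, together with a hypothetical get_client() that shows the calling convention. Like the kernel, it assumes the last 4095 addresses never hold a valid object.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified user-space copies of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct client {
    int refs;
};

/* Hypothetical: returns a new client, or an encoded -errno on failure. */
static struct client *get_client(int simulate_failure)
{
    struct client *c;

    if (simulate_failure)
        return ERR_PTR(-ENOMEM);
    c = malloc(sizeof(*c));
    if (!c)
        return ERR_PTR(-ENOMEM);
    c->refs = 1;
    return c;
}
```

The caller assigns the result directly and tests it with IS_ERR(), which is the call-site clarity the commit is after.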
Alex Elder	f0f8cef5a3	rbd: a few simple changes Here are a few very simple cleanups: - Add a "RBD_" prefix to the two driver name string definitions. - Move the definition of struct rbd_request below struct rbd_req_coll to avoid the need for an empty declaration of the latter. - Move and group the definitions of rbd_root_dev_release() and rbd_root_dev, as well as rbd_bus_type and rbd_bus_attrs[], close to the top of the file. Arrange the latter so rbd_bus_type.bus_attrs can be initialized statically. - Get rid of an unnecessary local variable in rbd_open(). - Rework some hokey logic in rbd_bus_add_dev(), so the value of "ret" at the end is either 0 or -ENOENT to avoid the need for the code duplication that was there. - Rename a goto target in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	432b858749	rbd: rename "node_lock" The spinlock used to protect rbd_client_list is named "node_lock". Rename it to "rbd_client_list_lock" to make it more obvious what it's for. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:48 -05:00
Alex Elder	bc534d86be	rbd: move ctl_mutex lock inside rbd_client_create() Since rbd_client_create() is only called in one place, move the acquisition of the mutex around that call inside that function. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d97081b0c7	rbd: move ctl_mutex lock inside rbd_get_client() Since rbd_get_client() is only called in one place, move the acquisition of the mutex around that call inside that function. Furthermore, within rbd_get_client(), it appears the mutex only needs to be held while calling rbd_client_create(). (Moving the lock inside that function will wait for the next patch.) Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e6994d3dde	rbd: release client list lock sooner In rbd_get_client(), if a client is reused, a number of things get done while still holding the list lock unnecessarily. This just moves a few things that need no lock protection outside the lock. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	d184f6bfde	rbd: restore previous rbd id sequence behavior It used to be that selecting a new unique identifier for an added rbd device required searching all existing ones to find the highest id in use. A recent change made that unnecessary, but made it so that id's used were monotonically non-decreasing. It's a bit more pleasant to have smaller rbd id's though, and this change makes ids get allocated as they were before--each new id is one more than the maximum currently in use. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
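The restored allocation rule, "one more than the maximum currently in use", fits in a few lines. This is a minimal sketch over a hypothetical array of live ids. It assumes ids start at 1, so an empty set yields 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: returns the next id given the ids currently in use. */
static unsigned long next_rbd_id(const unsigned long *in_use, size_t n)
{
    unsigned long max = 0;

    for (size_t i = 0; i < n; i++)
        if (in_use[i] > max)
            max = in_use[i];
    return max + 1;
}
```

Freed ids below the maximum (3 and 4 when {1, 2, 5} are live) are not handed out until every higher id has been released.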
Alex Elder	499afd5b8e	rbd: tie rbd_dev_list changes to rbd_id operations The only time entries are added to or removed from the global rbd_dev_list is exactly when a "put" or "get" operation is being performed on a rbd_dev's id. So just move the list management code into get/put routines. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	e124a82f3c	rbd: protect the rbd_dev_list with a spinlock The rbd_dev_list is just a simple list of all the current rbd_devices. Using the ctl_mutex as a concurrency guard is overkill. Instead, use a spinlock for that specific purpose. This also reduces the window that the ctl_mutex needs to be held in rbd_add(). Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
Alex Elder	1ddbe94eda	rbd: rework calculation of new rbd id's In order to select a new unique identifier for an added rbd device, the list of all existing ones is searched and a value one greater than the highest id is used. The list search can be avoided by using an atomic variable that keeps track of the current highest id. Using a get/put model for id's we can limit the boundless growth of id numbers a bit by arranging to reuse the current highest id once it gets released. Add these calls to "put" the id when an rbd is getting removed. Note that this changes the pattern of device id's used--new values will never be below the highest one seen so far (even if there exists an unused lower one). I assert this is OK because the key property of an rbd id is its uniqueness, not its magnitude. Regardless, a follow-on patch will restore the old way of doing things; I just think this commit makes the incremental change to atomics a little easier to understand. Signed-off-by: Alex Elder <elder@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2012-03-22 10:47:47 -05:00
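The get/put scheme with a single atomic "highest id" counter can be sketched with C11 atomics, used here as a user-space stand-in for the kernel's atomic operations. rbd_id_max, id_get() and id_put() are hypothetical names. A put rolls the counter back only when the released id is still the highest one handed out, which is why freed lower ids are never reused under this scheme.

```c
#include <assert.h>
#include <stdatomic.h>

/* Highest id handed out so far; 0 means none yet. */
static atomic_ulong rbd_id_max = 0;

/* Allocate the next id: one more than the current maximum. */
static unsigned long id_get(void)
{
    return atomic_fetch_add(&rbd_id_max, 1) + 1;
}

/* Release an id; only the current maximum can be handed out again. */
static void id_put(unsigned long id)
{
    unsigned long expected = id;

    /* Succeeds only if rbd_id_max still equals id; otherwise a no-op. */
    atomic_compare_exchange_strong(&rbd_id_max, &expected, id - 1);
}
```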

2531 Commits