In case a write failes on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.
Read retry remote was already in place.
Actually the documentation needs to get fixed now. Since the
application is still shielded from the error. (as long as we have
only a single disk failing) The difference to detach is that
we keep the disk. And therefore might keep all the other, still
working sectors up to date.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
After discovering that wide use of prefetch on modern CPUs
could be a net loss instead of a win, net drivers which were
relying on the implicit inclusion of prefetch.h via the list
headers showed up in the resulting cleanup fallout. Give
them an explicit include via the following $0.02 script.
=========================================
#!/bin/bash
MANUAL=""
for i in `git grep -l 'prefetch(.*)' .` ; do
grep -q '<linux/prefetch.h>' $i
if [ $? = 0 ] ; then
continue
fi
( echo '?^#include <linux/?a'
echo '#include <linux/prefetch.h>'
echo .
echo w
echo q
) | ed -s $i > /dev/null 2>&1
if [ $? != 0 ]; then
echo $i needs manual fixup
MANUAL="$i $MANUAL"
fi
done
echo ------------------- 8\<----------------------
echo vi $MANUAL
=========================================
Signed-off-by: Paul <paul.gortmaker@windriver.com>
[ Fixed up some incorrect #include placements, and added some
non-network drivers and the fib_trie.c case - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In commit 95a0f10cdd ("drbd: store in-core bitmap little endian,
regardless of architecture") drbd had made the sane choice to use
little-endian bitmap functions everywhere. However, it used the
horrible old functions names from <asm-generic/bitops/le.h>, that were
never really meant to be exported.
In the meantime, things got cleaned up, and in commit c4945b9ed4
("asm-generic: rename generic little-endian bitops functions") we
renamed the LE bitops to something sane, exactly so that they could be
used in random code without people gouging their eyes out when seeing
the crazy jumble of letters that were the old internal names.
As a result the drbd thing merged cleanly (commit 8d49a77568: "Merge
branch 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block"),
since there was no data conflict - but the end result obviously doesn't
actually compile.
Reported-and-tested-by: Ingo Molnar <mingo@elte.hu>
Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-2.6.39/drivers' of git://git.kernel.dk/linux-2.6-block: (122 commits)
cciss: fix lost command issue
drbd: need include for bitops functions declarations
Revert "cciss: Add missing allocation in scsi_cmd_stack_setup and corresponding deallocation"
cciss: fix missed command status value CMD_UNABORTABLE
cciss: remove unnecessary casts
cciss: Mask off error bits of c->busaddr in cmd_special_free when calling pci_free_consistent
cciss: Inform controller we are using 32-bit tags.
cciss: hoist tag masking out of loop
cciss: Add missing allocation in scsi_cmd_stack_setup and corresponding deallocation
cciss: export resettable host attribute
drbd: drop code present under #ifdef which is relevant to 2.6.28 and below
drbd: Fixed handling of read errors on a 'VerifyS' node
drbd: Fixed handling of read errors on a 'VerifyT' node
drbd: Implemented real timeout checking for request processing time
drbd: Remove unused function atodb_endio()
drbd: improve log message if received sector offset exceeds local capacity
drbd: kill dead code
drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio fails
drbd: Removed left over, now wrong comments
drbd: serialize admin requests for new verify run with pending bitmap io
...
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This code became obsolete and unused last December with
drbd: bitmap keep track of changes vs on-disk bitmap
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Just deal with it more gracefully, if we fail to add even a single page
to an empty bio. We used to BUG_ON() there, but it has been observed in
some Xen deployment, so we need to handle that case more robustly now.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is an addendum to
drbd: serialize admin requests for new resync with pending bitmap io
It avoids a race that could trigger "FIXME" assert log messages.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When we receive a barrier ack, we walk the ring list of drbd requests
in the transfer log of the respective epoch, do some housekeeping,
and free those objects.
We tried to keep epochs of mirrored and unmirrored drbd requests
separate, and assert that no local-only requests are present in a
barrier_acked epoch.
It turns out that this has quite a number of corner cases and would
add bloated code without functional benefit.
We now revert the (insufficient) commits
drbd: Fixed an issue with AHEAD -> SYNC_SOURCE transitions
drbd: Ensure that an epoch contains only requests of one kind
and instead fix the processing of barrier acks to cope with
a mix of local-only and mirrored requests.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we fail to send the information that we lost our disk,
we have no connection, and no disk: no access to data anymore.
That is either expected (deconfiguration), or there will be so much
noise in the logs that "Sending state failed" is not useful at all.
Drop it.
If the reason for a shorter than expected receive was a signal,
which we sent because we already decided to disconnect,
these additional log messages are confusing and useless.
This patch follows this pattern:
- dev_warn(DEV, "short read expecting header on sock: r=%d\n", r);
+ if (!signal_pending(current))
+ dev_warn(DEV, "short read expecting header on sock: r=%d\n", r);
Also make them all dev_warn for consistency.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Now that we do no longer in-place endian-swap the bitmap, we allow
selected bitmap operations (testing bits, sometimes even settting bits)
during some bulk operations.
This caused us to hit a lot of FIXME asserts similar to
FIXME asender in drbd_bm_count_bits,
bitmap locked for 'write from resync_finished' by worker
Which now is nonsense: looking at the bitmap is perfectly legal
as long as it is not being resized.
This cosmetic patch defines some flags to describe expectations in finer
detail, so the asserts in e.g. bm_change_bits_to() can be skipped if
appropriate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
All decisions about sync, sync direction, and wether or not to
allow a connect or attach are based on our set of UUIDs to tag a
data generation.
Log changes to the UUIDs whenever they occur,
logging "new current UUID P:Q:R:S" is more useful
than "Creating new current UUID".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the user clears the sync-pause flag, and sync stays in pause
state, give hints to the user, why it still is in pause state.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The "lazy writeout" of cleared bitmap pages happens during resync, and
should happen again once the resync finishes cleanly, or is aborted.
If resync finished cleanly, or was aborted because of peer disk
failure, we trigger the writeout from worker context in the after
state change work.
If resync was aborted because of connection failure, we should not
immediately trigger bitmap writeout, but rather postpone the
writeout to after the connection cleanup happened. We now do it
in the receiver context from drbd_disconnect().
If resync was aborted because of local disk failure, well, there
is nothing to write to anymore.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is a minor optimization and cleanup,
and also considerably reduces some harmless (but noisy) race with
the connection cleanup code.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The assert in drbd_req.c:755 forces us to have only requests of
one kind in an epoch. The two kinds we distinguish here are:
local-only or mirrored.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Protocol A has no P_WRITE_ACKs, but has P_NEG_ACKs.
The master bio might already be completed, therefore the
request is no longer in the collision hash.
=> Do not try to validate block_id as request
In Protocol B we might already have got a P_RECV_ACK
but then get a P_NEG_ACK after wards.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The point is that drbd_disconnect() can be called with a cstate of
WFConnection.
That happens if the user issues "drbdsetup disconnect" while the
drbd_connect() function executes. Then drbdd_init() will call
drbdd(), which in turn will return without receiving any
packets. Then drbdd_init() will end up calling drbd_disconnect()
with a cstate of WFConnection.
Bottom line: This assertion is wrong as it is, and we do not
see value in fixing it. => Removing it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The test if rs_pending_cnt == 0 was too weak. Using Test for
unacked_cnt == 0 instead. Moved that into the worker.
Since unacked_cnt gets already increased when an P_RS_DATA_REQ
comes in.
Also using a timer to make Ahead -> SyncSource -> Ahead cycles
slower...
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
See also commit from 2009-08-15
"drbd_uuid_compare(): Do not full sync in case a P_SYNC_UUID packet gets lost."
We saw cases where the History UUIDs where not as expected. So the
detection of the special case did not trigger. With the sync UUID
no longer being a random number, but deducible from the previous
bitmap UUID, the detection of this special case becomes more
reliable.
The SyncUUID now is the previous bitmap UUID + 0x1000000000000.
Rule 5a:
Cs = H1p & H1p + Offset = Bp
Connection was lost before SyncUUID Packet came through.
Corrent (peer) UUIDs:
Bp = H1p
H1p = H2p
H2p = 0
Become Sync target.
Rule 7a:
Cp = H1s & H1s + Offset = Bs
Connection was lost before SyncUUID Packet came through.
Correct (own) UUIDs:
Bs = H1s
H1s = H2s
H2s = 0
Become Sync source.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Besides removed a few lines of code, this moves the inspection
of the state from before the queuing process to after the queuing.
I.e. more closely to the actual invocation of the work.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We may not get from SyncSource to Ahead if we have sent some
P_RS_DATA_REPLY packets to the peer and are waiting for
P_WRITE_ACK.
Again, this is not relevant for proper tuned systems, but makes
sure that the not-tuned system does not get diverging bitmaps.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When the sync source node replies to a P_RS_DATA_REQUEST packet
when it is already in ahead mode. I.e. those two packets
crossed each other on the wire, that may lead to diverging
bitmaps.
This never happens in a well-tuned-system. In a well-tuned-
system the resync controller has reduced the resync speed
to zero long before we got into ahead-mode.
But we have to be prepared for the not-well-tuned-system
of course as well.
Because -> diverging bitmaps = non terminating resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Create a new barrier when leaving the AHEAD mode.
Otherwise we trigger the assertion in req_mod(, barrier_acked)
D_ASSERT(req->rq_state & RQ_NET_SENT);
The new barrier is created by recycling the newest existing one.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When on-no-data-accessible is set to suspend-io, also consider that
a Primary, SyncTarget node losses its connection.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
I run into something declaring itself as "spinlock deadlock",
BUG: spinlock lockup on CPU#1, kjournald/27816, ffff88000ad6bca0
Pid: 27816, comm: kjournald Tainted: G W 2.6.34.6 #2
Call Trace:
<IRQ> [<ffffffff811ba0aa>] do_raw_spin_lock+0x11e/0x14d
[<ffffffff81340fde>] _raw_spin_lock_irqsave+0x6a/0x81
[<ffffffff8103b694>] ? __wake_up+0x22/0x50
[<ffffffff8103b694>] __wake_up+0x22/0x50
[<ffffffffa07ff661>] bm_async_io_complete+0x258/0x299 [drbd]
but the call traces do not fit at all,
all other cpus are cpu_idle.
I think it may be this race:
drbd_bm_write_page
wait_queue_head_t io_wait;
atomic_t in_flight;
bm_async_io
submit_bio
bm_async_io_complete
if (atomic_dec_and_test(in_flight))
wait_event(io_wait,
atomic_read(in_flight) == 0)
return
wake_up(io_wait)
The wake_up now accesses the wait_queue_head_t spinlock, which is no
longer valid, since the stack frame of drbd_bm_write_page has been
clobbered now.
Fix this by using struct completion, which does both the condition test
as well as the wake_up inside its spinlock, so this race cannot happen.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Even though we now track the need for bitmap writeout per bitmap page,
there is no need to trigger the writeout while a resync is going on.
Once the resync is finished (or aborted),
we trigger bitmap writeout anyways.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We expect changes to a bitmap page in drbd_bm_write_page,
that's why we submit a copy page.
If a page changes during global writeout, that would be unexpected,
and reason to warn, though.
Also, often page writeout can be skipped (on activity log transactions
during normal operation, for example), no need to log that everytime.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
To improve the latency of IO requests during bitmap exchange,
we recently allowed writes while waiting for the bitmap, sending "set
out-of-sync" information packets for any newly dirtied bits.
We have to make sure that the new resync-uuid does not overtake
these "set oos" packets. Once the resync-uuid is received, the
sync target starts the resync process, and expects the bitmap to
only be cleared, not re-set.
If we use this protocol extension, we queue the generation and sending
of the resync-uuid on the worker, which naturally serializes with all
previously queued packets.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We expect to only receive the recently introduced "set out of sync"
packets in specific states. If we receive them in different states, that
may confuse the resync process to the point where it won't terminate, or
think it made negative progress.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If drbd used to have crypto digest algorithms configured, then is being
unconfigured (but not unloaded), it frees the algorithms, but does not
reset the config. If it then is reconfigured to use the very same
algorithm, it "forgot" to re-allocate the algorithms, thinking that the
config has not changed in that aspect.
It will then Oops on the first attempt to actually use those algorithms.
Fix this by resetting the config to defaults after cleanup.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We must not call it directly from resync_finished,
as we may be in either receiver or worker context there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Long time ago, we had paranoia code in the bitmap that allocated one
extra word, assigned a magic value, and checked on every occasion that
the magic value was still unchanged.
That debug code is unused, the extra long word complicates code a bit.
Get rid of it.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When we set or clear bits in a bitmap page,
also set a flag in the page->private pointer.
This allows us to skip writes of unchanged pages.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Our on-disk bitmap is a little endian bitstream.
Up to now, we have stored the in-core copy of that in
native endian, applying byte order conversion when necessary.
Instead, keep the bitmap pages little endian, as they are read from disk,
and use the generic_*_le_bit family of functions.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We trusted the on-disk bitmap to have unused bits cleared.
In case that is not true for whatever reason,
and we take a code path where the unused bits don't get cleared
elsewhere (bm_clear_surplus is not called), we may miscount the bits,
and get confused during resync, waiting for bits to get cleared that we
don't even use: the resync process would not terminate.
Fix this by masking out unused bits in __bm_count_bits.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The old name is confusing: the function does not increment anything.
Also rename _inc_ap_bio_cond to inc_ap_bio_cond: there is no need for
an underscore.
Finally, make it clear that these functions return boolean values.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
I guess bitmap I/O errors are supposed to cause drbd_determin_dev_size
to return dev_size_error.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Warning: comparison between ‘enum drbd_ret_code’ and ‘enum drbd_state_rv’
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The FAULT_ACTIVE macro just wraps the drbd_insert_fault macro for no
apparent reason.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This macro doesn't save much code, but makes things a lot harder to read.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
commit e2041475e6ddb081734d161f6421977323f5a9b9
drbd: Starting with protocol 96 we can allow app-IO while receiving the bitmap
Contained a bad chunk that tried to optimize away drbd barriers during
bitmap exchange, but accidentally dropped them for normal mode as well.
Impact: depending on activity log size and access pattern, activity log
extents may not be recycled in time, causeing IO to block indefinetely.
Fix: skip drbd barriers only if there is no connection to send them on,
or the request being completed has not been on the network at all.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
As the network connection can be lost at any time, a --force option
for disconnect is just a matter of completeness.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In case we ever should add an other packet type,
we must not reuse 27, as that currently used for
"empty" return code only replies.
Document it as such.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Make sure we start with clean buffers to not accidentally send garbage
back to userspace. Note: has not been observed; but just in case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
There still exists a (theoretical) race on module unload, where
/proc/drbd may still exist, but the netlink callback has been
unregistered already, allowing drbdsetup to shout without listeners,
and get no reply.
Reorder remove_proc_entry and unregister of netlink callback.
drbdsetup first checks for existence of the proc entry,
and if that is missing, won't even try to contact the module.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If someone holds /proc/drbd open, previously rmmod would
"succeed" in starting the unload, but then block on remove_proc_entry,
leading to a situation where the lsmod does not show drbd anymore,
but /proc/drbd being still there (but no longer accessible).
I'd rather have rmmod fail up front in this case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Given low-enough network bandwidth combined with a IO
pattern that hammers onto a single RS-extent, side-stepping
might be necessary for much longer times.
Changed the code to print a single informal message after
20 seconds, but it keeps on stepping aside forever.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This patch is acutally a necessary addendum to the patch
"fix for spurious full sync (becoming sync target looked like invalidate)"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* C_STARTING_SYNC_S, C_STARTING_SYNC_T In these states the bitmap gets
written to disk. Locking out of app-IO is done by using the
drbd_queue_bitmap_io() and drbd_bitmap_io() functions these days.
It is no longer necessary to lock out app-IO based on the connection
state.
App-IO that may come in after the BITMAP_IO flag got cleared before the
state transition to C_SYNC_(SOURCE|TARGET) does not get mirrored, sets
a bit in the local bitmap, that is already set, therefore changes nothing.
* C_WF_BITMAP_S In this state we send updates (P_OUT_OF_SYNC packets).
With that we make sure they have the same number of bits when going
into the C_SYNC_(SOURCE|TARGET) connection state.
* C_UNCONNECTED: The receiver starts, no need to lock out IO.
* C_DISCONNECTING: in drbd_disconnect() we had a wait_event()
to wait until ap_bio_cnt reaches 0. Removed that.
* C_TIMEOUT, C_BROKEN_PIPE, C_NETWORK_FAILURE
C_PROTOCOL_ERROR, C_TEAR_DOWN: Same as C_DISCONNECTING
* C_WF_REPORT_PARAMS: IO still possible since that is still
like C_WF_CONNECTION.
And we do not need to send barriers in C_WF_BITMAP_S connection state.
Allow concurrent accesses to the bitmap when receiving the bitmap.
Everything gets ORed anyways.
A drbd_free_tl_hash() is in after_state_chg_work(). At that point
all the work items of the last connections must have been processed.
Introduced a call to drbd_free_tl_hash() into drbd_free_mdev()
for paranoia reasons.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The relevant change is that the state change to C_FW_BITMAP_S should
implicitly change pdsk to C_CONSISTENT. (Think of it as C_OUTDATED, only
without the guarantee that the peer has the outdated written to its
meta data)
At that opportunity I restructured the switch statement so that it
gets evaluated every time. (Has declarative character)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
May only test for ap_bio_cnt == 0 under req_lock. It can increase
only under req_lock.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The condition must be checked after perpare_to_wait(). The old
implementaion could loose wakeup events. Never observed in real
life.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Since inc_ap_bio() might sleep already
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Before:
drbd_rs_begin_io() locked app-IO out of an RS extent, and
waited then until all previous app-IO in that area finished.
(But not only until the disk-IO was finished but until the
barrier/epoch ack came in for that == round trip time latency ++)
After:
As soon as a new app-IO waits wants to start new IO on that
RS extent, drbd_rs_begin_io() steps aside (clearing the
BME_NO_WRITES flag again). It retries after 100ms.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We only issue resync requests if there is no significant application IO
going on. = Application IO has higher priority than resnyc IO.
If application IO can not be started because the resync process locked
an resync_lru entry, start the IO operations necessary to release the
lock ASAP.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This one should be replaced with moving this cleanup to the
'right' position.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
In this connection mode, the ahead node no longer replicates
application IO. The behind's disk becomes out dated.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With commit
drbd: further converge progress display of resync and online-verify
accidentally an u64/u64 div was introduced, causing an unresolvable
symbol __udivdi3 to be reference. Actually for that division, 32bit are
still suficient for now, so we can revert to unsigned long instead.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
To ease tracking of bios in some hash tables, we want it to
not cross certain boundaries (128k, used to be 32k).
We limit the maximum bio size using queue parameters.
Historically some defines and variables we use there have been named
max_segment_size, which was misguided. Rename them to max_bio_size,
and use [blk_]queue_max_hw_sectors where appropriate.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
We used to be limited to 32k requests,
but have increased that limit to 128k now.
This part of the code can only deal with 32k,
it would scramble arbitrary pages for larger requests.
As it is used for debugging only anyways,
it is ok to simply truncate the dumped data here.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With data-integrity digest enabled, double-check on the sending side
for modifications by upper layers of buffers under write back,
so we can tell it appart from corruption on the "wire".
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Show progressbar and ETA always, with proc_details >= 1 also show the
current sector position for both resync and online-verify on both nodes.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
When converting bits (4k resolution, still) to kB, we shift left. If it
was a large number of bits on a 32bit box (>= 4 TiB storage), we may
wrap the 32bit unsigned long base type, resulting in incorrect display.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Preparation patch to be able to use the auto-throttling resync controller
for online-verify requests as well.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This is in preparation to unify progress reporting of
online-verify and resync requests.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For partial (resumed) online verify, initialize the resync step marks
once we know what the online verify start sector is.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For a partial (resumed) online-verify, initialize rs_total not to total
bits, but to number of bits to check in this run, to match the meaning
rs_total has for actual resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
For network hickups during online-verify, on the next verify
triggered, we by default want to resume where it left off.
After any replication link interruption, there will be a (possibly
empty) resync. Do not reset online-verify start sector if some resync
completed, that would defeats the purpose.
Only reset the start sector once a verify run is completed.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: James Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...
In commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb,
Author: Lars Ellenberg <lars.ellenberg@linbit.com>
Date: Wed Aug 11 23:40:24 2010 +0200
drbd: new configuration parameter c-min-rate
a bad chunk slipped through, which is now reverted as well,
restoring the correct irqsave for the endio callback.
This patch also add comments at both req_mod()
and in the endio callback so it should not happen again.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This should fix a performance degradation we observed recently.
If we don't expect any subheader, we should not call into the tcp stack,
as that may add considerable latency if there is no data available at
this point.
For a synthetic synchronous write load with single outstanding writes,
this additional latency when processing the "unplug remote" packet
added up to a performance degradation factor >= 10.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.
Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.
All users are converted. Most conversions are mechanical and don't
introduce any behavior difference. There are several exceptions.
* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
reason to OR it explicitly on blkdev_put().
* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
sb->s_mode.
* With the above changes, sb->s_mode now always should contain
FMODE_EXCL. WARN_ON_ONCE() added to kill_block_super() to detect
errors.
The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.
* blkdev_get/put() are the primary open and close functions.
* bd_claim/release() deal with exclusive open.
* open/close_bdev_exclusive() are combination of open and claim and
the other way around, respectively.
* bd_link/unlink_disk_holder() to create and remove holder/slave
symlinks.
* open_by_devnum() wraps bdget() + blkdev_get().
The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access. Reorganize the interface such that,
* blkdev_get() is extended to include exclusive access management.
@holder argument is added and, if is @FMODE_EXCL specified, it will
gain exclusive access atomically w.r.t. other exclusive accesses.
* blkdev_put() is similarly extended. It now takes @mode argument and
if @FMODE_EXCL is set, it releases an exclusive access. Also, when
the last exclusive claim is released, the holder/slave symlinks are
removed automatically.
* bd_claim/release() and close_bdev_exclusive() are no longer
necessary and either made static or removed.
* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
is no longer necessary and removed.
* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
and blkdev_get(). It also has an unexpected extra bdev_read_only()
test which probably should be moved into blkdev_get().
* open_by_devnum() is modified to take @holder argument and pass it to
blkdev_get().
Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should). This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.
open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features. Well, let's leave them for another day.
Most conversions are straight-forward. drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Convert direct reads of an inode's i_size to using i_size_read().
i_size_{read,write} use a seqcount to protect reads from accessing
incomple writes. Concurrent i_size_write()s require mutual exclussion
to protect the seqcount that is used by i_size_{read,write}. But
i_size_read() callers do not need to use additional locking.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: NeilBrown <neilb@suse.de>
Acked-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Failure to create drbd_ee_mempool appears not to get checked. Looks like
a copy-and-paste problem to me.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...
* 'for-2.6.37/drivers' of git://git.kernel.dk/linux-2.6-block: (95 commits)
cciss: fix PCI IDs for new Smart Array controllers
drbd: add race-breaker to drbd_go_diskless
drbd: use dynamic_dev_dbg to optionally log uuid changes
dynamic_debug.h: Fix dynamic_dev_dbg() macro if CONFIG_DYNAMIC_DEBUG not set
drbd: cleanup: change "<= 0" to "== 0"
drbd: relax the grace period of the md_sync timer again
drbd: add some more explicit drbd_md_sync
drbd: drop wrong debug asserts, fix recently introduced race
drbd: cleanup useless leftover warn/error printk's
drbd: add explicit drbd_md_sync to drbd_resync_finished
drbd: Do not log an ASSERT for P_OV_REQUEST packets while C_CONNECTED
drbd: fix for possible deadlock on IO error during resync
drbd: fix unlikely access after free and list corruption
drbd: fix for spurious fullsync (uuids rotated too fast)
drbd: allow for explicit resync-finished notifications
drbd: preparation commit, using full state in receive_state()
drbd: drbd_send_ack_dp must not rely on header information
drbd: Fix regression in recv_bm_rle_bits (compressed bitmap)
drbd: Fixed a stupid copy and paste error
drbd: Allow larger values for c-fill-target.
...
Fix up trivial conflict in drivers/block/ataflop.c due to BKL removal
* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
cfq-iosched: Fix a gcc 4.5 warning and put some comments
block: Turn bvec_k{un,}map_irq() into static inline functions
block: fix accounting bug on cross partition merges
block: Make the integrity mapped property a bio flag
block: Fix double free in blk_integrity_unregister
block: Ensure physical block size is unsigned int
blkio-throttle: Fix possible multiplication overflow in iops calculations
blkio-throttle: limit max iops value to UINT_MAX
blkio-throttle: There is no need to convert jiffies to milli seconds
blkio-throttle: Fix link failure failure on i386
blkio: Recalculate the throttled bio dispatch time upon throttle limit change
blkio: Add root group to td->tg_list
blkio: deletion of a cgroup was causes oops
blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: revert bad fix for memory hotplug causing bounces
Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: Prevent hang_check firing during long I/O
cfq: improve fsync performance for small files
...
Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
That assertion's condition needed adjustment for today's semantics
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we don't rate limit it, and you happen to log err level messages via
serial console, an IO error on a disconnected Primary may cause serious
unresponsiveness.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This codepath used to be called only for failed kmalloc GFP_ATOMIC,
but is now also triggered by other things.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we get an IO-error during an activity log transaction,
if we failed to write the bitmap of the evicted extent,
we must not write the transaction itself.
If we failed to write the transaction,
we must not even submit the corresponding bio,
as its extent is not yet marked in the activity log.
Otherwise, if this was a disconneted Primary (degraded cluster), which
now lost its disk as well, and we later re-attach the same backend
storage, we possibly "forget" to resync some parts of the disk that
potentially have been changed.
On the receiving side, when receiving from a peer with unhealthy disk,
checking for pdsk == D_DISKLESS is not enough, we need to set out of
sync and do AL transactions for everything pdsk < D_INCONSISTENT on the
receiving side.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If we have contention in drbd_al_begin_iod (heavy randon IO),
an administrative request to detach the disk may deadlock
for similar reasons as the recently fixed deadlock if detaching
because of IO-error.
The approach taken here is to either go through the intermediate
cleanup state D_FAILED, or first lock out application io,
don't just go directly to D_DISKLESS.
We need an additional state bit (WAS_IO_ERROR) to distinguish
the -> D_FAILED because of IO-error from other failures.
Sanitize D_ATTACHING -> D_FAILED to D_ATTACHING -> D_DISKLESS.
If only attaching, ldev may be missing still, but would be referenced
from within the after_state_ch for -> D_FAILED, potentially
dereferencing a NULL pointer.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If those messages ever get logged, clearly state that they are
actually failed ASSERTS, so our regression tests can pick them up
from the logs more easily.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Every code path changing the current UUID needs to get it on stable
storage anyways. Flush it to disk right there, remove the now obsolte
explicit drbd_md_sync statements in the other code paths.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
This adds a necessary race breaker to these commits:
drbd: fix for possible deadlock on IO error during resync
drbd: drop wrong debug asserts, fix recently introduced race
What we do is get a refcount, check the state, then depending on the
state and the requested minimum disk state, either hold it (success),
or give it back immediately (failed "try lock").
Some code paths (flushing of drbd metadata) may still grab and hold a
refcount even if we are D_FAILED (application IO won't).
So even if we hit local_cnt == 0 once after being D_FAILED,
we still need to wait for that again after we changed to D_DISKLESS.
Once local_cnt reaches 0 while we are D_DISKLESS, we can be sure that
no one will look at the protected members anymore, so only then is it
safe to free them.
We cannot easily convert to standard locking primitives here, as we want
to be able to use it in atomic context (we always do a "try lock"),
as well as hold references for a "long time" (from IO submission to
completion callback).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
dt is unsigned so it's never less than zero. We are calculating the
elapsed time, and that's never less than zero (unless there is a bug or
we invent time travel). The comparison here is just to guard against
divide by zero bugs.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Consolidate the ifdef's for the debug level, accidentally the used both
DEBUG and DRBD_DEBUG_MD_SYNC. Default to off.
For production, we can safely reduce the grace period for this timer
again the the value we used to have.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>