Commit Graph

132 Commits

Author SHA1 Message Date
Philipp Reisner
23ce422748 drbd: Null pointer deref fix to the large "multi bio rewrite"
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:01 +02:00
Philipp Reisner
0c3f34516e drbd: Create new current UUID as late as possible
The choice was to either delay creation of the new UUID until
IO got thawed or to delay it until the first IO request.

Both are correct, the later is more friendly to users of
dual-primary setups, that actually only write on one side.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:03:49 +02:00
Lars Ellenberg
a1c88d0d7a drbd: always use_bmbv, ignore setting
Now that the peer may handle multi-bio EEs,
we can ignore the peer's limit,
and concentrate on the limits of the local IO stack.

This is safe accross drbd protocol versions,
as our queue_max_sectors() will be adjusted accordingly.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:03:05 +02:00
Lars Ellenberg
45bb912bd5 drbd: Allow drbd_epoch_entries to use multiple bios.
This should allow for better performance if the lower level IO stack
of the peers differs in limits exposed either via the queue,
or via some merge_bvec_fn.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 02:01:23 +02:00
Philipp Reisner
0ced55a3be drbd: Receiving of delay_probes
Delay_probes are new packets in the DRBD protocol, which allow
DRBD to know the current delay packets have on the data socket.
(relative to the meta data socket)

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:22:11 +02:00
Philipp Reisner
6b4388ac1f drbd: Added transmission faults to the fault injection code
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:19:51 +02:00
Philipp Reisner
e89b591c3a drbd: Implemented flags for the resize packet
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:15:44 +02:00
Adam Gandelman
3a11a48789 drbd: New handler: initial-split-brain
Some wish to be notified of all instances of split brain, not just those that
go unresolved.  The initial-split-brain handler is called to notify someone
upon  detection of all split brain conditions even if auto-recovery policies
are configured.

Signed-off-by: Adam Gandelman <adam.gandelman@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:13:33 +02:00
Lars Ellenberg
6666032ade drbd: check for corrupt or malicous sector addresses when receiving data
Even if it should never happen if the peer does behave, we need to
double check, and not even attempt access beyond end of device.
It usually would be caught by lower layers, resulting in "IO error",
but may also end up in the internal meta data area.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:09:57 +02:00
Philipp Reisner
c3fe30b0e7 drbd: cleanup: This code path to trigger a resync is no longer needed
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:09:13 +02:00
Lars Ellenberg
c3470cde57 drbd: fix potential protocol error
Don't forget to drain the digest in case we cannot satisfy a
checksum based resync or online-verify request.

It would additionally cause a protocoll error,
dropping the connection.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:07:38 +02:00
Lars Ellenberg
8d1894ebe4 drbd: remove bogus ASSERT
block_id may be ID_SYNCER,
as well as checksum based resync request magic, or online verify magic.

Let's just drop that ASSERT.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-05-18 01:06:59 +02:00
Jens Axboe
7407cf355f Merge branch 'master' into for-2.6.35
Conflicts:
	fs/block_dev.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-29 09:36:24 +02:00
Dmitry Monakhov
fbd9b09a17 blkdev: generalize flags for blkdev_issue_fn functions
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-28 19:47:36 +02:00
Philipp Reisner
7e2455c1a1 drbd: Terminate a connection early if sending the protocol fails
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-22 14:50:23 +02:00
Philipp Reisner
309d1608cc drbd: Reduce the time an empty resync takes usually
This mitigates changes introduced with commit:
http://git.drbd.org/?p=drbd-8.3.git;a=commit;h=4b6803a3276652da3737

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-03-11 16:09:03 +01:00
Lars Ellenberg
4589d7f829 drbd_disconnect: grab meta.socket mutex as well
Fixes a race and potential kernel panic if e.g. the worker was just
about to send a few P_RS_IS_IN_SYNC via the meta socket for checksum
based resync, while the receiver destroys the sockets in
drbd_disconnect.

To make sure no-one is using the meta socket,
it is not enough to stop the asender...
Grab the meta socket mutex before destroying it.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-03-11 16:02:45 +01:00
Lars Ellenberg
580b9767db drbd: fix broken state change after split-brain attach while connected
Situation:
we have diverging data sets, i.e. we had a split brain somewhen,
but currently are connected, one node diskless.

Then we try to attach that disk, figure it is consistent,
but has a diverging data set, we refuse to attach.

This led to strange state changes:
22:18:35 bb drbd1: peer( Unknown -> Primary ) conn( WFReportParams -> Connected) pdsk( DUnknown -> UpToDate )
22:19:30 bb drbd1: disk( Diskless -> Attaching )
22:19:30 bb drbd1: disk( Attaching -> Negotiating )
22:19:30 bb drbd1: drbd_sync_handshake:
22:19:30 bb drbd1: self 97BF25798B9D5222:F33D1F62ADE698DD:4269796F9D027C83:AC45D8B5C3C1BF93 bits:19449 flags:0
22:19:30 bb drbd1: peer 280DFB6E125465D3:F33D1F62ADE698DC:4269796F9D027C82:AC45D8B5C3C1BF93 bits:2575806 flags:0
22:19:30 bb drbd1: uuid_compare()=100 by rule 90
22:19:30 bb drbd1: Split-Brain detected, dropping connection!
22:19:30 bb drbd1: disk( Negotiating -> Diskless )

while the other side says:
22:19:30 aa drbd1: Split-Brain detected, dropping connection!
22:19:30 aa drbd1: Disk attach process on the peer node was aborted.
22:19:30 aa drbd1: conn( Connected -> TOO_LARGE ) pdsk( Diskless -> Consistent )

This should be fixed now.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-03-11 16:00:09 +01:00
Philipp Reisner
cf14c2e987 drbd: --dry-run option for drbdsetup net ( drbdadm -- --dry-run connect <res> )
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-03-11 15:51:23 +01:00
Dan Carpenter
d3db7b485a drbd: null dereference bug
epoch is always NULL here.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2010-01-25 18:01:41 +01:00
Philipp Reisner
a393db6f10 drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-01-12 10:02:46 +01:00
Johannes Thoma
b10d96cb9c drbd: Don't go into StandAlone mode when authentification failes because of network error
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-01-12 09:38:27 +01:00
Julia Lawall
2d1ee87d87 drivers/block/drbd: Correct NULL test
Test the just-allocated value for NULL rather than some other value.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,y;
statement S;
@@

x = \(kmalloc\|kcalloc\|kzalloc\)(...);
(
if ((x) == NULL) S
|
if (
-   y
+   x
       == NULL)
 S
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2010-01-04 11:51:41 +01:00
Philipp Reisner
367a8d7385 drbd: Silenced an assert that could triggered after changing write ordering method
Immediately after changing the write ordering method, the epoch can already
be finished at this point.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-12-31 09:33:09 +01:00
Huang Weiyi
820cd61a28 drbd: remove unused #include <linux/version.h>
Remove unused #include <linux/version.h>('s) in
  drivers/block/drbd/drbd_main.c
  drivers/block/drbd/drbd_receiver.c
  drivers/block/drbd/drbd_worker.c

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2009-12-21 13:41:16 +01:00
Philipp Reisner
d8c2a36b77 Fixed a regression in resync decission code drbd_uuid_compare() [Bugz 260]
Since 8.3.3 we fail to do the resync when a partial resynch is not
possible, but a full synch is necessary.

This regression was introduced with 7101539930c0a89146959e7a39c09ad9c3516434

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-11-24 18:13:28 +01:00
Lars Ellenberg
0b33a9164a add missing state change on corrupt packet header in drbd_recv_header
Otherwise the 'state fixup' in the receiver will change to Unconnected,
but the receiver will terminate itself, and any attempt at 'down'ing
that drbd later will block forever.

see also Bugz. #259

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-11-24 18:12:13 +01:00
Philipp Reisner
e656ec8ae2 Do not deadlock in drbd_disconnect() [bugz 258]
When there are many blocks on the fly (ua), and the AL gets into "starving"
mode (random IO, scattered all over the device), and the connections gets
interrupted, the receiver thread deadlocks in the drbd_disconnect() code path.

Affected are only nodes in Primary role.

The bug triggers most likely on system that mirror over "long distances"

Regression introduced shortly before 8.3.3
with git commit 31e0f1250f174ac1ee317f360943a0159e19edc8

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-11-04 15:21:03 +01:00
Lars Ellenberg
ad19bf6e54 fix grammar in printk
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-11-04 15:20:59 +01:00
Jens Axboe
6a0afdf58d drbd: remove tracing bits
They should be reimplemented in the current scheme.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01 21:17:58 +02:00
Lars Ellenberg
ab8fafc2e1 dropping unneeded include autoconf.h
It is force-included on the gcc command line since at least 2.6.15.
Explicit include lines seem to break compilation now in certain configurations.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
2009-10-01 21:17:54 +02:00
Philipp Reisner
b411b3637f The DRBD driver
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2009-10-01 21:17:49 +02:00