In case a connection transitions into C_TIMEOUT within the timer
function (request_timer_fn()) we need to make sure that the receiver
thread (potentially running on a different CPU) sees the updated
cstate later on.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We want to store in persistent meta data what the peer DRBD can handle,
which, due to spreading requests to multiple bios,
may be more than its backing device can handle.
Otherwise, if a disconnected Primary temporarily loses access to its local data
as well, we may accidentally shrink the max-bio setting, portentially causing
already assembled, but not yet processed, application bios to be spuriously
failed due to device limits.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Allow the user of REQ_DISCARD.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
We plan to use genl_family->parallel_ops = true in the future,
but need to review all possible interactions first.
For now, only selectively drop genl_lock() in drbd_set_role(),
instead serializing on our own internal resource->conf_update mutex.
We now can be promoted/demoted on many resources in parallel,
which may significantly improve cluster failover times
when fencing is required.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Because all administrative requests via genetlink have been globally
serialized via genl_lock(), we used to have one static struct
drbd_config_context "admin context".
Move this on-stack to the respective callback functions.
This will allow us to selectively drop the genl_lock()
(or use genl_family->parallel_ops) in the future.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
If a user forces the operation he takes the blame in case
the peer does not have enough space. No reason to dey this...
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Actually we are clearing the susp_fen flag if we are not going
to call a fencing handler.
For setting the susp_fen flag needs to be edge-triggerd, and not
level triggered.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Right now every resource has exactly one connection. But we are preparing
for dynamic connections. I.e. in the future thre can be resources without
connections.
However smatch points this out as 'variable dereferenced before check',
which is correct.
This issue was introduced in
drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The new function can flush any work queue, not just the work queue of the data
socket of a connection.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Also change drbd_adm_connect() to expect a resource after it requested one.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
in drbd_adm_down(), drbd_create_device() and drbd_set_role()
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
The implicit dependency on a variable inside the macro is problematic.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
With the polymorphic drbd_() macros, we no longer need the connection
specific variants.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
DRBD was using dev_err() and similar all over the code; instead of having to
write dev_err(disk_to_dev(device->vdisk), ...) to convert a drbd_device into a
kernel device, a DEV macro was used which implicitly references the device
variable. This is terrible; introduce separate drbd_err() and similar macros
with an explicit device parameter instead.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Let connection->peer_devices point to peer devices; connection->volumes was
pointing to devices.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
So far, connections and resources always come in pairs, but in the future with
multiple connections per resource, the names will stick with the resources.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
This allows to access the volumes of a resource by number.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
In a first step, each resource has exactly one connection, and both objects are
allocated at the same time. The final result will be one resource and zero or
more connections.
Only allow to delete a resource if all its connections are C_STANDALONE.
Stop the worker threads of all connections early enough.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
In a setup where a device (aka volume) can replicate to multiple peers and one
connection can be shared between multiple devices, we need separate objects to
represent devices on peer nodes and network connections.
As a first step to introduce multiple connections per device, give each
drbd_device object a single drbd_peer_device object which connects it to a
drbd_connection object.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
sed -i -e 's:all_tconn:connections:g' -e 's:tconn:connection:g'
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
sed -i -e 's:\<drbd_conf\>:drbd_device:g'
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Keep the protocol definitions separate from the kernel code; they are useful in
their own right.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Mark functions conn_khelper(), nla_put_drbd_cfg_context(),
nla_put_status_info() and get_one_status() as static in drbd/drbd_nl.c
because they are not used outside this file.
This eliminates the following warnings in drbd/drbd_nl.c:
drivers/block/drbd/drbd_nl.c:365:5: warning: no previous prototype for ‘conn_khelper’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_nl.c:2727:5: warning: no previous prototype for ‘nla_put_drbd_cfg_context’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_nl.c:2753:5: warning: no previous prototype for ‘nla_put_status_info’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_nl.c:2895:5: warning: no previous prototype for ‘get_one_status’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
For a long time, the receiving side has spread "too large" incoming
requests over multiple bios. No need to shrink our max_bio_size
(max_hw_sectors) if the peer is reconfigured to use a different storage.
The problem manifests itself if we are not the top of the device stack
(DRBD is used a LVM PV).
A hardware reconfiguration on the peer may cause the supported
max_bio_size to shrink, and the connection handshake would now
unnecessarily shrink the max_bio_size on the active node.
There is no way to notify upper layers that they have to "re-stack"
their limits. So they won't notice at all, and may keep submitting bios
that are suddenly considered "too large for device".
We already check for compatibility and ignore changes on the peer,
the code only was masked out unless we have a fully established connection.
We just need to allow it a bit earlier during the handshake.
Also consider max_hw_sectors in our merge bvec function, just in case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Online adding of new minors with freshly created meta data
to an resource with an established connection failed, with a
wrong state transition on one side on one side of the new minor.
Freshly created meta-data has a la_size (last agreed size) of 0.
When we online add such devices, the code wrongly got into
the code path for resyncing new storage that was added while
the disk was detached.
Fixed that by making the GREW from ZERO a special case.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow to change the AL layout with an resize operation. For that
the reisze command gets two new fields: al_stripes and al_stripe_size.
In order to make the operation crash save:
1) Lock out all IO and MD-IO
2) Write the super block with MDF_PRIMARY_IND clear
3) write the bitmap to the new location (all zeros, since
we allow only while connected)
4) Initialize the new AL-area
5) Write the super block with the restored MDF_PRIMARY_IND.
6) Unfreeze all IO
Since the AL-layout has no influence on the protocol, this operation
needs to be beforemed on both sides of a resource (if intended).
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In case the connection was established and lost again before
the a fence-peer handler returns, ignore the exit code of this
instance. (And use the exit code of the later started instance)
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We validated resync_after dependencies, if changed via disk-options.
But we did not validate them when first created via attach.
We also did not check or cleanup dependencies that used to be correct,
but now point to meanwhile removed minor devices.
If the drbd_resync_after_valid() validation in disk-options tried to
follow a dependency chain in this way, this could lead to NULL pointer
dereference.
Validate resync_after settings in drbd_adm_attach() already, as well as
in drbd_adm_disk_opts(), and and only reject dependency loops.
Depending on non-existing disks is allowed and equivalent to no dependency.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The issue was that if the connection broke while we did the
gracefull state change to C_DISCONNECTING (C_TEARDOWN), then
we returned a success code from the state engine. (SS_CW_NO_NEED)
The result of that is that we missed to call the fence-peer
script in such a case.
Fixed that by introducing a new error code (SS_OUTDATE_WO_CONN).
This one should never reach back into user space.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Patch best viewed with git diff --ignore-space-change.
Now that we attempt the fallback to local bitmap operation
only when disconnected, we can safely drop the extra "silent"
state request from both invalidate and invalidate-remote.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>