Remove the unused timeout parameter from dlm_user_adopt_orphan().
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch removes warning messages that could be logged when
remote requests had been waiting on a reply message for some timeout
period (which could be set through configfs, but was rarely enabled.)
The improved midcomms layer now carefully tracks all messages and
replies, and logs much more useful messages if there is an actual
problem.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds the resource name to dlm tracepoints. The name
usually comes through the lkb_resource, but in some cases a resource
may not yet be associated with an lkb, in which case the name and
namelen parameters are used.
It should be okay to access the lkb_resource and the res_name field at
the time when the tracepoint is invoked. The resource is assigned to a
lkb and it's reference is being held during the tracepoint call. During
this time the resource cannot be freed. Also a lkb will never switch
its assigned resource. The name of a dlm_rsb is assigned at creation
time and should never be changed during runtime as well.
The TP_printk() call uses always a hexadecimal string array
representation for the resource name (which is not necessarily ascii.)
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch will optimize __put_lkb() by using kref_put_lock(). The
function kref_put_lock() will only take the lock if the reference is
going to be zero, if not the lock will never be held.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch will optimize put_rsb() by using kref_put_lock(). The
function kref_put_lock() will only take the lock if the reference is
going to be zero, if not the lock will never be held.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch removes unnecessary error assigns to 0 at places we know that
error is zero because it was checked on non-zero before.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
We always call hold_lkb(lkb) if we increment lkb->lkb_wait_count.
So, we always need to call unhold_lkb(lkb) if we decrement
lkb->lkb_wait_count. This patch will add missing unhold_lkb(lkb) if we
decrement lkb->lkb_wait_count. In case of setting lkb->lkb_wait_count to
zero we need to countdown until reaching zero and call unhold_lkb(lkb).
The waiters list unhold_lkb(lkb) can be removed because it's done for
the last lkb_wait_count decrement iteration as it's done in
_remove_from_waiters().
This issue was discovered by a dlm gfs2 test case which use excessively
dlm_unlock(LKF_CANCEL) feature. Probably the lkb->lkb_wait_count value
never reached above 1 if this feature isn't used and so it was not
discovered before.
The testcase ended in a rsb on the rsb keep data structure with a
refcount of 1 but no lkb was associated with it, which is itself
an invalid behaviour. A side effect of that was a condition in which
the dlm was sending remove messages in a looping behaviour. With this
patch that has not been reproduced.
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.
To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].
This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].
Before, the code implicitly used the head when no element was found
when using &pos->list. Since the new variable is only set if an
element was found, the list_add() is performed within the loop
and only done after the loop if it is done on the list head directly.
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch move the wake_up() call at the point when a remove message
completed. Before it was only when a remove message was going to be
sent. The possible waiter in wait_pending_remove() waits until a remove
is done if the resource name matches with the per ls variable
ls->ls_remove_name. If this is the case we must wait until a pending
remove is done which is indicated if DLM_WAIT_PENDING_COND() returns
false which will always be the case when ls_remove_len and
ls_remove_name are unset to indicate that a remove is not going on
anymore.
Fixes: 21d9ac1a53 ("fs: dlm: use event based wait for pending remove")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch will remove the following warning by sparse:
fs/dlm/lock.c:1049:9: warning: context imbalance in 'dlm_master_lookup' - different lock contexts for basic block
I tried to find any issues with the current handling and I did not find
any. However it is hard to follow the lock handling in this area of
dlm_master_lookup() and I suppose that sparse cannot realize that there
are no issues. The variable "toss_list" makes it really hard to follow
the lock handling because if it's set the rsb lock/refcount isn't held
but the ls->ls_rsbtbl[b].lock is held and this is one reason why the rsb
lock/refcount does not need to be held. If it's not set the
ls->ls_rsbtbl[b].lock is not held but the rsb lock/refcount is held. The
indicator of toss_list will be used to store the actual lock state.
Another possibility is that a retry can happen and then it's hard to
follow the specific code part. I did not find any issues but sparse
cannot realize that there are no issues.
To make it more easier to understand for developers and sparse as well,
we remove the toss_list variable which indicates a specific lock state
and move handling in between of this lock state in a separate function.
This function can be called now in case when the initial lock states are
taken which was previously signalled if toss_list was set or not. The
advantage here is that we can release all locks/refcounts in mostly the
same code block as it was taken.
Afterwards sparse had no issues to figure out that there are no problems
with the current lock behaviour.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch cleanups a not necessary label found which can be replaced by
a proper else handling to jump over a specific code block.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes to use __le types directly in the dlm message
structure which is casted at the right dlm message buffer positions.
The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes to use __le types directly in the dlm rcom
structure which is casted at the right dlm message buffer positions.
The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch changes to use __le types directly in the dlm header
structure which is casted at the right dlm message buffer positions.
The main goal what is reached here is to remove sparse warnings
regarding to host to little byte order conversion or vice versa. Leaving
those sparse issues ignored and always do it in out/in functionality
tends to leave it unknown in which byte order the variable is being
handled.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds a additional check if lkb->lkb_wait_count is non zero as
it is done in validate_unlock_args() to check if any operation is in
progress. While on it add a comment taken from validate_unlock_args() to
signal what the check is doing.
There might be no changes because if lkb->lkb_wait_type is non zero
implies that lkb->lkb_wait_count is non zero. However we should add the
check as it does validate_unlock_args().
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch will use an event based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of doing busy waiting.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch fixes the following crash by receiving a invalid message:
[ 160.672220] ==================================================================
[ 160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[ 160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[ 160.681447]
[ 160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[ 160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[ 160.685574] Workqueue: dlm_recv process_recv_sockets
[ 160.686721] Call Trace:
[ 160.687310] dump_stack_lvl+0x56/0x6f
[ 160.688169] ? dlm_user_add_ast+0xc3/0x370
[ 160.689116] kasan_report.cold.14+0x116/0x11b
[ 160.690138] ? dlm_user_add_ast+0xc3/0x370
[ 160.690832] dlm_user_add_ast+0xc3/0x370
[ 160.691502] _receive_unlock_reply+0x103/0x170
[ 160.692241] _receive_message+0x11df/0x1ec0
[ 160.692926] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.693700] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.694427] ? lock_acquire+0x175/0x400
[ 160.695058] ? do_purge.isra.51+0x200/0x200
[ 160.695744] ? lock_acquired+0x360/0x5d0
[ 160.696400] ? lock_contended+0x6a0/0x6a0
[ 160.697055] ? lock_release+0x21d/0x5e0
[ 160.697686] ? lock_is_held_type+0xe0/0x110
[ 160.698352] ? lock_is_held_type+0xe0/0x110
[ 160.699026] ? ___might_sleep+0x1cc/0x1e0
[ 160.699698] ? dlm_wait_requestqueue+0x94/0x140
[ 160.700451] ? dlm_process_requestqueue+0x240/0x240
[ 160.701249] ? down_write_killable+0x2b0/0x2b0
[ 160.701988] ? do_raw_spin_unlock+0xa2/0x130
[ 160.702690] dlm_receive_buffer+0x1a5/0x210
[ 160.703385] dlm_process_incoming_buffer+0x726/0x9f0
[ 160.704210] receive_from_sock+0x1c0/0x3b0
[ 160.704886] ? dlm_tcp_shutdown+0x30/0x30
[ 160.705561] ? lock_acquire+0x175/0x400
[ 160.706197] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 160.706941] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 160.707681] process_recv_sockets+0x32/0x40
[ 160.708366] process_one_work+0x55e/0xad0
[ 160.709045] ? pwq_dec_nr_in_flight+0x110/0x110
[ 160.709820] worker_thread+0x65/0x5e0
[ 160.710423] ? process_one_work+0xad0/0xad0
[ 160.711087] kthread+0x1ed/0x220
[ 160.711628] ? set_kthread_struct+0x80/0x80
[ 160.712314] ret_from_fork+0x22/0x30
The issue is that we received a DLM message for a user lock but the
destination lock is a kernel lock. Note that the address which is trying
to derefence is 00000000deadbeef, which is in a kernel lock
lkb->lkb_astparam, this field should never be derefenced by the DLM
kernel stack. In case of a user lock lkb->lkb_astparam is lkb->lkb_ua
(memory is shared by a union field). The struct lkb_ua will be handled
by the DLM kernel stack but on a kernel lock it will contain invalid
data and ends in most likely crashing the kernel.
It can be reproduced with two cluster nodes.
node 2:
dlm_tool join test
echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters
node 1:
dlm_tool join test
python:
foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
newFile.write(bytes(foo))
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to put a lkb to the waiters state. It can
be useful to combine this feature with the "rawmsg" debugfs
functionality. It will bring the DLM lkb into a state that a message
will be parsed by the kernel.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to add an lkb during runtime. This is a
highly debugging feature only, wrong input can crash the kernel. It is a
early state feature as well. The goal is to provide a user interface for
manipulate dlm state and combine it with the rawmsg feature. It is
debugfs functionality, we don't care about UAPI breakage. Even it's
possible to add lkb's/rsb's which could never be exists in such wat by
using normal DLM operation. The user of this interface always need to
think before using this feature, not every crash which happens can really
occur during normal dlm operation.
Future there should be more functionality to add a more realistic lkb
which reflects normal DLM state inside the kernel. For now this is
enough.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds functionality to add a lkb with a specific id range.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds initial support for dlm tracepoints. It will introduce
tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their
complete ast() callback or blocking bast() callback.
The lock/unlock functionality has a start and end tracepoint, this is
because there exists a race in case if would have a tracepoint at the
end position only the complete/blocking callbacks could occur before. To
work with eBPF tracing and using their lookup hash functionality there
could be problems that an entry was not inserted yet. However use the
start functionality for hash insert and check again in end functionality
if there was an dlm internal error so there is no ast callback. In further
it might also that locks with local masters will occur those callbacks
immediately so we must have such functionality.
I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal datastructures if it's aware of the
struct definitions of the running kernel instance. We still can change
it, if you do eBPF experiments e.g. time measurements between lock and
callback functionality you can simple use the local lkb_id field as hash
value in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simple use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch adds union inside the lockspace id to handle it also for
another use case for a different dlm command.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch prepares hooks to redirect to the midcomms layer which will
be used by the midcomms re-transmit handling.
There exists the new concept of stateless buffers allocation and
commits. This can be used to bypass the midcomms re-transmit handling. It
is used by RCOM_STATUS and RCOM_NAMES messages, because they have their
own ping-like re-transmit handling. As well these two messages will be
used to determine the DLM version per node, because these two messages
are per observation the first messages which are exchanged.
Cluster manager events for node membership are added to add support for
half-closed connections in cases that the peer connection get to
an end of file but DLM still holds membership of the node. In
this time DLM can still trigger new message which we should allow. After
the cluster manager node removal event occurs it safe to close the
connection.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This patch uses GFP_ZERO for allocate a page for the internal dlm
sending buffer allocator instead of calling memset zero after every
allocation. An already allocated space will never be reused again.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Based on 1 normalized pattern(s):
this copyrighted material is made available to anyone wishing to use
modify copy or redistribute it subject to the terms and conditions
of the gnu general public license v 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 45 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to comment in dlm_user_request() ua should be freed
in dlm_free_lkb() after successful attach to lkb.
However ua is attached to lkb not in set_lock_args() but later,
inside request_lock().
Fixes 597d0cae0f ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
This function was only for debugging. It would be
called in a condition that should not happen, and
should probably have been removed from the final
version of the original commit.
Remove it because it does mutex lock under spin lock.
Signed-off-by: David Teigland <teigland@redhat.com>
When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock
was detected, the caller of can_be_granted() was unknown.
We change the behavior of can_be_granted() and change it to detect
conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag
is set or not. And depending on whether the DLM_LKF_NODLCKWT flag
is set or not, we change the behavior at the caller of can_be_granted().
This fix has no effect except when using DLM_LKF_NODLCKWT flag.
Currently, ocfs2 uses the DLM_LKF_NODLCKWT flag and does not expect a
cancel operation from conversion deadlock when calling dlm_lock().
ocfs2 is implemented to perform a cancel operation by requesting
BASTs (callback).
Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Teigland <teigland@redhat.com>
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Teigland <teigland@redhat.com>
No point in going through loops and hoops instead of just comparing the
values.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock. Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.
An adopting process must specify the resource name and mode
of the lock it wants to adopt. If a matching lock is found,
the lock is moved to the caller's 's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.
If an orphan with a different mode is found, then -EAGAIN is
returned. If no orphan lock is found on the resource, then
-ENOENT is returned. No async completion is used because
the result is immediately available.
Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.
Signed-off-by: David Teigland <teigland@redhat.com>
The log messages relating to the progress of recovery
are minimal and very often useful. Change these to
the KERN_INFO level so they are always available.
Signed-off-by: David Teigland <teigland@redhat.com>
We pass the freed "r" pointer back to the caller. It's harmless but it
upsets the static checkers.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
For lockspaces with an LVB length above 64 bytes, avoid truncating
the LVB while exchanging it with another node in the cluster.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
Convert to the much saner new idr interface. Error return values from
recover_idr_add() mix -1 and -errno. The conversion doesn't change
that but it looks iffy.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Keep track of whether a toss list contains any
shrinkable rsbs. If not, dlm_scand can avoid
scanning the list for rsbs to shrink. Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).
Signed-off-by: David Teigland <teigland@redhat.com>
When a node is removed that held a PW/EX lock, the
existing master node should invalidate the lvb on the
resource due to the purged lock.
Previously, the existing master node was invalidating
the lvb if it found only NL/CR locks on the resource
during recovery for the removed node. This could lead
to cases where it invalidated the lvb and shouldn't
have, or cases where it should have invalidated and
didn't.
When recovery selects a *new* master node for a
resource, and that new master finds only NL/CR locks
on the resource after lock recovery, it should
invalidate the lvb. This case was handled correctly
(but was incorrectly applied to the existing master
case also.)
When a process exits while holding a PW/EX lock,
the lvb on the resource should be invalidated.
This was not happening.
The lvb contents and VALNOTVALID flag should be
recovered before granting locks in recovery so that
the recovered lvb state is provided in the callback.
The lvb was being recovered after the lock was granted.
Signed-off-by: David Teigland <teigland@redhat.com>
I don't know exactly how, but in some cases, a dir
record is not removed, or a new one is created when
it shouldn't be. The result is that the dir node
lookup returns a master node where the rsb does not
exist. In this case, The master node will repeatedly
return -EBADR for requests, and the lock requests will
be stuck.
Until all possible ways for this to happen can be
eliminated, a simple and effective way to recover from
this situation is for the supposed master node to send
a standard remove message to the dir node when it
receives a request for a resource it has no rsb for.
Signed-off-by: David Teigland <teigland@redhat.com>
The process of rebuilding locks on a new master during
recovery could re-order the locks on the convert queue,
creating an "in place" conversion deadlock that would
not be resolved. Fix this by not considering queue
order when granting conversions after recovery.
Signed-off-by: David Teigland <teigland@redhat.com>
It was possible for a remove message on an old
rsb to be sent after a lookup message on a new
rsb, where the rsbs were for the same resource
name. This could lead to a missing directory
entry for the new rsb.
It is fixed by keeping a copy of the resource
name being removed until after the remove has
been sent. A lookup checks if this in-progress
remove matches the name it is looking up.
Signed-off-by: David Teigland <teigland@redhat.com>
Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory. It has always been an unnecessary
duplication of information.
This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.
This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.
In many cases there will be memory savings because
the dir hash table no longer exists.
Signed-off-by: David Teigland <teigland@redhat.com>
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used. This commit
fixes a number of problems, making nodir much more usable.
- Major change to recovery: recover all locks and restart
all in-progress operations after recovery. In some
cases it's not possible to know which in-progess locks
to recover, so recover all. (Most require recovery
in nodir mode anyway since rehashing changes most
master nodes.)
- Change the way nodir mode is enabled, from a command
line mount arg passed through gfs2, into a sysfs
file managed by dlm_controld, consistent with the
other config settings.
- Allow recovering MSTCPY locks on an rsb that has not
yet been turned into a master copy.
- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
from a previous, aborted recovery cycle. Base this
on the local recovery status not being in the state
where any nodes should be sending LOCK messages for the
current recovery cycle.
- Hold rsb lock around dlm_purge_mstcpy_locks() because it
may run concurrently with dlm_recover_master_copy().
- Maintain highbast on process-copy lkb's (in addition to
the master as is usual), because the lkb can switch
back and forth between being a master and being a
process copy as the master node changes in recovery.
- When recovering MSTCPY locks, flag rsb's that have
non-empty convert or waiting queues for granting
at the end of recovery. (Rename flag from LOCKS_PURGED
to RECOVER_GRANT and similar for the recovery function,
because it's not only resources with purged locks
that need grant a grant attempt.)
- Replace a couple of unnecessary assertion panics with
error messages.
Signed-off-by: David Teigland <teigland@redhat.com>