151 Commits

Author SHA1 Message Date
Alexander Aring
8d614a4457 fs: dlm: remove timeout from dlm_user_adopt_orphan
Remove the unused timeout parameter from dlm_user_adopt_orphan().

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:53 -05:00
Alexander Aring
2bb2a3d66c fs: dlm: remove waiter warnings
This patch removes warning messages that could be logged when
remote requests had been waiting on a reply message for some timeout
period (which could be set through configfs, but was rarely enabled.)
The improved midcomms layer now carefully tracks all messages and
replies, and logs much more useful messages if there is an actual
problem.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:57:52 -05:00
Alexander Aring
5d92a30e90 fs: dlm: add resource name to tracepoints
This patch adds the resource name to dlm tracepoints.  The name
usually comes through the lkb_resource, but in some cases a resource
may not yet be associated with an lkb, in which case the name and
namelen parameters are used.

It should be okay to access the lkb_resource and the res_name field at
the time when the tracepoint is invoked. The resource is assigned to an
lkb and its reference is held during the tracepoint call, so the
resource cannot be freed during this time. Also, an lkb never switches
its assigned resource. The name of a dlm_rsb is assigned at creation
time and should never change at runtime either.

The TP_printk() call always uses a hexadecimal string array
representation for the resource name (which is not necessarily ASCII).
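
As a rough illustration of why a hexadecimal rendering is used, the sketch
below formats an arbitrary byte string the same way in spirit. The helper
name format_res_name is hypothetical, and the exact output format of the
kernel tracepoint is not claimed here.

```python
def format_res_name(name: bytes) -> str:
    """Render a resource name as hex, since the bytes need not be printable."""
    # bytes.hex() emits two lowercase hex digits per byte, no separators.
    return name.hex()
```

For example, format_res_name(b"foo\x00\xff") yields "666f6f00ff", which stays
readable even though the name contains non-printable bytes.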

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-06-24 11:53:09 -05:00
Alexander Aring
8e51ec6146 dlm: use kref_put_lock in __put_lkb
This patch will optimize __put_lkb() by using kref_put_lock(). The
function kref_put_lock() will only take the lock if the reference is
going to be zero, if not the lock will never be held.
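
The behaviour described above can be imitated in a few lines of Python. This
is only a sketch: the Kref class, its _guard lock (standing in for the
kernel's atomic operations) and the put_lock method are hypothetical names,
not the kernel implementation.

```python
import threading


class Kref:
    """Toy reference counter; _guard stands in for atomic operations."""

    def __init__(self):
        self.count = 1
        self._guard = threading.Lock()

    def get(self):
        with self._guard:
            self.count += 1

    def _dec_not_one(self):
        # Fast path: drop a reference unless it is the last one.
        with self._guard:
            if self.count > 1:
                self.count -= 1
                return True
            return False

    def put_lock(self, release, lock):
        """Return True if the object was released, False otherwise."""
        if self._dec_not_one():
            return False      # not the last reference: lock is never taken
        lock.acquire()        # slow path: this put may be the last one
        with self._guard:
            self.count -= 1
            last = self.count == 0
        if not last:
            lock.release()
            return False
        release(self)         # called with lock held; release() must unlock
        return True
```

The point of the pattern is that callers dropping a non-final reference never
contend on the outer lock; only the final put pays for it.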

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02 11:23:49 -05:00
Alexander Aring
9502a7f688 dlm: use kref_put_lock in put_rsb
This patch will optimize put_rsb() by using kref_put_lock(). The
function kref_put_lock() will only take the lock if the reference is
going to be zero, if not the lock will never be held.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02 11:22:56 -05:00
Alexander Aring
0ccc106052 dlm: remove unnecessary error assign
This patch removes unnecessary assignments of 0 to error in places where
we know that error is already zero because it was checked for non-zero before.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02 11:22:15 -05:00
Alexander Aring
1689c16913 dlm: fix missing lkb refcount handling
We always call hold_lkb(lkb) if we increment lkb->lkb_wait_count.
So, we always need to call unhold_lkb(lkb) if we decrement
lkb->lkb_wait_count. This patch will add missing unhold_lkb(lkb) if we
decrement lkb->lkb_wait_count. In case of setting lkb->lkb_wait_count to
zero we need to countdown until reaching zero and call unhold_lkb(lkb).
The waiters list unhold_lkb(lkb) can be removed because it's done for
the last lkb_wait_count decrement iteration as it's done in
_remove_from_waiters().
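
The pairing rule can be sketched as follows. This is a simplified model under
stated assumptions: the Lkb class, its refs field and the clear_waiters helper
are hypothetical, and the other functions only borrow names from the commit
message without reproducing the kernel code.

```python
class Lkb:
    """Toy lock block: refs models the refcount, wait_count the waiters."""

    def __init__(self):
        self.refs = 1
        self.wait_count = 0


def hold_lkb(lkb):
    lkb.refs += 1


def unhold_lkb(lkb):
    # An unhold must never drop the last reference.
    assert lkb.refs > 1
    lkb.refs -= 1


def add_to_waiters(lkb):
    # Every wait_count increment takes a reference ...
    lkb.wait_count += 1
    hold_lkb(lkb)


def remove_from_waiters(lkb):
    # ... so every decrement has to give one back.
    lkb.wait_count -= 1
    unhold_lkb(lkb)


def clear_waiters(lkb):
    # Count down instead of assigning zero, so no reference is leaked.
    while lkb.wait_count:
        remove_from_waiters(lkb)
```

Assigning wait_count = 0 directly would leave refs too high by the previous
wait_count, which is the kind of leak the commit describes.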

This issue was discovered by a dlm gfs2 test case which makes excessive
use of the dlm_unlock(LKF_CANCEL) feature. The lkb->lkb_wait_count value
probably never rose above 1 when this feature was not used, so the issue
was not discovered before.

The test case ended with an rsb on the rsb keep data structure with a
refcount of 1 but no lkb associated with it, which is itself
an invalid state. A side effect of that was a condition in which
the dlm was sending remove messages in a loop. With this
patch that has not been reproduced.

Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-05-02 11:15:59 -05:00
Jakob Koschel
dc1acd5c94 dlm: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean [1].

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:03:14 -05:00
Jakob Koschel
c490b3afaa dlm: remove usage of list iterator for list_add() after the loop body
In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Before, the code implicitly used the head when no element was found
when using &pos->list. Since the new variable is only set if an
element was found, the list_add() is performed within the loop
and only done after the loop if it is done on the list head directly.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:03:13 -05:00
Alexander Aring
ba58995909 dlm: fix pending remove if msg allocation fails
This patch unsets ls_remove_len and ls_remove_name if the allocation of
a remove message fails. In this case we never send a remove message out,
but the per-ls ls_remove_len and ls_remove_name variables are still set
for a pending remove. Unsetting those variables indicates to possible
waiters in wait_pending_remove() that no pending remove is going on at
this moment.

Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:03:09 -05:00
Alexander Aring
f6f7418357 dlm: fix wake_up() calls for pending remove
This patch moves the wake_up() call to the point where a remove message
has completed. Before, it was only called when a remove message was about
to be sent. A possible waiter in wait_pending_remove() waits until a
remove is done if the resource name matches the per-ls variable
ls->ls_remove_name. In that case it must wait until the pending remove
is done, which is indicated by DLM_WAIT_PENDING_COND() returning false.
That is always the case once ls_remove_len and ls_remove_name are unset
to indicate that a remove is no longer going on.
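
The wait/wake ordering can be sketched with a condition variable. This is a
minimal model, not the kernel code: the Lockspace class and its method names
are hypothetical, and threading.Condition stands in for the kernel waitqueue.

```python
import threading


class Lockspace:
    """Toy lockspace tracking at most one in-progress remove."""

    def __init__(self):
        self.remove_name = b""            # empty means no pending remove
        self.cond = threading.Condition()

    def start_remove(self, name):
        with self.cond:
            self.remove_name = name

    def finish_remove(self):
        # Wake waiters only after the pending state has been cleared.
        with self.cond:
            self.remove_name = b""
            self.cond.notify_all()

    def wait_pending_remove(self, name, timeout=None):
        """Block while a remove of `name` is pending; False on timeout."""
        with self.cond:
            return self.cond.wait_for(
                lambda: self.remove_name != name, timeout)
```

Waking waiters in start_remove instead would let them re-check the condition
while the name is still set, so they would simply go back to sleep and could
miss the moment the remove actually finishes.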

Fixes: 21d9ac1a53 ("fs: dlm: use event based wait for pending remove")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:03:05 -05:00
Alexander Aring
401597485c dlm: cleanup lock handling in dlm_master_lookup
This patch will remove the following warning by sparse:

fs/dlm/lock.c:1049:9: warning: context imbalance in 'dlm_master_lookup' - different lock contexts for basic block

I tried to find any issues with the current handling and I did not find
any. However it is hard to follow the lock handling in this area of
dlm_master_lookup() and I suppose that sparse cannot realize that there
are no issues. The variable "toss_list" makes it really hard to follow
the lock handling because if it's set the rsb lock/refcount isn't held
but the ls->ls_rsbtbl[b].lock is held and this is one reason why the rsb
lock/refcount does not need to be held. If it's not set the
ls->ls_rsbtbl[b].lock is not held but the rsb lock/refcount is held. The
indicator of toss_list will be used to store the actual lock state.
Another possibility is that a retry can happen and then it's hard to
follow the specific code part. I did not find any issues but sparse
cannot realize that there are no issues.

To make this easier to understand for developers and for sparse, we
remove the toss_list variable, which indicated a specific lock state,
and move the handling done within this lock state into a separate function.
This function can now be called once the initial locks are taken, which
was previously signalled by whether toss_list was set or not. The
advantage is that we can release all locks/refcounts in mostly the
same code block in which they were taken.

Afterwards sparse had no trouble figuring out that there are no problems
with the current lock behaviour.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:58 -05:00
Alexander Aring
e91ce03b27 dlm: remove found label in dlm_master_lookup
This patch cleans up the unnecessary label found, which can be replaced by
a proper else branch to skip over a specific code block.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:54 -05:00
Alexander Aring
00e99ccde7 dlm: use __le types for dlm messages
This patch changes the dlm message structure to use __le types directly;
the structure is cast onto the right positions in the dlm message buffer.

The main goal is to remove sparse warnings about conversions between
host and little-endian byte order. Ignoring those sparse issues and
always converting in the out/in functions tends to leave it unclear in
which byte order a variable is being handled.
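
The idea of fixing the byte order at the buffer boundary can be sketched in
Python with the struct module. The field layout below (three 32-bit fields,
one 16-bit length, a command byte and a pad byte) is an assumption made for
illustration, and DLM_HEADER, pack_header and unpack_header are hypothetical
names.

```python
import struct

# "<" forces little-endian with no padding, whatever the host byte order.
# Assumed layout: version, lockspace, nodeid (u32), length (u16), cmd, pad (u8).
DLM_HEADER = struct.Struct("<IIIHBB")


def pack_header(version, lockspace, nodeid, length, cmd):
    """Convert host integers to wire format in exactly one place."""
    return DLM_HEADER.pack(version, lockspace, nodeid, length, cmd, 0)


def unpack_header(buf):
    """Convert wire format back to host integers in exactly one place."""
    version, lockspace, nodeid, length, cmd, _pad = DLM_HEADER.unpack_from(buf)
    return {
        "version": version,
        "lockspace": lockspace,
        "nodeid": nodeid,
        "length": length,
        "cmd": cmd,
    }
```

Because every field passes through one of these two functions, there is never
a value whose byte order is ambiguous, which is the property the typed
fields give the C code under sparse.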

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:37 -05:00
Alexander Aring
2f9dbeda8d dlm: use __le types for rcom messages
This patch changes the dlm rcom structure to use __le types directly;
the structure is cast onto the right positions in the dlm message buffer.

The main goal is to remove sparse warnings about conversions between
host and little-endian byte order. Ignoring those sparse issues and
always converting in the out/in functions tends to leave it unclear in
which byte order a variable is being handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:32 -05:00
Alexander Aring
3428785a65 dlm: use __le types for dlm header
This patch changes the dlm header structure to use __le types directly;
the structure is cast onto the right positions in the dlm message buffer.

The main goal is to remove sparse warnings about conversions between
host and little-endian byte order. Ignoring those sparse issues and
always converting in the out/in functions tends to leave it unclear in
which byte order a variable is being handled.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:02:28 -05:00
Alexander Aring
67e4d8c51d dlm: fix missing check in validate_lock_args
This patch adds an additional check whether lkb->lkb_wait_count is
non-zero, as is done in validate_unlock_args(), to check if any operation
is in progress. While at it, add a comment taken from
validate_unlock_args() to explain what the check is doing.

There might be no functional change, because a non-zero
lkb->lkb_wait_type implies that lkb->lkb_wait_count is non-zero. However,
we should add the check as validate_unlock_args() does.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2022-04-06 14:01:49 -05:00
Alexander Aring
21d9ac1a53 fs: dlm: use event based wait for pending remove
This patch will use an event based waitqueue to wait for a possible clash
with the ls_remove_name field of dlm_ls instead of doing busy waiting.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-12-07 12:42:26 -06:00
Alexander Aring
6c2e3bf68f fs: dlm: filter user dlm messages for kernel locks
This patch fixes the following crash caused by receiving an invalid message:

[  160.672220] ==================================================================
[  160.676206] BUG: KASAN: user-memory-access in dlm_user_add_ast+0xc3/0x370
[  160.679659] Read of size 8 at addr 00000000deadbeef by task kworker/u32:13/319
[  160.681447]
[  160.681824] CPU: 10 PID: 319 Comm: kworker/u32:13 Not tainted 5.14.0-rc2+ #399
[  160.683472] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.14.0-1.module+el8.6.0+12648+6ede71a5 04/01/2014
[  160.685574] Workqueue: dlm_recv process_recv_sockets
[  160.686721] Call Trace:
[  160.687310]  dump_stack_lvl+0x56/0x6f
[  160.688169]  ? dlm_user_add_ast+0xc3/0x370
[  160.689116]  kasan_report.cold.14+0x116/0x11b
[  160.690138]  ? dlm_user_add_ast+0xc3/0x370
[  160.690832]  dlm_user_add_ast+0xc3/0x370
[  160.691502]  _receive_unlock_reply+0x103/0x170
[  160.692241]  _receive_message+0x11df/0x1ec0
[  160.692926]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.693700]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.694427]  ? lock_acquire+0x175/0x400
[  160.695058]  ? do_purge.isra.51+0x200/0x200
[  160.695744]  ? lock_acquired+0x360/0x5d0
[  160.696400]  ? lock_contended+0x6a0/0x6a0
[  160.697055]  ? lock_release+0x21d/0x5e0
[  160.697686]  ? lock_is_held_type+0xe0/0x110
[  160.698352]  ? lock_is_held_type+0xe0/0x110
[  160.699026]  ? ___might_sleep+0x1cc/0x1e0
[  160.699698]  ? dlm_wait_requestqueue+0x94/0x140
[  160.700451]  ? dlm_process_requestqueue+0x240/0x240
[  160.701249]  ? down_write_killable+0x2b0/0x2b0
[  160.701988]  ? do_raw_spin_unlock+0xa2/0x130
[  160.702690]  dlm_receive_buffer+0x1a5/0x210
[  160.703385]  dlm_process_incoming_buffer+0x726/0x9f0
[  160.704210]  receive_from_sock+0x1c0/0x3b0
[  160.704886]  ? dlm_tcp_shutdown+0x30/0x30
[  160.705561]  ? lock_acquire+0x175/0x400
[  160.706197]  ? rcu_read_lock_sched_held+0xa1/0xd0
[  160.706941]  ? rcu_read_lock_bh_held+0xb0/0xb0
[  160.707681]  process_recv_sockets+0x32/0x40
[  160.708366]  process_one_work+0x55e/0xad0
[  160.709045]  ? pwq_dec_nr_in_flight+0x110/0x110
[  160.709820]  worker_thread+0x65/0x5e0
[  160.710423]  ? process_one_work+0xad0/0xad0
[  160.711087]  kthread+0x1ed/0x220
[  160.711628]  ? set_kthread_struct+0x80/0x80
[  160.712314]  ret_from_fork+0x22/0x30

The issue is that we received a DLM message for a user lock but the
destination lock is a kernel lock. Note that the address being
dereferenced is 00000000deadbeef, which is the kernel lock's
lkb->lkb_astparam; this field should never be dereferenced by the DLM
kernel stack. For a user lock, lkb->lkb_astparam is lkb->lkb_ua
(the memory is shared through a union field). The struct lkb_ua is handled
by the DLM kernel stack, but on a kernel lock it contains invalid
data and will most likely end up crashing the kernel.
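
The kind of filter this calls for can be sketched as a flag comparison. The
helper name validate_message_flags is hypothetical, and the flag value 0x1 is
an assumption taken from the m_flags=0x1 used in the reproducer; the sketch
does not claim to match the kernel's actual check.

```python
import errno

DLM_IFL_USER = 0x00000001  # assumed bit marking a user-space lock


def validate_message_flags(m_flags, lkb_flags):
    """Return 0 if the message may be processed, -EINVAL otherwise."""
    # A message claiming to be for a user lock must only reach a user lock;
    # on a kernel lock the shared union field holds unrelated data.
    if (m_flags & DLM_IFL_USER) and not (lkb_flags & DLM_IFL_USER):
        return -errno.EINVAL
    return 0
```

With such a check, the crafted message from the reproducer would be rejected
before any callback touches the union field.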

It can be reproduced with two cluster nodes.

node 2:
dlm_tool join test
echo "862 fooobaar 1 2 1" > /sys/kernel/debug/dlm/test_locks
echo "862 3 1" > /sys/kernel/debug/dlm/test_waiters

node 1:
dlm_tool join test

python:
foo = DLM(h_cmd=3, o_nextcmd=1, h_nodeid=1, h_lockspace=0x77222027, \
          m_type=7, m_flags=0x1, m_remid=0x862, m_result=0xFFFEFFFE)
newFile = open("/sys/kernel/debug/dlm/comms/2/rawmsg", "wb")
newFile.write(bytes(foo))

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
63eab2b00b fs: dlm: add lkb waiters debugfs functionality
This patch adds functionality to put an lkb into the waiters state. It can
be useful to combine this feature with the "rawmsg" debugfs
functionality: it brings the DLM lkb into a state in which a message
will be parsed by the kernel.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
5054e79de9 fs: dlm: add lkb debugfs functionality
This patch adds functionality to add an lkb at runtime. This is
strictly a debugging feature; wrong input can crash the kernel. It is
also at an early stage. The goal is to provide a user interface for
manipulating dlm state and to combine it with the rawmsg feature. Since it
is debugfs functionality, we don't care about UAPI breakage. It is even
possible to add lkbs/rsbs which could never exist in such a way
under normal DLM operation. Users of this interface always need to
think before using this feature; not every crash which happens can really
occur during normal dlm operation.

In the future there should be more functionality to add a more realistic
lkb which reflects normal DLM state inside the kernel. For now this is
enough.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
75d25ffe38 fs: dlm: allow create lkb with specific id range
This patch adds functionality to add a lkb with a specific id range.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
f1d3b8f91d fs: dlm: initial support for tracepoints
This patch adds initial support for dlm tracepoints. It will introduce
tracepoints to dlm main functionality dlm_lock()/dlm_unlock() and their
complete ast() callback or blocking bast() callback.

The lock/unlock functionality has a start and an end tracepoint. This is
because there is a race: with a tracepoint at the end position only, the
complete/blocking callbacks could occur before it. When working with eBPF
tracing and its hash lookup functionality, this could cause problems
because an entry might not have been inserted yet. Instead, use the
start tracepoint for the hash insert and check again in the end tracepoint
whether there was a dlm internal error, in which case there is no ast
callback. Furthermore, locks with local masters might trigger those
callbacks immediately, so we must have such functionality.

I did not make everything accessible yet, although it seems eBPF can be
used to access a lot of internal data structures if it's aware of the
struct definitions of the running kernel instance. We can still change
this. If you do eBPF experiments, e.g. time measurements between lock and
callback functionality, you can simply use the local lkb_id field as the
hash value, in combination with the lockspace id if you have multiple
lockspaces. Otherwise you can simply use trace-cmd for some functionality,
e.g. `trace-cmd record -e dlm` and `trace-cmd report` afterwards.
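
The bookkeeping described above can be sketched in plain Python, independent
of any tracing framework. The LockLatency class and its handler names are
hypothetical; in a real experiment the three handlers would be driven by the
start, end and ast tracepoints, and timestamps would come from the tracer.

```python
class LockLatency:
    """Toy latency tracker keyed by (lockspace id, lkb id)."""

    def __init__(self):
        self.start = {}

    def on_lock_start(self, ls_id, lkb_id, now_ns):
        # Insert at the start tracepoint so a callback that fires before
        # the end tracepoint still finds its entry.
        self.start[(ls_id, lkb_id)] = now_ns

    def on_lock_end(self, ls_id, lkb_id, error):
        # On an internal error no ast callback follows: drop the entry.
        if error:
            self.start.pop((ls_id, lkb_id), None)

    def on_ast(self, ls_id, lkb_id, now_ns):
        """Return elapsed nanoseconds, or None if no start was recorded."""
        t0 = self.start.pop((ls_id, lkb_id), None)
        return None if t0 is None else now_ns - t0
```

Including the lockspace id in the key matters only when several lockspaces
are traced at once, since lkb ids are local to a lockspace.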

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-11-02 14:39:20 -05:00
Alexander Aring
8e2e40860c fs: dlm: add union in dlm header for lockspace id
This patch adds a union around the lockspace id field so that the field
can also serve another use case for a different dlm command.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25 09:22:20 -05:00
Alexander Aring
a070a91cf1 fs: dlm: add more midcomms hooks
This patch prepares hooks to redirect to the midcomms layer which will
be used by the midcomms re-transmit handling.

There is a new concept of stateless buffer allocations and
commits. This can be used to bypass the midcomms re-transmit handling. It
is used by RCOM_STATUS and RCOM_NAMES messages, because they have their
own ping-like re-transmit handling. These two messages will also be
used to determine the DLM version per node because, by observation,
they are the first messages which are exchanged.

Cluster manager events for node membership are added to support
half-closed connections in cases where the peer connection reaches
end of file but DLM still holds membership of the node. During
this time DLM can still trigger new messages, which we should allow. After
the cluster manager node removal event occurs it is safe to close the
connection.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-05-25 09:22:20 -05:00
Alexander Aring
e1a7cbce53 fs: dlm: use GFP_ZERO for page buffer
This patch uses GFP_ZERO when allocating a page for the internal dlm
sending buffer allocator instead of calling memset zero after every
allocation. An already allocated space will never be reused.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2021-03-09 08:56:42 -06:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Thomas Gleixner
2522fe45a1 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license v 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 45 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528170027.342746075@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-30 11:29:21 -07:00
Vasily Averin
d47b41acee dlm: memory leaks on error path in dlm_user_request()
According to comment in dlm_user_request() ua should be freed
in dlm_free_lkb() after successful attach to lkb.

However ua is attached to lkb not in set_lock_args() but later,
inside request_lock().

Fixes 597d0cae0f ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15 09:57:22 -06:00
Vasily Averin
c0174726c3 dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
Fixes 6d40c4a708 ("dlm: improve error and debug messages")
Cc: stable@kernel.org # 3.5

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15 09:57:22 -06:00
Vasily Averin
23851e978f dlm: possible memory leak on error path in create_lkb()
Fixes 3d6aa675ff ("dlm: keep lkbs in idr")
Cc: stable@kernel.org # 3.1

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2018-11-15 09:57:22 -06:00
David Teigland
9250e52359 dlm: remove dlm_send_rcom_lookup_dump
This function was only for debugging.  It would be
called in a condition that should not happen, and
should probably have been removed from the final
version of the original commit.

Remove it because it does mutex lock under spin lock.

Signed-off-by: David Teigland <teigland@redhat.com>
2017-10-09 09:29:31 -05:00
tsutomu.owa@toshiba.co.jp
294e7e4587 DLM: fix conversion deadlock when DLM_LKF_NODLCKWT flag is set
When the DLM_LKF_NODLCKWT flag was set, even if conversion deadlock
was detected, the caller of can_be_granted() was unknown.
We change the behavior of can_be_granted() and change it to detect
conversion deadlock regardless of whether the DLM_LKF_NODLCKWT flag
is set or not. And depending on whether the DLM_LKF_NODLCKWT flag
is set or not, we change the behavior at the caller of can_be_granted().

This fix has no effect except when using DLM_LKF_NODLCKWT flag.
Currently, ocfs2 uses the DLM_LKF_NODLCKWT flag and does not expect a
cancel operation from conversion deadlock when calling dlm_lock().
ocfs2 is implemented to perform a cancel operation by requesting
BASTs (callback).

Signed-off-by: Tadashi Miyauchi <miyauchi@toshiba-tops.co.jp>
Signed-off-by: Tsutomu Owa <tsutomu.owa@toshiba.co.jp>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-09-25 12:45:21 -05:00
Markus Elfring
0d37eca752 dlm: Delete an error message for a failed memory allocation in dlm_recover_waiters_pre()
Omit an extra message for a memory allocation failure in this function.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07 11:23:09 -05:00
Markus Elfring
102e67d4e3 dlm: Improve a size determination in dlm_recover_waiters_pre()
Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07 11:23:09 -05:00
Markus Elfring
fbb1008151 dlm: Use kcalloc() in dlm_scan_waiters()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kcalloc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Teigland <teigland@redhat.com>
2017-08-07 11:23:09 -05:00
Thomas Gleixner
1f3a8e49d8 ktime: Get rid of ktime_equal()
No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:23 +01:00
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
David Teigland
2ab4bd8ea3 dlm: adopt orphan locks
A process may exit, leaving an orphan lock in the lockspace.
This adds the capability for another process to acquire the
orphan lock.  Acquiring the orphan just moves the lock from
the orphan list onto the acquiring process's list of locks.

An adopting process must specify the resource name and mode
of the lock it wants to adopt.  If a matching lock is found,
the lock is moved to the caller's list of locks, and the
lkid of the lock is returned like the lkid of a new lock.

If an orphan with a different mode is found, then -EAGAIN is
returned.  If no orphan lock is found on the resource, then
-ENOENT is returned.  No async completion is used because
the result is immediately available.
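
The matching rules can be sketched as a small lookup. This is a toy model
under stated assumptions: adopt_orphan here is a hypothetical Python helper
working on lists of dicts, and it returns the lkid directly instead of
through an output parameter.

```python
import errno


def adopt_orphan(orphans, proc_locks, name, mode):
    """Move a matching orphan to proc_locks; return lkid or negative errno."""
    same_name = False
    for lkb in orphans:
        if lkb["name"] != name:
            continue
        same_name = True
        if lkb["mode"] == mode:
            # Adoption only moves the lock between lists.
            orphans.remove(lkb)
            proc_locks.append(lkb)
            return lkb["lkid"]
    # An orphan exists on the resource but with a different mode: -EAGAIN.
    # No orphan on the resource at all: -ENOENT.
    return -errno.EAGAIN if same_name else -errno.ENOENT
```

Since the result is known as soon as the orphan list has been searched, no
asynchronous completion step is needed in this model either.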

Also, when orphans are purged, allow a zero nodeid to refer
to the local nodeid so the caller does not need to look up
the local nodeid.

Signed-off-by: David Teigland <teigland@redhat.com>
2014-11-19 14:48:02 -06:00
David Teigland
075f01775f dlm: use INFO for recovery messages
The log messages relating to the progress of recovery
are minimal and very often useful.  Change these to
the KERN_INFO level so they are always available.

Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-14 11:54:44 -06:00
Dan Carpenter
e8243f32f2 dlm: silence a harmless use after free warning
We pass the freed "r" pointer back to the caller.  It's harmless but it
upsets the static checkers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2014-02-12 15:44:03 -06:00
Bart Van Assche
cfa805f6f1 dlm: Avoid LVB truncation
For lockspaces with an LVB length above 64 bytes, avoid truncating
the LVB while exchanging it with another node in the cluster.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: David Teigland <teigland@redhat.com>
2013-06-26 11:38:02 -05:00
Tejun Heo
2a86b3e74f dlm: convert to idr_alloc()
Convert to the much saner new idr interface.  Error return values from
recover_idr_add() mix -1 and -errno.  The conversion doesn't change
that but it looks iffy.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:19 -08:00
David Teigland
f117228346 dlm: avoid scanning unchanged toss lists
Keep track of whether a toss list contains any
shrinkable rsbs.  If not, dlm_scand can avoid
scanning the list for rsbs to shrink.  Unnecessary
scanning can otherwise waste a lot of time because
the toss lists can contain a large number of rsbs
that are non-shrinkable (directory records).
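
The optimisation amounts to a dirty flag on each list. The sketch below is a
simplified model: the TossList class is hypothetical, and it frees every
shrinkable entry immediately, ignoring the timing that decides when an unused
rsb may really be freed.

```python
class TossList:
    """Toy toss list holding (name, is_dir_record) tuples."""

    def __init__(self):
        self.rsbs = []
        self.has_shrinkable = False

    def add(self, name, is_dir_record):
        self.rsbs.append((name, is_dir_record))
        if not is_dir_record:
            self.has_shrinkable = True   # remember there is work to do

    def shrink(self):
        """Free shrinkable entries; return how many were freed."""
        if not self.has_shrinkable:
            return 0                     # skip the scan entirely
        before = len(self.rsbs)
        self.rsbs = [r for r in self.rsbs if r[1]]
        self.has_shrinkable = False
        return before - len(self.rsbs)
```

A list holding only directory records never sets the flag, so the periodic
scan costs a single boolean test no matter how long the list is.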

Signed-off-by: David Teigland <teigland@redhat.com>
2013-01-07 12:02:49 -06:00
David Teigland
da8c66638a dlm: fix lvb invalidation conditions
When a node is removed that held a PW/EX lock, the
existing master node should invalidate the lvb on the
resource due to the purged lock.

Previously, the existing master node was invalidating
the lvb if it found only NL/CR locks on the resource
during recovery for the removed node.  This could lead
to cases where it invalidated the lvb and shouldn't
have, or cases where it should have invalidated and
didn't.

When recovery selects a *new* master node for a
resource, and that new master finds only NL/CR locks
on the resource after lock recovery, it should
invalidate the lvb.  This case was handled correctly
(but was incorrectly applied to the existing master
case also.)

When a process exits while holding a PW/EX lock,
the lvb on the resource should be invalidated.
This was not happening.

The lvb contents and VALNOTVALID flag should be
recovered before granting locks in recovery so that
the recovered lvb state is provided in the callback.
The lvb was being recovered after the lock was granted.
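
The two invalidation rules can be written down as predicates. This is a
sketch under stated assumptions: the numeric mode values and both function
names are illustrative, and treating a resource with no recovered locks as
not invalidated is a choice made here, not something the commit states.

```python
# Lock modes ordered from weakest to strongest (assumed numeric values).
DLM_LOCK_NL, DLM_LOCK_CR, DLM_LOCK_CW = 0, 1, 2
DLM_LOCK_PR, DLM_LOCK_PW, DLM_LOCK_EX = 3, 4, 5


def purge_invalidates_lvb(purged_mode):
    """Existing master: invalidate when a purged lock was held in PW/EX."""
    return purged_mode >= DLM_LOCK_PW


def new_master_invalidates_lvb(recovered_modes):
    """New master: invalidate when only NL/CR locks were recovered."""
    # Assumption: with no recovered locks at all, nothing is invalidated.
    return bool(recovered_modes) and max(recovered_modes) <= DLM_LOCK_CR
```

Keeping the two cases as separate predicates mirrors the fix: the new-master
rule was previously applied to the existing-master case as well, which gave
wrong answers in both directions.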

Signed-off-by: David Teigland <teigland@redhat.com>
2012-11-16 11:20:42 -06:00
David Teigland
96006ea6d4 dlm: fix missing dir remove
I don't know exactly how, but in some cases, a dir
record is not removed, or a new one is created when
it shouldn't be.  The result is that the dir node
lookup returns a master node where the rsb does not
exist.  In this case, the master node will repeatedly
return -EBADR for requests, and the lock requests will
be stuck.

Until all possible ways for this to happen can be
eliminated, a simple and effective way to recover from
this situation is for the supposed master node to send
a standard remove message to the dir node when it
receives a request for a resource it has no rsb for.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:24:43 -05:00
David Teigland
c503a62103 dlm: fix conversion deadlock from recovery
The process of rebuilding locks on a new master during
recovery could re-order the locks on the convert queue,
creating an "in place" conversion deadlock that would
not be resolved.  Fix this by not considering queue
order when granting conversions after recovery.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:18:22 -05:00
David Teigland
05c32f47bf dlm: fix race between remove and lookup
It was possible for a remove message on an old
rsb to be sent after a lookup message on a new
rsb, where the rsbs were for the same resource
name.  This could lead to a missing directory
entry for the new rsb.

It is fixed by keeping a copy of the resource
name being removed until after the remove has
been sent.  A lookup checks if this in-progress
remove matches the name it is looking up.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:18:01 -05:00
David Teigland
c04fecb4d9 dlm: use rsbtbl as resource directory
Remove the dir hash table (dirtbl), and use
the rsb hash table (rsbtbl) as the resource
directory.  It has always been an unnecessary
duplication of information.

This improves efficiency by using a single rsbtbl
lookup in many cases where both rsbtbl and dirtbl
lookups were needed previously.

This eliminates the need to handle cases of rsbtbl
and dirtbl being out of sync.

In many cases there will be memory savings because
the dir hash table no longer exists.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-07-16 14:16:19 -05:00
David Teigland
4875647a08 dlm: fixes for nodir mode
The "nodir" mode (statically assign master nodes instead
of using the resource directory) has always been highly
experimental, and never seriously used.  This commit
fixes a number of problems, making nodir much more usable.

- Major change to recovery: recover all locks and restart
  all in-progress operations after recovery.  In some
  cases it's not possible to know which in-progress locks
  to recover, so recover all.  (Most require recovery
  in nodir mode anyway since rehashing changes most
  master nodes.)

- Change the way nodir mode is enabled, from a command
  line mount arg passed through gfs2, into a sysfs
  file managed by dlm_controld, consistent with the
  other config settings.

- Allow recovering MSTCPY locks on an rsb that has not
  yet been turned into a master copy.

- Ignore RCOM_LOCK and RCOM_LOCK_REPLY recovery messages
  from a previous, aborted recovery cycle.  Base this
  on the local recovery status not being in the state
  where any nodes should be sending LOCK messages for the
  current recovery cycle.

- Hold rsb lock around dlm_purge_mstcpy_locks() because it
  may run concurrently with dlm_recover_master_copy().

- Maintain highbast on process-copy lkb's (in addition to
  the master as is usual), because the lkb can switch
  back and forth between being a master and being a
  process copy as the master node changes in recovery.

- When recovering MSTCPY locks, flag rsb's that have
  non-empty convert or waiting queues for granting
  at the end of recovery.  (Rename flag from LOCKS_PURGED
  to RECOVER_GRANT and similar for the recovery function,
  because it's not only resources with purged locks
  that need a grant attempt.)

- Replace a couple of unnecessary assertion panics with
  error messages.

Signed-off-by: David Teigland <teigland@redhat.com>
2012-05-02 14:15:27 -05:00