I have evidence of a Linux NFS client getting NFS4ERR_BAD_SEQID in
response to a v4.0 LOCK request sent to a Linux server (which had the
RELEASE_LOCKOWNER bug fixed).
The LOCK request presented a "new" lock owner so there are two seq ids
in the request: that for the open file, and that for the new lock.
Given the context I am confident that the new lock owner was reported to
have the wrong seqid. As lock owner identifiers are reused, the server
must still have a lock owner active which the client thinks is no longer
active.
I wasn't able to determine a root cause, but the simplest fix seems to be
to ensure lock owners are always unique, much as open owners are (thanks
to a time stamp). The easiest way to ensure uniqueness is with a 64bit
counter for each server. That will never cycle (if updated once a
nanosecond it would last 584 years. A single NFS server would not handle
open/lock requests nearly that fast, and a Linux node is unlikely to
have an uptime approaching that).
This patch removes the two IDAs and instead uses a per-server
atomic64_t to provide uniqueness.
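
As a rough illustration of the idea (not the code in this patch), the
following userspace sketch uses C11 atomics as a stand-in for the kernel's
atomic64_t. The names demo_server, owner_ctr, demo_next_owner_id and
demo_years_until_wrap are hypothetical and exist only for this example.
It also checks the 584-year figure, assuming a 365-day year.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "the counter must be 64 bits wide");

/* Hypothetical stand-in for per-server state; not the kernel structure. */
struct demo_server {
	_Atomic uint64_t owner_ctr;	/* userspace analogue of atomic64_t */
};

static void demo_server_init(struct demo_server *srv)
{
	atomic_init(&srv->owner_ctr, 0);
}

/*
 * Hand out a fresh owner identifier. atomic_fetch_add() returns the old
 * value, so adding 1 yields ids starting at 1; no id is returned twice
 * until the 64-bit counter wraps.
 */
static uint64_t demo_next_owner_id(struct demo_server *srv)
{
	return atomic_fetch_add(&srv->owner_ctr, 1) + 1;
}

/* Whole years before wrap-around at one allocation per nanosecond. */
static uint64_t demo_years_until_wrap(void)
{
	/* Assumption: a 365-day year. */
	const uint64_t ns_per_year = 365ULL * 24 * 60 * 60 * 1000000000ULL;

	return UINT64_MAX / ns_per_year;
}
```

Unlike an ID allocator that recycles freed values, a counter like this
never returns an identifier to a pool, so a stale owner on the server
cannot collide with a newly created one on the client.
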
Note that the lock owner already encodes the id as 64 bits even though
it is a 32bit value. So changing to a 64bit value does not change the
encoding of the lock owner. The open owner encoding is now 4 bytes
larger.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>