Lustre puts system errors (e.g., ENOTCONN) on wire as numbers
essentially specific to senders' architectures. While this is fine
for x86-only sites, where receivers share the same error number
definition with senders, problems will arise, however, for sites
involving multiple architectures with different error number
definitions. For instance, an ENOTCONN reply from a sparc server will
be put on wire as -57, which, for an x86 client, means EBADSLT
instead.
To solve the problem, this patch defines a set of network errors for
on-wire or on-disk uses. These errors correspond to a subset of the
x86 system errors and share the same number definition, maintaining
compatibility with existing x86 clients and servers.
Then, either error numbers could be translated at run time, or all
host errors going on wire could be replaced with network errors in the
code. This patch does the former by introducing both generic and
field-specific translation routines and calling them at proper places,
so that translations for existing fields are transparent.
(Personally, I tend to think the latter way might be worthwhile, as it
is more straightforward conceptually. Do we really need so many
different errors? Should errors returned by kernel routines really be
passed up and eventually put on wire? There could even be security
implications in that.)
Thank Fujitsu for the original idea and their contributions that make
this available upstream.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2743
Lustre-change: http://review.whamcloud.com/5577
Signed-off-by: Li Wei <wei.g.li@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The race is result of use-after-free situation:
~ ptlrpc_stop_pinger() ~ ptlrpc_pinger_main()
---------------------------------------------------------------
thread_set_flags(SVC_STOPPING)
cfs_waitq_signal(pinger_thread) ...
... thread_set_flags(SVC_STOPPED)
l_wait_event(thread_is_stopped)
OBD_FREE_PTR(pinger_thread)
... cfs_waitq_signal(pinger_thread)
---------------------------------------------------------------
The memory used by pinger_thread might have been freed and
reallocated to something else, when ptlrpc_pinger_main()
used it in cvs_waitq_signal().
Signed-off-by: Li Wei <wei.g.li@intel.com>
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3032
Lustre-change: http://review.whamcloud.com/6040
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Print the namespace and OBD device name, as well as the first two
lock resource fields (typically the FID) if there is an error with
loading the object from disk. This will be more important with
FID-on-OST and also the MDS. Using fid_extract_from_res_name() isn't
possible in the LDLM code, since the lock resource may not be a FID.
Make fid_extract_quota_resid() argument order and name consistent
with other fid_*_res() functions, with FID first and resource second.
Fix a bug in ofd_lvbo_init() where NULL lvb is accessed on error.
Print FID in ofd_lvbo_update() CDEBUG() and CERROR() messages.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2193
Lustre-change: http://review.whamcloud.com/4501
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Bobi Jam <bobijam@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In ll_file_ioctl() and ll_swap_layouts() check the result of
ll_prep_md_op_data() using IS_ERR().
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3283
Lustre-change: http://review.whamcloud.com/6275
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Fan Yong <fan.yong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of memory pressure a not locked mutex can be unlocked
in function ll_file_open(). This is not allowed and subsequent
behavior is not defined.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3157
Lustre-change: http://review.whamcloud.com/6028
Signed-off-by: Dmitry Eremin <dmitry.eremin@intel.com>
Reviewed-by: John Hammond <johnlockwoodhammond@gmail.com>
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Sebastien Buisson <sebastien.buisson@bull.net>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Missing the last bit during INODELOCK check in ll_have_md_lock.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3385
Lustre-change: http://review.whamcloud.com/6438
Signed-off-by: wang di <di.wang@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In vvp_io_write_start() the stats function ll_rw_stats_tally() was
incorrectly called with a rw argument of 0. Correct this and use the
macros READ and WRITE in and around ll_rw_stats_tally() for clarity.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3384
Lustre-change: http://review.whamcloud.com/6447
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 4b5b4c7222 ("staging/lustre/libcfs: restore LINVRNT") added
"default false" to this Kconfig file. It was obviously meant to use
"default n" here. But we might as well drop this line, as a Kconfig bool
defaults to 'n' anyway.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Confirmed by cscope that the functions are not used anymore. A fresh compilation does not yield any errors.
Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kmap_atomic allows only one argument now, just remove the second.
Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 0ad1ea6954
I didn't use git revert because it can not be done cleanly.
Hopefully it will be the last time we do it...
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
somehow this got dropped during merge window...
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Building on 32bit system, I got warnings like below:
drivers/staging/lustre/lustre/llite/../include/lprocfs_status.h:666:7: note: expected ‘long unsigned int *’ but argument is of type ‘size_t *’
char *lprocfs_find_named_value(const char *buffer, const char *name,
drivers/staging/lustre/lustre/lov/lov_io.c: In function ‘lov_io_rw_iter_init’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
dump_trace() is only available on X86. Without it, Lustre's own
watchdog is broken. We can only dump current task's stack.
The client-side this code is much less likely to hit deadlocks and
it's probably OK to drop this altogether, since we hardly have any
ptlrpc threads on clients, most notable ones are ldlm cb threads
that should not really be blocking on the client anyway.
Remove libcfs watchdog for now, until the upstream kernel watchdog
can detect distributed deadlocks and dump other kernel threads.
Cc: Oleg Drokin <oleg.drokin@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kuid_t/kgid_t are wrappered when CONFIG_UIDGID_STRICT_TYPE_CHECKS is on.
Lustre build is broken because we always treat them as plain __u32.
The patch fixes it. Internally, Lustre always use __u32 uid/gid, and
convert to kuid_t/kgid_t when necessary.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Got below errors on s390 build:
CC [M] drivers/staging/lustre/lustre/llite/dir.o
drivers/staging/lustre/lustre/llite/dir.c: In function 'll_dir_filler':
drivers/staging/lustre/lustre/llite/dir.c:225:3: error: implicit declaration of function 'prefetchw' [-Werror=implicit-function-declaration]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As reported by Fengguang:
In file included from drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_idl.h:99:0,
from drivers/staging/lustre/lustre/obdclass/../include/lprocfs_status.h:46,
from drivers/staging/lustre/lustre/obdclass/../include/obd_support.h:42,
from drivers/staging/lustre/lustre/obdclass/../include/obd_class.h:40,
from drivers/staging/lustre/lustre/obdclass/lu_object.c:53:
drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_user.h:356:10: error: field 'lmd_st' has incomplete type
drivers/staging/lustre/lustre/obdclass/../include/lustre/lustre_user.h:361:10: error: field 'lmd_st' has incomplete type
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Three functions cfs_cpu_ht_nsiblings, cfs_cpt_cpumask and
cfs_cpt_table_print are missing if !CONFIG_SMP.
cpumask_t/nodemask_t/__read_mostly/____cacheline_aligned
are redefined.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stephen Rothwell reported below error on powerpc:
In file included from drivers/staging/lustre/include/linux/libcfs/libcfs.h:203:0,
from drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.h:67,
from drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c:41:
drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd.c: In function 'kiblnd_dev_need_failover':
drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h:215:16: error: implicit declaration of function 'NIPQUAD' [-Werror=implicit-function-declaration]
static struct libcfs_debug_msg_data msgdata; \
^
We should just remove HIPQUAD and replace it with %pI4h.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lustre internal dependency needs to be cleaned up. Currently,
libcfs is acting as a basis of all other modules, while other
modules in lustre/ directory in turn depend on lnet modules.
It creates a dependency loop that need to be fixed. Hopefully
we will remove libcfs in the end. So just disable buildin for
now.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If LNetNIInit() fails, we'll get zero ln_refcount. So fail
LNetGetId() properly instead of asserting.
We can get to it when socklnd fails to scan network interfaces,
which is possible if Lustre is builtin.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It can well be NULL if Lustre is builtin.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Change Makefiles to keep link order in match with Lustre module
dependency, so that when Lustre is built in kernel, we'll have
the same dependency. Otherwise we'll crash kernel if Lustre is
builtin due to missing internal dependency.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The global variable num_physpages is going away. Replace it
with totalram_pages.
Cc: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit ee04fd11f1.
How many times can we do this...
Not all the world is x86-64, breaking other arch builds, especially for
weeks at a time, isn't acceptable at all.
So, it's nice to get this code into the tree, just don't build it as
it's obviously not ready for "real world" systems :(
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
make ARCH=i386 allyesconfig gave bellow errors:
drivers/built-in.o: In function `kiblnd_create_conn':
>> (.text+0x1a74425): undefined reference to `__umoddi3'
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No need to cast count since it is already ssize_t. No need
to cast payload to const, but need __force instead to avoid
Sparse complaining.
Reported-and-Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mdc_kuc_fops is missing open/release handlers. I fixed it before but
somehow forgot to amend to the patch sent out. Sorry...
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 37d4093fd3.
I've verified that we now don't break build on X86_64 allmodconfig.
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We don't need to implement crc32 and crc32pclmul on our own.
The crc32-pclmul support was merged into the 3.8 kernel in commit
78c37d1, no need to keep a local copy in Lustre anymore.
The crc32 implementation is identical to crypto-crc32. So drop
Lustre's private implementation and select kernel crypto in Kconfig.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
module_refcount() is not available when CONFIG_MODULE_UNLOAD is off.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was mistakenly removed by coan. Add it back and also with a new
Kconfig option to enable it.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are no callers of them. Besides, lu_context_keys_dump breaks
build when CONFIG_MODULES is not set.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
set_cpus_allowed is not available with CONFIG_CPUMASK_OFFSTACK on.
We should call set_cpus_allowed_ptr instead.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 52f6317528.
It's still broken, especially for a simple build on x86 with 'make
allmodconfig'
As no fixes seem forthcoming, just mark it broken.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I accidentally reversed the sense of the error check after the call to
dt_statfs() in lprocfs_dt_rd_{blksize,{files,kbytes}{free,avail}.
Unreverse the error checking.
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3300
Lustre-change: http://review.whamcloud.com/6385
Signed-off-by: John L. Hammond <john.hammond@intel.com>
Reviewed-by: Robert Read <robert.read@intel.com>
Reviewed-by: Emoly Liu <emoly.liu@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In case of error, the function sock_alloc_file() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Those are simple wrappers for numa allocator. We don't need them.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was emptied by coan. So drop it.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It is not used any more.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It was emptied by coan. So we no longer need it.
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Andreas Dilger <andreas.dilger@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>