kernfs has just been separated out from sysfs and we're already in
full conflict mode. Nothing can make the situation any worse. Let's
take the chance to name things properly.
This patch performs the following renames.
* s/sysfs_elem_dir/kernfs_elem_dir/
* s/sysfs_elem_symlink/kernfs_elem_symlink/
* s/sysfs_elem_attr/kernfs_elem_file/
* s/sysfs_dirent/kernfs_node/
* s/sd/kn/ in kernfs proper
* s/parent_sd/parent/
* s/target_sd/target/
* s/dir_sd/parent/
* s/to_sysfs_dirent()/rb_to_kn()/
* misc renames of local vars when they conflict with the above
Because md, mic and gpio dig into sysfs details, this patch ends up
modifying them. All are sysfs_dirent renames and trivial. While we
can avoid these by introducing a dummy wrapping struct sysfs_dirent
around kernfs_node, given the limited usage outside kernfs and sysfs
proper, I don't think such workaround is called for.
This patch is strictly rename only and doesn't introduce any
functional difference.
- mic / gpio renames were missing. Spotted by kbuild test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While restructuring the [u]mount path, 4b93dc9b1c ("sysfs, kernfs:
prepare mount path for kernfs") incorrectly updated sysfs_kill_sb() so
that it first kills super_block and then tries to dereference its
namespace tag to drop it. Fix it by caching namespace tag before
killing the superblock and then drop the cached namespace tag.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/g/20131205031051.GC5135@yliu-dev.sh.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is v3.14 fix for the same issue that a8b1474442 ("sysfs: give
different locking key to regular and bin files") addresses for v3.13.
Due to the extensive kernfs reorganization in v3.14 branch, the same
fix couldn't be ported as-is. The v3.13 fix was ignored while merging
it into v3.14 branch.
027a485d12 ("sysfs: use a separate locking class for open files
depending on mmap") assigned different lockdep key to
sysfs_open_file->mutex depending on whether the file implements mmap
or not in an attempt to avoid spurious lockdep warning caused by
merging of regular and bin file paths.
While this restored some of the original behavior of using different
locks (at least lockdep is concerned) for the different clases of
files. The restoration wasn't full because now the lockdep key
assignment depends on whether the file has mmap or not instead of
whether it's a regular file or not.
This means that bin files which don't implement mmap will get assigned
the same lockdep class as regular files. This is problematic because
file_operations for bin files still implements the mmap file operation
and checking whether the sysfs file actually implements mmap happens
in the file operation after grabbing @sysfs_open_file->mutex. We
still end up adding locking dependency from mmap locking to
sysfs_open_file->mutex to the regular file mutex which triggers
spurious circular locking warning.
For v3.13, a8b1474442 ("sysfs: give different locking key to regular
and bin files") fixed it by giving sysfs_open_file->mutex different
lockdep keys depending on whether the file is regular or bin instead
of whether mmap exists or not; however, due to the way sysfs is now
layered behind kernfs, this approach is no longer viable. kernfs can
tell whether a sysfs node has mmap implemented or not but can't tell
whether a bin file from a regular one.
This patch updates kernfs such that kernfs_file_mmap() checks
SYSFS_FLAG_HAS_MMAP and bail before grabbing sysfs_open_file->mutex so
that it doesn't add spurious locking dependency from mmap to
sysfs_open_file->mutex and changes sysfs so that it specifies
kernfs_ops->mmap iff the sysfs file implements mmap. Combined, this
ensures that sysfs_open_file->mutex is grabbed under mmap path iff the
sysfs file actually implements mmap. As sysfs_open_file->mutex is
already given a different lockdep key if mmap is implemented, this
removes the spurious locking dependency.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DRC code will attempt to reuse an existing, expired cache entry in
preference to allocating a new one. It'll then search the cache, and if
it gets a hit it'll then free the cache entry that it was going to
reuse.
The cache code doesn't unhash the entry that it's going to reuse
however, so it's possible for it end up designating an entry for reuse
and then subsequently freeing the same entry after it finds it. This
leads it to a later use-after-free situation and usually some list
corruption warnings or an oops.
Fix this by simply unhashing the entry that we intend to reuse. That
will mean that it's not findable via a search and should prevent this
situation from occurring.
Cc: stable@vger.kernel.org # v3.10+
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: g. artim <gartim@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This loop in xfs_growfs_data_private() is incorrect for V4
superblocks filesystems:
for (bucket = 0; bucket < XFS_AGFL_SIZE(mp); bucket++)
agfl->agfl_bno[bucket] = cpu_to_be32(NULLAGBLOCK);
For V4 filesystems, we don't have a agfl header structure, and so
XFS_AGFL_SIZE() returns an entire sector's worth of entries, which
we then index from an offset into the sector. Hence: buffer overrun.
This problem was introduced in 3.10 by commit 77c95bba ("xfs: add
CRC checks to the AGFL") which changed the AGFL structure but failed
to update the growfs code to handle the different structures.
Fix it by using the correct offset into the buffer for both V4 and
V5 filesystems.
Cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit b7d961b35b)
For discard operation, we should return EINVAL if the given range length
is less than a block size, otherwise it will go through the file system
to discard data blocks as the end range might be evaluated to -1, e.g,
# fstrim -v -o 0 -l 100 /xfs7
/xfs7: 9811378176 bytes were trimmed
This issue can be triggered via xfstests/generic/288.
Also, it seems to get the request queue pointer via bdev_get_queue()
instead of the hard code pointer dereference is not a bad thing.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit f9fd013561)
If we allocate less than sizeof(struct attrlist) then we end up
corrupting memory or doing a ZERO_PTR_SIZE dereference.
This can only be triggered with CAP_SYS_ADMIN.
Reported-by: Nico Golde <nico@ngolde.de>
Reported-by: Fabian Yamaguchi <fabs@goesec.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 071c529eb6)
a8b1474442 ("sysfs: give different locking key to regular and bin
files") in driver-core-linus modifies sysfs_open_file() so that it
gives out different locking classes to sysfs_open_files depending on
whether the file is bin or not. Due to the massive kernfs
reorganization in driver-core-next, this naturally causes merge
conflict in fs/sysfs/file.c.
Due to the way things are split between kernfs and sysfs in
driver-core-next, the same fix can't easily be applied to
driver-core-next. This merge simply ignores the offending commit. A
following patch will implement a separate fix for the issue.
Signed-off-by: Tejun Heo <tj@kernel.org>
027a485d12 ("sysfs: use a separate locking class for open files
depending on mmap") assigned different lockdep key to
sysfs_open_file->mutex depending on whether the file implements mmap
or not in an attempt to avoid spurious lockdep warning caused by
merging of regular and bin file paths.
While this restored some of the original behavior of using different
locks (at least lockdep is concerned) for the different clases of
files. The restoration wasn't full because now the lockdep key
assignment depends on whether the file has mmap or not instead of
whether it's a regular file or not.
This means that bin files which don't implement mmap will get assigned
the same lockdep class as regular files. This is problematic because
file_operations for bin files still implements the mmap file operation
and checking whether the sysfs file actually implements mmap happens
in the file operation after grabbing @sysfs_open_file->mutex. We
still end up adding locking dependency from mmap locking to
sysfs_open_file->mutex to the regular file mutex which triggers
spurious circular locking warning.
Fix it by restoring the original behavior fully by differentiating
lockdep key by whether the file is regular or bin, instead of the
existence of mmap.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull aio fix from Benjamin LaHaise:
"AIO fix from Gu Zheng that fixes a GPF that Dave Jones uncovered with
trinity"
* git://git.kvack.org/~bcrl/aio-next:
aio: clean up aio ring in the fail path
Clean up the aio ring file in the fail path of aio_setup_ring
and ioctx_alloc. And maybe it can fix the GPF issue reported by
Dave Jones:
https://lkml.org/lkml/2013/11/25/898
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
- cpufreq regression fix from Bjørn Mork restoring the pre-3.12
behavior of the framework during system suspend/hibernation to
avoid garbage sysfs files from being left behind in case of a
suspend error.
- PNP regression fix to restore the correct states of devices after
resume from hibernation broken in 3.12. From Dmitry Torokhov.
- cpuidle fix to prevent cpuidle device unregistration from crashing
due to a NULL pointer dereference if cpuidle has been disabled
from the kernel command line. From Konrad Rzeszutek Wilk.
- intel_idle fix for the C6 state definition on Intel Avoton/Rangeley
processors from Arne Bockholdt.
- Power capping framework fix to make the energy_uj sysfs attribute
work in accordance with the documentation. From Srinivas Pandruvada.
- epoll fix to make it ignore the EPOLLWAKEUP flag if the kernel has
been compiled with CONFIG_PM_SLEEP unset (in which case that flag
should not have any effect). From Amit Pundir.
- cpufreq fix to prevent governor sysfs files from being lost over
system suspend/resume in some (arguably unusual) situations. From
Viresh Kumar.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABCAAGBQJSoTE5AAoJEILEb/54YlRxh/sP/jFZhLTc8g4MC3XzhguuROzT
u0Pu9VJVfACqz8LyiCOtfOvvb2EPV7VSq7qYqszL5y9Dn5gwIHvQMMBK2PjZ4cc2
MtSiw02Bk/DEESXYOjt++n5ja/0lc05CtJTlb3uoJXBOCqp3cMrvW7+QqnLbEfbG
S+TcPFBr+4Owt/J7r2Z2JBYGZ6NbVol/x1hAFjiM+rBan6UGw7uNcg2LgQrVHcs4
S1Cm6lsJTwRcSiswvJv9/C+ML9Z/1gYYUyu7ijQnGdbNUolyzHY6AxLWZdnSkAQO
s8JVDRKy9+V44LtnWSENnJNftjlOoXWcZRJxvDePyM3dVpxESBa8Z/AxYWwCcmcB
e4rsgm/WOF86DMhRu+gfeTF+1OkU7KhuPhbXskbw+JDcZKCui2FP/xti6IAaTsU/
9M30/VeOpD1UBqckLnDTGcsFif7hVZ9LOHH5wK8OctjyaTMfUYtPd7WxfTQCpcSc
1M0NQapwfXHASmPmMW4SszAaeduecUdgXU1epOPx0EpOMQhvuLeENJBgVC5uu1cA
KAQ7suOx9ReS3slso8lpTlTEw2rsDPRuiHcF3hv7YpklNXV2jvtxsl4upHT5VN2t
3L59Unq8vY2lt3SdzWMypaAquphc3Te5woYoFwgsSPfw40Kr+jg+oUtO6IHqM/ga
OATUkTffzp+Rp3pg036Y
=gHBV
-----END PGP SIGNATURE-----
Merge tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
- cpufreq regression fix from Bjørn Mork restoring the pre-3.12
behavior of the framework during system suspend/hibernation to avoid
garbage sysfs files from being left behind in case of a suspend error
- PNP regression fix to restore the correct states of devices after
resume from hibernation broken in 3.12. From Dmitry Torokhov.
- cpuidle fix to prevent cpuidle device unregistration from crashing
due to a NULL pointer dereference if cpuidle has been disabled from
the kernel command line. From Konrad Rzeszutek Wilk.
- intel_idle fix for the C6 state definition on Intel Avoton/Rangeley
processors from Arne Bockholdt.
- Power capping framework fix to make the energy_uj sysfs attribute
work in accordance with the documentation. From Srinivas Pandruvada.
- epoll fix to make it ignore the EPOLLWAKEUP flag if the kernel has
been compiled with CONFIG_PM_SLEEP unset (in which case that flag
should not have any effect). From Amit Pundir.
- cpufreq fix to prevent governor sysfs files from being lost over
system suspend/resume in some (arguably unusual) situations. From
Viresh Kumar.
* tag 'pm-3.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PowerCap: Fix mode for energy counter
PNP: fix restoring devices after hibernation
cpuidle: Check for dev before deregistering it.
epoll: drop EPOLLWAKEUP if PM_SLEEP is disabled
cpufreq: fix garbage kobjects on errors during suspend/resume
cpufreq: suspend governors on system suspend/hibernate
intel_idle: Fixed C6 state on Avoton/Rangeley processors
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current series. It contains:
- A fix for a use-after-free of a request in blk-mq. From Ming Lei
- A fix for a blk-mq bug that could attempt to dereference a NULL rq
if allocation failed
- Two xen-blkfront small fixes
- Cleanup of submit_bio_wait() type uses in the kernel, unifying
that. From Kent
- A fix for 32-bit blkg_rwstat reading. I apologize for this one
looking mangled in the shortlog, it's entirely my fault for missing
an empty line between the description and body of the text"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix use-after-free of request
blk-mq: fix dereference of rq->mq_ctx if allocation fails
block: xen-blkfront: Fix possible NULL ptr dereference
xen-blkfront: Silence pfn maybe-uninitialized warning
block: submit_bio_wait() conversions
Update of blkg_stat and blkg_rwstat may happen in bh context
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
- Stable fix for a loop on irrecoverable errors when returning delegations
- Fix a 3-way deadlock between layoutreturn, open, and state recovery
- Update the MAINTAINERS file with contact information for Trond Myklebust
- Close needs to handle NFS4ERR_ADMIN_REVOKED
- Enabling v4.2 should not recompile nfsd and lockd
- Fix a couple of compile warnings
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSoLTpAAoJEGcL54qWCgDy2dgQAIKkKAXccg3OG2b1SxJmiaja
PcrovNmgg3HvYQ7clUMqtrMByiXEpSybl6tAeXYUWE3sS1DISSBVEwO3MoOiASiM
951Ssx+CoyhsHYo5aH83sUIiWFl/YsRhpKmSr2cdQd13DQTFbPq896k64Inf6L2/
9fngoqOD7FunQHn8AiVPoDOQzObB0OuKhYCwuwLt47oPiwgmm12JQNCDxU1i4sxb
lkGUBLkPMs6D5IyI8XHaMyX3+8MvmPiIsjIKaNJRdhkuX/k7ollucTJXyvyEQKK0
PhBIWyUULmKcAXYwCfHf9UoyGZFvmj47YggyKcBd26OZUEFekcWrULfym46F1xak
EcO6D4mlTy5i5W0RBqYCj1oGud57rixZBmhLTbeq6sSJaiqBfGEs225Q17H7rsEB
YIghHiEFNnBmVWELhHxbJHQoY6HOugmZOuc0dxopaikN/7to8gnYoVyTIVlMfe/t
UNXZoer6GOOohJGtZ7s7v4Al7EzvwnVnBCBklEAKFJ7Ca2LEmq+b58oQW3nJ1mPn
y4TnihxYXsSEbqy+Lds9rumRhJLG1oVTpwficAm7N3HdK3abzCIPEt6iOHoCmXQz
J1B4gmwOKsDqVlCSpBsnc3ZiBlSJGOn6MmVQUCNFpzv/DetWn/BxEUPE8cNm8DaI
WioD0grC0/9bR8oD1m+w
=UZ51
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for a NFSv4.1 delegation and state recovery deadlock
- Stable fix for a loop on irrecoverable errors when returning
delegations
- Fix a 3-way deadlock between layoutreturn, open, and state recovery
- Update the MAINTAINERS file with contact information for Trond
Myklebust
- Close needs to handle NFS4ERR_ADMIN_REVOKED
- Enabling v4.2 should not recompile nfsd and lockd
- Fix a couple of compile warnings
* tag 'nfs-for-3.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix do_div() warning by instead using sector_div()
MAINTAINERS: Update contact information for Trond Myklebust
NFSv4.1: Prevent a 3-way deadlock between layoutreturn, open and state recovery
SUNRPC: do not fail gss proc NULL calls with EACCES
NFSv4: close needs to handle NFS4ERR_ADMIN_REVOKED
NFSv4: Update list of irrecoverable errors on DELEGRETURN
NFSv4 wait on recovery for async session errors
NFS: Fix a warning in nfs_setsecurity
NFS: Enabling v4.2 should not recompile nfsd and lockd
When compiling a 32bit kernel with CONFIG_LBDAF=n the compiler complains like
shown below. Fix this warning by instead using sector_div() which is provided
by the kernel.h header file.
fs/nfs/blocklayout/extents.c: In function ‘normalize’:
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
fs/nfs/blocklayout/extents.c:47:13: note: in expansion of macro ‘do_div’
nfs/blocklayout/extents.c:47:2: warning: right shift count >= width of type [enabled by default]
fs/nfs/blocklayout/extents.c:47:2: warning: passing argument 1 of ‘__div64_32’ from incompatible pointer type [enabled by default]
include/asm-generic/div64.h:35:17: note: expected ‘uint64_t *’ but argument is of type ‘sector_t *’
extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Andy Adamson reports:
The state manager is recovering expired state and recovery OPENs are being
processed. If kswapd is pruning inodes at the same time, a deadlock can occur
when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
resultant layoutreturn gets an error that the state mangager is to handle,
causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
At the same time an open is waiting for the inode deletion to complete in
__wait_on_freeing_inode.
If the open is either the open called by the state manager, or an open from
the same open owner that is holding the NFSv4 sequence id which causes the
OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
then the state is deadlocked with kswapd.
The fix is simply to have layoutreturn ignore all errors except NFS4ERR_DELAY.
We already know that layouts are dropped on all server reboots, and that
it has to be coded to deal with the "forgetful client model" that doesn't
send layoutreturns.
Reported-by: Andy Adamson <andros@netapp.com>
Link: http://lkml.kernel.org/r/1385402270-14284-1-git-send-email-andros@netapp.com
Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
page cache" code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJSnpD0AAoJEJAch/D1fbHUYFwP/iGUTnyqJYMLOyvDz4CjPCjr
w0Pfncr78oXiKkOe/ExBVH5JHlkmSa2Lfmt28tiltPgfNQbqLQB9eSvjOzOg+c73
KNqCAXXUp5a+Jhjlapo1iuyZEnOEPaJHUgiRmneFHzesE9eW6u86JntHR3wLbD8v
Hw0Z5zkrPGB+M/AsX96dt7hBiz9HJI7IQrurtx2ijrpCBeXq30YNtBMjxzqoYx1C
qcyMw/qGREObD3qlWKeBgmsKguwMmaygLaA2lUk73RcswKcISwkrUQN+BEqfP+Sc
cXvXf1C2lG7SOLQH29Osa5Y8EGFnRQlFyTaIxLFWopl3Oks7emDJOrH4rW2SuPLt
mEUOIVhZ0v6VhJ5uRmauyH4bQxBl0Uonc74bxRm602ThnUEIg2E6i+RBiy0tr/3Z
UOQB2lACEIfz/jspeCcTTOlnR1r6fKp89XLF+OLKq7qJN3XQ2EtNKrfGHyEUVBxq
GqLqJuTYbqxidJtx+PDZPW3AT71f4GSJGnL4Tdg13oHoJmLQagIRJCh6Gf5pMp1W
za0g7jDHimHJbQW/jujDhVu9bLEoWhfq+jo/JW6x+0bblrm8GM4h1FFEqcVjh0SV
UyG3QjMA4uMLbofB4MNNkgRhGC3o0L2Z3CuYVCNIz6fbQKP1tp6NqyVP89uJ1Br9
VM3Hhw2TWhXxPYDsDYeg
=OvYc
-----END PGP SIGNATURE-----
Merge tag 'squashfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next
Pull squashfs bugfix from Phillip Lougher:
"Just a single bug fix to the new "directly decompress into the page
cache" code"
* tag 'squashfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
Squashfs: fix failure to unlock pages on decompress error
kernfs inherited "security.*" xattr support from sysfs. This patch
extends xattr support to "trusted.*" using simple_xattr_*(). As
trusted xattrs are restricted to CAP_SYS_ADMIN, simple_xattr_*() which
uses kernel memory for storage shouldn't be problematic.
Note that the existing "security.*" support doesn't implement
get/remove/list and the this patch only implements those ops for
"trusted.*". We probably want to extend those ops to include support
for "security.*".
This patch will allow using kernfs from cgroup which requires
"trusted.*" xattr support.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David P. Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_init_inode_attrs() is a bit clumsy to use requiring the caller
to check whether @sd->s_iattr is already set or not. Rename it to
sysfs_inode_attrs(), update it to check whether @sd->s_iattr is
already initialized before trying to initialize it and return
@sd->s_iattr. This simplifies the callers.
While at it,
* Rename struct sysfs_inode_attrs pointer variables to "attrs". As
kernfs no longer deals with "struct attribute", this isn't confusing
and makes it easier to distinguish from struct iattr pointers.
* A new field will be added to sysfs_inode_attrs. Reindent in
preparation.
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Drop EPOLLWAKEUP from epoll events mask if CONFIG_PM_SLEEP is disabled.
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The pipe code was trying (and failing) to be very careful about freeing
the pipe info only after the last access, with a pattern like:
spin_lock(&inode->i_lock);
if (!--pipe->files) {
inode->i_pipe = NULL;
kill = 1;
}
spin_unlock(&inode->i_lock);
__pipe_unlock(pipe);
if (kill)
free_pipe_info(pipe);
where the final freeing is done last.
HOWEVER. The above is actually broken, because while the freeing is
done at the end, if we have two racing processes releasing the pipe
inode info, the one that *doesn't* free it will decrement the ->files
count, and unlock the inode i_lock, but then still use the
"pipe_inode_info" afterwards when it does the "__pipe_unlock(pipe)".
This is *very* hard to trigger in practice, since the race window is
very small, and adding debug options seems to just hide it by slowing
things down.
Simon originally reported this way back in July as an Oops in
kmem_cache_allocate due to a single bit corruption (due to the final
"spin_unlock(pipe->mutex.wait_lock)" incrementing a field in a different
allocation that had re-used the free'd pipe-info), it's taken this long
to figure out.
Since the 'pipe->files' accesses aren't even protected by the pipe lock
(we very much use the inode lock for that), the simple solution is to
just drop the pipe lock early. And since there were two users of this
pattern, create a helper function for it.
Introduced commit ba5bb14733 ("pipe: take allocation and freeing of
pipe_inode_info out of ->i_mutex").
Reported-by: Simon Kirby <sim@hostway.ca>
Reported-by: Ian Applegate <ia@cloudflare.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org # v3.10+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/kernfs/kernfs-internal.h needed to include fs/sysfs/sysfs.h because
part of kernfs core implementation was living in sysfs.
fs/sysfs/sysfs.h needed to include fs/kernfs/kernfs-internal.h because
include/linux/kernfs.h didn't expose enough interface.
The separation is complete and neither is true anymore. Remove the
cross inclusion and make sysfs a proper user of kernfs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fs/sysfs/symlink.c::sysfs_delete_link() tests @sd->s_flags for
SYSFS_FLAG_NS. Let's add kernfs_ns_enabled() so that sysfs doesn't
have to test sysfs_dirent flag directly. This makes things tidier for
kernfs proper too.
This is purely cosmetic.
v2: To avoid possible NULL deref, use noop dummy implementation which
always returns false when !CONFIG_SYSFS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_dirent includes some information which should be available to
kernfs users - the type, flags, name and parent pointer. This patch
moves sysfs_dirent definition from kernfs/kernfs-internal.h to
include/linux/kernfs.h so that kernfs users can access them.
The type part of flags is exported as enum kernfs_node_type, the flags
kernfs_node_flag, sysfs_type() and kernfs_enable_ns() are moved to
include/linux/kernfs.h and the former is updated to return the enum
type. sysfs_dirent->s_parent and ->s_name are marked explicitly as
public.
This patch doesn't introduce any functional changes.
v2: Flags exported too and kernfs_enable_ns() definition moved.
v3: While moving kernfs_enable_ns() to include/linux/kernfs.h, v1 and
v2 put the definition outside CONFIG_SYSFS replacing the dummy
implementation with the actual implementation too. Unfortunately,
this can lead to oops when !CONFIG_SYSFS because
kernfs_enable_ns() may be called on a NULL @sd and now tries to
dereference @sd instead of not doing anything. This issue was
reported by Yuanhan Liu.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move core mount code to fs/kernfs/mount.c. The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
rearranges mount path so that the kernfs and sysfs parts are separate.
* As sysfs_super_info won't be visible outside kernfs proper,
kernfs_super_ns() is added to allow kernfs users to access a
super_block's namespace tag.
* Generic mount operation is separated out into kernfs_mount_ns().
sysfs_mount() now just performs sysfs-specific permission check,
acquires namespace tag, and invokes kernfs_mount_ns().
* Generic superblock release is separated out into kernfs_kill_sb()
which can be used directly as file_system_type->kill_sb(). As sysfs
needs to put the namespace tag, sysfs_kill_sb() wraps
kernfs_kill_sb() with ns tag put.
* sysfs_dir_cachep init and sysfs_inode_init() are separated out into
kernfs_init(). kernfs_init() uses only small amount of memory and
trying to handle and propagate kernfs_init() failure doesn't make
much sense. Use SLAB_PANIC for sysfs_dir_cachep and make
sysfs_inode_init() panic on failure.
After this change, kernfs_init() should be called before
sysfs_init(), fs/namespace.c::mnt_init() modified accordingly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernfs is being updated to allow multiple sysfs_dirent hierarchies so
that it can also be used by other users. Currently, sysfs
super_blocks are always attached to one kernfs_root - sysfs_root - and
distinguished only by their namespace tags.
This patch adds sysfs_super_info->root and update
sysfs_fill/test_super() so that super_blocks are identified by the
combination of both the associated kernfs_root and namespace tag.
This allows mounting different kernfs hierarchies.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernfs is being updated to allow multiple sysfs_dirent hierarchies so
that it can also be used by other users. Currently, inode number is
allocated using a global ida, sysfs_ino_ida; however, inos for
different hierarchies should be handled separately.
This patch makes ino allocation per kernfs_root. sysfs_ino_ida is
replaced by kernfs_root->ino_ida and sysfs_new_dirent() is updated to
take @root and allocate ino from it. ida_simple_get/remove() are used
instead of sysfs_ino_lock and sysfs_alloc/free_ino().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There currently is single kernfs hierarchy in the whole system which
is used for sysfs. kernfs needs to support multiple hierarchies to
allow other users. This patch introduces struct kernfs_root which
serves as the root of each kernfs hierarchy and implements
kernfs_create/destroy_root().
* Each kernfs_root is associated with a root sd (sysfs_dentry). The
root is freed when the root sd is released and kernfs_destory_root()
simply invokes kernfs_remove() on the root sd. sysfs_remove_one()
is updated to handle release of the root sd. Note that ps_iattr
update in sysfs_remove_one() is trivially updated for readability.
* Root sd's are now dynamically allocated using sysfs_new_dirent().
Update sysfs_alloc_ino() so that it gives out ino from 1 so that the
root sd still gets ino 1.
* While kernfs currently only points to the root sd, it'll soon grow
fields which are specific to each hierarchy. As determining a given
sd's root will be necessary, sd->s_dir.root is added. This backlink
fits better as a separate field in sd; however, sd->s_dir is inside
union with space to spare, so use it to save space and provide
kernfs_root() accessor to determine the root sd.
* As hierarchies may be destroyed now, each mount needs to hold onto
the hierarchy it's attached to. Update sysfs_fill_super() and
sysfs_kill_sb() so that they get and put the kernfs_root
respectively.
* sysfs_root is replaced with kernfs_root which is dynamically created
by invoking kernfs_create_root() from sysfs_init().
This patch doesn't introduce any visible behavior changes.
v2: kernfs_create_root() forgot to set @sd->priv. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, it's assumed that there's a single kernfs hierarchy in the
system anchored at sysfs_root which is defined as a global struct. To
allow other users of kernfs, this will be made dynamic. Introduce a
new global variable sysfs_root_sd which points to &sysfs_root and
convert all &sysfs_root users.
This patch doesn't introduce any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It has been very long since sysfs depended on vfs to keep track of
internal states and whether sysfs is mounted or not doesn't make any
difference to sysfs's internal operation.
In addition to init and filesystem type registration, sysfs_init()
invokes kern_mount() to create in-kernel mount of sysfs. This
internal mounting doesn't server any purpose anymore. Remove it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add const qualifier to sysfs_super_info->ns so that it's consistent
with other namespace tag usages in sysfs. Because kobject doesn't use
const qualifier for namespace tags, this ends up requiring an explicit
cast to drop const qualifier in free_sysfs_super_info().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_fill_super() takes three params - @sb, @data and @silent - but
uses only @sb. Drop the latter two.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move core symlink code to fs/kernfs/symlink.c. fs/sysfs/symlink.c now
only contains sysfs wrappers around kernfs interfaces. The respective
declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move core file code to fs/kernfs/file.c. fs/sysfs/file.c now contains
sysfs kernfs_ops callbacks, sysfs wrappers around kernfs interfaces,
and sysfs_schedule_callback(). The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.
This is pure relocation.
v2: Refreshed on top of the v2 of "sysfs, kernfs: prepare read path
for kernfs".
v3: Refreshed on top of the v3 of "sysfs, kernfs: prepare read path
for kernfs".
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move core dir code to fs/kernfs/dir.c. fs/sysfs/dir.c now only
contains sysfs_warn_dup() and sysfs wrappers around kernfs interfaces.
The respective declarations in fs/sysfs/sysfs.h are moved to
fs/kernfs/kernfs-internal.h.
This is pure relocation.
v2: sysfs_symlink_target_lock was mistakenly relocated to kernfs. It
should remain with sysfs. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's nothing sysfs-specific in fs/sysfs/inode.c. Move everything
in it to fs/kernfs/inode.c. The respective declarations in
fs/sysfs/sysfs.h are moved to fs/kernfs/kernfs-internal.h.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Move data structure, constant and basic accessor declarations from
fs/sysfs/sysfs.h to fs/kernfs/kernfs-internal.h. The two files
currently include each other. Once kernfs / sysfs separation is
complete, the cross inclusions will be removed. Inclusion protectors
are added to fs/sysfs/sysfs.h to allow cross-inclusion.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs interface for finding, getting and putting
sysfs_dirents.
* sysfs_find_dirent() is renamed to kernfs_find_ns() and lockdep
assertion for sysfs_mutex is added.
* sysfs_get_dirent_ns() is renamed to kernfs_find_and_get().
* Macro inline dancing around __sysfs_get/put() are removed and
kernfs_get/put() are made proper functions implemented in
fs/sysfs/dir.c.
While the conversions are mostly equivalent, there's one difference -
kernfs_get() doesn't return the input param as its return value. This
change is intentional. While passing through the input increases
writability in some areas, it is unnecessary and has been shown to
cause confusion regarding how the last ref is handled.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, sysfs_dirent active_ref lockdep annotation uses
attribute->[s]key as the lockdep key, which forces
kernfs_create_file_ns() to assume that sysfs_dirent->priv is pointing
to a struct attribute which may not be true for non-sysfs users. This
patch restructures the lockdep annotation such that
* kernfs_ops contains lockdep_key which is used by default for files
created kernfs_create_file_ns().
* kernfs_create_file_ns_key() is introduced which takes an extra @key
argument. The created file will use the specified key for
active_ref lockdep annotation. If NULL is specified, lockdep for
the file is disabled.
* sysfs_add_file_mode_ns() is updated to use
kernfs_create_file_ns_key() with the appropriate key from the
attribute or NULL if ignore_lockdep is set.
This makes the lockdep annotation properly contained in kernfs while
allowing sysfs to cleanly keep its current behavior. This patch
doesn't introduce any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We want to add one more SYSFS_FLAG_* but we can't use the next higher
bit, 0x10000, as the flag field is 16bits wide. The flags are
currently arranged weirdly - 8 bits are set aside for the type flags
when there are only three three used, the first flag starts at 0x1000
instead of 0x0100 and flag literals have 5 digits (20 bits) when only
4 digits can be used.
Rearrange them so that type bits are only the lowest four, flags start
at 0x0010 and similar flags are grouped.
This patch doesn't cause any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs interface to wake up poll(2) which takes and returns
sysfs_dirents.
sysfs_notify_dirent() is renamed to kernfs_notify() and sysfs_notify()
is updated so that it doesn't directly grab sysfs_mutex but acquires
the target sysfs_dirents using sysfs_get_dirent().
sysfs_notify_dirent() is reimplemented as a dumb inline wrapper around
kernfs_notify().
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
kernfs_ops currently only supports single_open() behavior which is
pretty restrictive. Add optional callbacks ->seq_{start|next|stop}()
which, when implemented, are invoked for seq_file traversal. This
allows full seq_file functionality for kernfs users. This currently
doesn't have any user and doesn't change any behavior.
v2: Refreshed on top of the updated "sysfs, kernfs: prepare read path
for kernfs".
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_add_one() is a wrapper around __sysfs_add_one() which prints out
duplicate name warning if __sysfs_add_one() fails with -EEXIST. The
previous kernfs conversions moved all dup warnings to sysfs interface
functions and sysfs_add_one() doesn't have any user left.
Remove sysfs_add_one() and update __sysfs_add_one() to take its name.
This patch doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs interface to create a file which takes and returns
sysfs_dirents.
The actual file creation part is separated out from
sysfs_add_file_mode_ns() into kernfs_create_file_ns(). The former now
only decides the kernfs_ops to use and the file's size and invokes the
latter.
This patch doesn't introduce behavior changes.
v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After kernfs_ops and sysfs_dirent->s_attr.size addition, the
distinction between SYSFS_KOBJ_BIN_ATTR and SYSFS_KOBJ_ATTR is only
necessary while creating files to decide which kernfs_ops to use.
Afterwards, they behave exactly the same.
This patch removes SYSFS_KOBJ_BIN_ATTR along with sysfs_is_bin().
sysfs_add_file[_mode_ns]() are updated to take bool @is_bin instead of
@type.
This patch doesn't introduce any behavior changes. This completely
isolates the distinction between the two sysfs file types in the sysfs
layer proper.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs sets the size of regular files unconditionally at PAGE_SIZE and
takes the size of bin files from bin_attribute. The latter is a
pretty bad interface which forces bin_attribute users to create a
separate copy of bin_attribute for each instance of the file -
e.g. pci resource files.
Add sysfs_dirent->s_attr.size so that the size can be specified
separately. This unifies inode init paths of ATTR and BIN_ATTR
identical and allows for generic size handling for kernfs.
Unfortunately, this grows the size of sysfs_dirent by sizeof(loff_t).
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
introduces kernfs_ops which hosts methods kernfs users implement and
updates fs/sysfs/file.c such that sysfs_kf_*() functions populate
kernfs_ops and kernfs_file_*() functions call the matching entries
from kernfs_ops.
kernfs_ops contains the following groups of methods.
* seq_show() - for kernfs files which use seq_file for reads.
* read() - for direct read implementations. Used iff seq_show() is
not implemented.
* write() - for writes.
* mmap() - for mmaps.
Notes:
* sysfs_elem_attr->ops is added so that kernfs_ops can be accessed
from sysfs_dirent. kernfs_ops() helper is added to verify locking
and access the field.
* SYSFS_FLAG_HAS_(SEQ_SHOW|MMAP) added. sd->s_attr->ops is accessible
only while holding active_ref and there are cases where we want to
take different actions depending on which ops are implemented.
These two flags cache whether the two ops are implemented for those.
* kernfs_file_*() no longer test sysfs type but chooses different
behaviors depending on which methods in kernfs_ops are implemented.
The conversions are trivial except for the open path. As
kernfs_file_open() now decides whether to allow read/write accesses
depending on the kernfs_ops implemented, the presence of methods in
kobjs and attribute_bin should be propagated to kernfs_ops.
sysfs_add_file_mode_ns() is updated so that it propagates presence /
absence of the callbacks through _empty, _ro, _wo, _rw kernfs_ops.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_open_file will be used as the primary handle for kernfs methods.
Move its definition from fs/sysfs/file.c to include/linux/kernfs.h and
mark the public and private fields.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
prepares the rest - open, release and poll. There isn't much to do.
Just renaming is enough. As sysfs_file_operations and
sysfs_bin_operations are identical now, use the same file_operations
for both - kernfs_file_operations.
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
rearranges mmap path so that the kernfs and sysfs parts are separate.
sysfs_kf_bin_mmap() which handles the interaction with bin_attribute
mmap method is factored out of sysfs_bin_mmap(), which is renamed to
kernfs_file_mmap(). All vma ops are renamed accordingly.
sysfs_bin_mmap() is updated such that it can be used for both file
types. This will eventually allow using the same file_operations for
both file types, which is necessary to separate out kernfs.
This patch doesn't introduce any behavior changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
rearranges write path so that the kernfs and sysfs parts are separate.
kernfs_file_write() handles all boilerplate work including buffer
management and locking and invokes sysfs_kf_write() or
sysfs_kf_bin_write() depending on the file type which deals with the
interaction with kobj store or bin_attribute write method.
While this patch changes the order of some operations, it shouldn't
change any visible behavior.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We're in the process of separating out core sysfs functionality into
kernfs which will deal with sysfs_dirents directly. This patch
rearranges read path so that the kernfs and sysfs parts are separate.
* Regular file read path is refactored such that
kernfs_seq_start/next/stop/show() handle all the boilerplate work
including locking and updating event count for poll, while
sysfs_kf_seq_show() deals with interaction with kobj show method.
* Bin file read path is refactored such that kernfs_file_direct_read()
handles all the boilerplate work including buffer management and
locking, while sysfs_kf_bin_read() deals with interaction with
bin_attribute read method.
kernfs_file_read() is added. It invokes either the seq_file or direct
read path depending on the file type. This will eventually allow
using the same file_operations for both file types, which is necessary
to separate out kernfs.
While this patch changes the order of some operations, it shouldn't
change any visible behavior.
v2: Dropped unnecessary zeroing of @count from sysfs_kf_seq_show().
Add comments explaining single_open() behavior. Both suggested by
Pavel.
v3: seq_stop() is called even after seq_start() failed.
kernfs_seq_start() updated so that it doesn't unlock
sysfs_open_file->mutex on failure so that kernfs_seq_stop()
doesn't try to unlock an already unlocked mutex. Reported by
Fengguang.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs interface to manipulate a directory which takes and
returns sysfs_dirents.
create_dir() is renamed to kernfs_create_dir_ns() and its argumantes
and return value are updated. create_dir() usages are replaced with
kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced
with kernfs_create_dir(). Dup warnings are handled explicitly by
sysfs users of the kernfs interface.
sysfs_enable_ns() is renamed to kernfs_enable_ns().
This patch doesn't introduce any behavior changes.
v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
v3: kernfs_enable_ns() added.
v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2"
so that this patch removes sysfs_enable_ns().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A directory sysfs_dirent points to the associated kobj. A regular or
bin file points to the associated [bin_]attribute. This patch
replaces sysfs_dirent->s_dir.kobj and ->s_attr.[bin_]attr with void *
->priv.
This is to prepare for kernfs interface so that sysfs can specify the
private data in the same way for directories and files. This lower
debuggability but not by much - the whole thing was overlaid in a
union anyway. If debuggability becomes an issue, we can later add
->priv accessors which explicitly check for the sysfs_dirent type and
performs casting.
This patch doesn't introduce any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull vfs dentry reference count fix from Al Viro.
This fixes a possible inode_permission NULL pointer dereference (and
other problems) that were due to the root dentry count being decremented
too much. In commit 48a066e72d ("RCU'd vfsmounts") the placement of
clearing the LOOKUP_RCU bit changed, and we then returned failure of
incrementing the lockref on the parent dentry with LOOKUP_RCU cleared.
But that meant we needed to go through the same cleanup routines that
the later failures did wrt LOOKUP_ROOT and nd->root.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix bogus path_put() of nd->root after some unlazy_walk() failures
Failure to grab reference to parent dentry should go through the
same cleanup as nd->seq mismatch. As it is, we might end up with
caller thinking it needs to path_put() nd->root, with obvious
nasty results once we'd hit that bug enough times to drive the
refcount of root dentry all the way to zero...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull cifs fixes from Steve French:
"SMB3 "validate negotiate" is needed to prevent certain types of
downgrade attacks.
Also changes SMB2/SMB3 copy offload from using the BTRFS copy ioctl
(BTRFS_IOC_CLONE) to a cifs specific ioctl (CIFS_IOC_COPYCHUNK_FILE)
to address Christoph's comment that there are semantic differences
between requesting copy offload in which copy-on-write is mandatory
(as in the BTRFS ioctl) and optional in the SMB2/SMB3 case. Also
fixes SMB2/SMB3 copychunk for large files"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload
Check SMB3 dialects against downgrade attacks
Removed duplicated (and unneeded) goto
CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files
We need those sysfs fixes in this branch to make testing, and future
patches apply properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here are 3 patches for sysfs issues that have been reported. Well, 1
patch really, the first one is reverted as it's not really needed (the
correct fix is coming in through the different driver subsystems
instead.) But that 1 sysfs fix is needed, so this is still a good thing
to pull in now.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlKWdcsACgkQMUfUDdst+yn9HgCgvXeP/GeK7Bt+1YhFIsrdRrNq
7qsAnAvXOHh5FCn7h2Cw0yYb35kgUMQx
=ZZSr
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are 3 patches for sysfs issues that have been reported. Well, 1
patch really, the first one is reverted as it's not really needed (the
correct fix is coming in through the different driver subsystems
instead)
But that 1 sysfs fix is needed, so this is still a good thing to pull
in now"
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* tag 'driver-core-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"
sysfs: use a separate locking class for open files depending on mmap
sysfs: handle duplicate removal attempts in sysfs_remove_group()
This tool hasn't been maintained in over a decade, and is pretty much
useless these days. Let's pretend it never happened.
Also remove a long-dead email address.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce kernfs setattr interface - kernfs_setattr().
sysfs_sd_setattr() is renamed to __kernfs_setattr() and
kernfs_setattr() is a simple wrapper around it with sysfs_mutex
locking. sysfs_chmod_file() is updated to get an explicit ref on
kobj->sd and then invoke kernfs_setattr() so that it doesn't have to
use internal interface.
This patch doesn't introduce any behavior differences.
v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs rename interface, krenfs_rename[_ns]().
This is just rename of sysfs_rename(). No functional changes.
Function comment is added to kernfs_rename_ns() and @new_parent_sd is
renamed to @new_parent for consistency with other kernfs interfaces.
v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Separate out kernfs symlink interface - kernfs_create_link() - which
takes and returns sysfs_dirents, from sysfs_do_create_link_sd().
sysfs_do_create_link_sd() now just determines the parent and target
sysfs_dirents and invokes the new interface and handles dup warning.
This patch doesn't introduce behavior changes.
v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce kernfs removal interfaces - kernfs_remove() and
kernfs_remove_by_name[_ns]().
These are just renames of sysfs_remove() and sysfs_hash_and_remove().
No functional changes.
v2: Dummy kernfs_remove_by_name_ns() for !CONFIG_SYSFS updated to
return -ENOSYS instead of 0.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Core sysfs implementation will be separated into kernfs so that it can
be used by other non-kobject users.
This patch creates fs/kernfs/ directory and makes boilerplate changes.
kernfs interface will be directly based on sysfs_dirent and its
forward declaration is moved to include/linux/kernfs.h which is
included from include/linux/sysfs.h. sysfs core implementation will
be gradually separated out and moved to kernfs.
This patch doesn't introduce any functional changes.
v2: mount.c added.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the kobject based interface guarantees that a parent
sysfs_dirent is always a directory; however, the planned kernfs
interface will be directly based on sysfs_dirents and the caller may
specify non-directory node as the parent. Add an explicit check in
__sysfs_add_one() so that such attempts fail with -EINVAL.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The way namespace tags are implemented in sysfs is more complicated
than necessary. As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is. If multiple namespace types are
needed, which currently aren't, we can simply compare the tag to a set
of allowed tags in the superblock assuming that the tags, being
pointers, won't have the same value across multiple types.
This patch rips out kobj_ns_type handling from sysfs. sysfs now has
an enable switch to turn on namespace under a node. If enabled, all
children are required to have non-NULL namespace tags and filtered
against the super_block's tag.
kobject namespace determination is now performed in
lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
The sanity checks are also moved. create_dir() is restructured to
ease such addition. This removes most kobject namespace knowledge
from sysfs proper which will enable proper separation and layering of
sysfs.
This is the second try. The first one was cb26a31157 ("sysfs: drop
kobj_ns_type handling") which tried to automatically enable namespace
if there are children with non-NULL namespace tags; however, it was
broken for symlinks as they should inherit the target's tag iff
namespace is enabled in the parent. This led to namespace filtering
enabled incorrectly for wireless net class devices through phy80211
symlinks and thus network configuration failure. a1212d278c
("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.
This shouldn't introduce any behavior changes, for real.
v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
missing and caused build failure. Reported by kbuild test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 54d71145a4.
The root cause of these "inverted" sysfs removals have now been found,
so there is no need for this patch. Keep this functionality around so
that this type of error doesn't show up in driver code again.
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull ceph bug-fixes from Sage Weil:
"These include a couple fixes to the new fscache code that went in
during the last cycle (which will need to go stable@ shortly as well),
a couple client-side directory fragmentation fixes, a fix for a race
in the cap release queuing path, and a couple race fixes in the
request abort and resend code.
Obviously some of this could have gone into 3.12 final, but I
preferred to overtest rather than send things in for a late -rc, and
then my travel schedule intervened"
* 'for-linus-bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: allocate non-zero page to fscache in readpage()
ceph: wake up 'safe' waiters when unregistering request
ceph: cleanup aborted requests when re-sending requests.
ceph: handle race between cap reconnect and cap release
ceph: set caps count after composing cap reconnect message
ceph: queue cap release in __ceph_remove_cap()
ceph: handle frag mismatch between readdir request and reply
ceph: remove outdated frag information
ceph: hung on ceph fscache invalidate in some cases
Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead
of BTRFS_IOC_CLONE to avoid confusion about whether
copy-on-write is required or optional for this operation.
SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since
they both speed up copy by offloading the copy rather than
passing many read and write requests back and forth and both have
identical syntax (passing file handles), but for SMB2/SMB3
CopyChunk the server is not required to use copy-on-write
to make a copy of the file (although some do), and Christoph
has commented that since CopyChunk does not require
copy-on-write we should not reuse BTRFS_IOC_CLONE.
This patch renames the ioctl to use a cifs specific IOCTL
CIFS_IOCTL_COPYCHUNK. This ioctl is particularly important
for SMB2/SMB3 since large file copy over the network otherwise
can be very slow, and with this is often more than 100 times
faster putting less load on server and client.
Note that if a copy syscall is ever introduced, depending on
its requirements/format it could end up using one of the other
three methods that CIFS/SMB2/SMB3 can do for copy offload,
but this method is particularly useful for file copy
and broadly supported (not just by Samba server).
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Disseldorp <ddiss@samba.org>
It was being open coded in a few places.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Direct decompression into the page cache. If we fall back
to using an intermediate buffer (because we cannot grab all the
page cache pages) and we get a decompress fail, we forgot to
release the pages.
Reported-by: Roman Peniaev <r.peniaev@gmail.com>
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
ceph_osdc_readpages() returns number of bytes read, currently,
the code only allocate full-zero page into fscache, this patch
fixes this.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
We also need to wake up 'safe' waiters if error occurs or request
aborted. Otherwise sync(2)/fsync(2) may hang forever.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Aborted requests usually get cleared when the reply is received.
If MDS crashes, no reply will be received. So we need to cleanup
aborted requests when re-sending requests.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
When a cap get released while composing the cap reconnect message.
We should skip queuing the release message if the cap hasn't been
added to the cap reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
It's possible that some caps get released while composing the cap
reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The following two commits implemented mmap support in the regular file
path and merged bin file support into the regular path.
73d9714627 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
3124eb1679 ("sysfs: merge regular and bin file handling")
After the merge, the following commands trigger a spurious lockdep
warning. "test-mmap-read" simply mmaps the file and dumps the
content.
$ cat /sys/block/sda/trace/act_mask
$ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-work+ #378 Not tainted
-------------------------------------------------------
test-mmap-read/567 is trying to acquire lock:
(&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++++}:
...
-> #2 (sr_mutex){+.+.+.}:
...
-> #1 (&bdev->bd_mutex){+.+.+.}:
...
-> #0 (&of->mutex){+.+.+.}:
...
other info that might help us debug this:
Chain exists of:
&of->mutex --> sr_mutex --> &mm->mmap_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(sr_mutex);
lock(&mm->mmap_sem);
lock(&of->mutex);
*** DEADLOCK ***
1 lock held by test-mmap-read/567:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
stack backtrace:
CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
Call Trace:
[<ffffffff81611ad2>] dump_stack+0x4e/0x7a
[<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
[<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
[<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
[<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
[<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
[<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
[<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
[<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
[<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
[<ffffffff81008282>] SyS_mmap+0x22/0x30
[<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
This happens because one file nests sr_mutex, which nests mm->mmap_sem
under it, under of->mutex while mmap implementation naturally nests
of->mutex under mm->mmap_sem. The warning is false positive as
of->mutex is per open-file and the two paths belong to two different
files. This warning didn't trigger before regular and bin file
supports were merged because only bin file supported mmap and the
other side of locking happened only on regular files which used
equivalent but separate locking.
It'd be best if we give separate locking classes per file but we can't
easily do that. Let's differentiate on ->mmap() for now. Later we'll
add explicit file operations struct and can add per-ops lockdep key
there.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bcdde7e221 (sysfs: make __sysfs_remove_dir() recursive) changed
the behavior so that directory removals will be done recursively. This
means that the sysfs group might already be removed if its parent directory
has been removed.
The current code outputs warnings similar to following log snippet when it
detects that there is no group for the given kobject:
WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
Modules linked in:
CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
Call Trace:
[<ffffffff817daab1>] dump_stack+0x45/0x56
[<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
[<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
[<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
[<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
[<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
[<ffffffff8142a0d0>] device_del+0x40/0x1b0
[<ffffffff8142a24d>] device_unregister+0xd/0x20
[<ffffffff8144131a>] scsi_remove_host+0xba/0x110
[<ffffffff8145f526>] ata_host_detach+0xc6/0x100
[<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
[<ffffffff812e8f48>] pci_device_remove+0x28/0x60
[<ffffffff8142d854>] __device_release_driver+0x64/0xd0
[<ffffffff8142d8de>] device_release_driver+0x1e/0x30
[<ffffffff8142d257>] bus_remove_device+0xf7/0x140
[<ffffffff8142a1b1>] device_del+0x121/0x1b0
[<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
[<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
[<ffffffff812fd90d>] hotplug_event+0xcd/0x160
[<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
[<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
[<ffffffff8105cf3a>] process_one_work+0x17a/0x430
[<ffffffff8105db29>] worker_thread+0x119/0x390
[<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
[<ffffffff81063a5d>] kthread+0xcd/0xf0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
[<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
On this particular machine I see ~16 of these message during Thunderbolt
hot-unplug.
Fix this in similar way that was done for sysfs_remove_one() by checking
if the parent directory has already been removed and bailing out early.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull aio fixes from Benjamin LaHaise.
* git://git.kvack.org/~bcrl/aio-next:
aio: nullify aio->ring_pages after freeing it
aio: prevent double free in ioctx_alloc
aio: Fix a trinity splat
Pull nfsd bugfixes from Bruce Fields:
"A couple nfsd bugfixes"
* 'for-3.13' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix xdr decoding of large non-write compounds
nfsd: make sure to balance get/put_write_access
nfsd: split up nfsd_setattr
fixes a possible NULL pointer dereference, and the second one
resolves a reference counting issue in one of the lesser used paths
through atomic_open.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
iQIcBAABAgAGBQJSjy+IAAoJEMrg3m4a/8jSy4wP+wXU26Hbj0nIkzxEokxgQLO/
64flbmRx/960CAEFiFWhrSQUIpb2R2Lw1tNGiRiAhch4wLmRrKMEZnR/dI57KaMR
/dcm78q2aPbNvdB/5pxrTQx8bkRYcVOCZdD7lizUUHs/oLIhl0IZ4uGSOy0kUdTd
gcgus1Rd7mPJp+Smr33W7rPB6WAfo6+iAn7RNcdxgRcH2Ng8BZm82Z4Gn8TDN3H1
cZ8yzFbycLvj67kpkdEesjaDTQv/zWrtNg/ocVDGbPRxykzf89v94LpodZkpvRAz
vvCT7MNy9R8konDcGImSV6zRstYFtQwg8hqzIvSgla6yppghpGXRp9c7PpPDNzc9
ynVvZqc0MbZeTlGt5/DQW6QmFKwgJcT/tJSaDdx1NRoHQTOXzSM9hw5it048Hm8L
AUrsAvFJtTtm7Uwd7CE78IvrOQWNl0EAgw20twKR1GR6dnGmvvaVFrLXSacCNsFm
VkOs6QLRoax6DbjHgC0ERka5J7YcGniNff+vx2ndRkth3KWv3N8fqh43mBtuQcRN
yzAffJMXYGWWBHvaXR71dJ35Z4lFbMfKU+oPZokucl9c1dnFJ26cqVKMlhD+hUGK
ms0qVDiRf4C8qhIyWor1FqWCGeFlXwtTzgGtoovrOoWJxqp6vLdwFjEEPKb1Tlai
pBmD16Z3ehx8I76Yzgjx
=aMoW
-----END PGP SIGNATURE-----
Merge tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Pull GFS2 fixes from Steven Whitehouse:
"A couple of small, but important bug fixes for GFS2. The first one
fixes a possible NULL pointer dereference, and the second one resolves
a reference counting issue in one of the lesser used paths through
atomic_open"
* tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Fix ref count bug relating to atomic_open
GFS2: fix potential NULL pointer dereference
Pull btrfs fixes from Chris Mason:
"Almost all of these are bug fixes. Dave Sterba's documentation update
is the big exception because he removed our promises to set any
machine running Btrfs on fire"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Documentation: filesystems: update btrfs tools section
Documentation: filesystems: add new btrfs mount options
btrfs: update kconfig help text
btrfs: fix bio_size_ok() for max_sectors > 0xffff
btrfs: Use trace condition for get_extent tracepoint
btrfs: fix typo in the log message
Btrfs: fix list delete warning when removing ordered root from the list
Btrfs: print bytenr instead of page pointer in check-int
Btrfs: remove dead codes from ctree.h
Btrfs: don't wait for ordered data outside desired range
Btrfs: fix lockdep error in async commit
Btrfs: avoid heavy operations in btrfs_commit_super
Btrfs: fix __btrfs_start_workers retval
Btrfs: disable online raid-repair on ro mounts
Btrfs: do not inc uncorrectable_errors counter on ro scrubs
Btrfs: only drop modified extents if we logged the whole inode
Btrfs: make sure to copy everything if we rename
Btrfs: don't BUG_ON() if we get an error walking backrefs
Here we have a performance fix for inode iversion, increased inode cluster size
for v5 superblock filesystems, a fix for error handling in
xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJSjjbHAAoJENaLyazVq6ZO/l4QAIajqdOy56MDDCaE1ocNRMej
oPXqxMZpi+YAFRXIWQK/evjXjYE4hcCjiLI0tAKzQM0FGRHd1GgQQJvWKjvHBrLG
rlrHWDcKq+3bsJ6KIw+JvLnhsOxxKXbRovIG1PI4frOeFtS3DCtzMZ4FGBErh+BV
PkFjf5Lxe7VnegDL4eNpkAfSM/2/pEtn2gIMMj8eeKemTPAbDplMAk4UUxp18R7F
FCr6mOKETWthHdBgkLB1xA6OVGUuYcleWvluFc5PONJN1VPSutEHJaRw01lY1VLY
4COUt7MjLAqlAu24LTD1aNszhUlajYq0AL+nmd4gULZI2fpu+meUIGCHQG4BpVZl
ds9isn80SmKiLT4ZQCbJAv4XMEXn6p41+uxdKP6ZkYXso4zJmVw0TyGLg/ZOJpw0
9mNcaJG1s57ronN07dMCSMqsyNFLuLtX7+mVa5liO3sEkXv7hEK7EB9qojQ9Qn/p
xC2r4jrE0//xgcbOE+uKOyIad3L0IBM6TXuy58xVPi5l0dpM69z4LCyUGZ+L1G8x
0QhkA7NL3xI2tNgehzJBsBuxjtEePtm/jCIS1HfDSIRQZR3y+PxVEjgO4naS4vm2
xKwo5dBodsxgXeSMUY3miMuEX7ZQse+xVDK1q0Zk4VXZyC2+erNS5hxs313yzeLD
7X2ZESfIJaMrkLSKOVUe
=w0M3
-----END PGP SIGNATURE-----
Merge tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs
Pull second xfs update from Ben Myers:
"There are a couple of patches that I wasn't quite sure about in time
for our initial 3.13 pull request, a bugfix, and an update to add Dave
to MAINTAINERS:
Here we have a performance fix for inode iversion, increased inode
cluster size for v5 superblock filesystems, a fix for error handling
in xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave"
* tag 'xfs-for-linus-v3.13-rc1-2' of git://oss.sgi.com/xfs/xfs:
xfs: open code inc_inode_iversion when logging an inode
xfs: increase inode cluster size for v5 filesystems
xfs: fix unlock in xfs_bmap_add_attrfork
xfs: update maintainers
Merge patches from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: place page->pmd_huge_pte to right union
MAINTAINERS: add keyboard driver to Hyper-V file list
x86, mm: do not leak page->ptl for pmd page tables
ipc,shm: correct error return value in shmctl (SHM_UNLOCK)
mm, mempolicy: silence gcc warning
block/partitions/efi.c: fix bound check
ARM: drivers/rtc/rtc-at91rm9200.c: disable interrupts at shutdown
mm: hugetlbfs: fix hugetlbfs optimization
kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly
ipc,shm: fix shm_file deletion races
mm: thp: give transparent hugepage code a separate copy_page
checkpatch: fix "Use of uninitialized value" warnings
configfs: fix race between dentry put and lookup
Pull audit updates from Eric Paris:
"Nothing amazing. Formatting, small bug fixes, couple of fixes where
we didn't get records due to some old VFS changes, and a change to how
we collect execve info..."
Fixed conflict in fs/exec.c as per Eric and linux-next.
* git://git.infradead.org/users/eparis/audit: (28 commits)
audit: fix type of sessionid in audit_set_loginuid()
audit: call audit_bprm() only once to add AUDIT_EXECVE information
audit: move audit_aux_data_execve contents into audit_context union
audit: remove unused envc member of audit_aux_data_execve
audit: Kill the unused struct audit_aux_data_capset
audit: do not reject all AUDIT_INODE filter types
audit: suppress stock memalloc failure warnings since already managed
audit: log the audit_names record type
audit: add child record before the create to handle case where create fails
audit: use given values in tty_audit enable api
audit: use nlmsg_len() to get message payload length
audit: use memset instead of trying to initialize field by field
audit: fix info leak in AUDIT_GET requests
audit: update AUDIT_INODE filter rule to comparator function
audit: audit feature to set loginuid immutable
audit: audit feature to only allow unsetting the loginuid
audit: allow unsetting the loginuid (with priv)
audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE
audit: loginuid functions coding style
selinux: apply selinux checks on new audit message types
...
A race window in configfs, it starts from one dentry is UNHASHED and end
before configfs_d_iput is called. In this window, if a lookup happen,
since the original dentry was UNHASHED, so a new dentry will be
allocated, and then in configfs_attach_attr(), sd->s_dentry will be
updated to the new dentry. Then in configfs_d_iput(),
BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
sys_open: sys_close:
... fput
dput
dentry_kill
__d_drop <--- dentry unhashed here,
but sd->dentry still point
to this dentry.
lookup_real
configfs_lookup
configfs_attach_attr---> update sd->s_dentry
to new allocated dentry here.
d_kill
configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
triggered here.
To fix it, change configfs_d_iput to not update sd->s_dentry if
sd->s_count > 2, that means there are another dentry is using the sd
beside the one that is going to be put. Use configfs_dirent_lock in
configfs_attach_attr to sync with configfs_d_iput.
With the following steps, you can reproduce the bug.
1. enable ocfs2, this will mount configfs at /sys/kernel/config and
fill configure in it.
2. run the following script.
while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the case that atomic_open calls finish_no_open() with
the dentry that was supplied to gfs2_atomic_open() an
extra reference count is required. This patch fixes that
issue preventing a bug trap triggering at umount time.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Commit [e66cf1610: GFS2: Use lockref for glocks] replaced call:
atomic_read(&gi->gl->gl_ref) == 0
with:
__lockref_is_dead(&gl->gl_lockref)
therefore changing how gl is accessed, from gi->gl to plan gl.
However, gl can be a NULL pointer, and so gi->gl needs to be
used instead (which is guaranteed not to be NULL because fo
the while loop checking that condition).
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Reflect the current status. Portions of the text taken from the
wiki pages.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The data type of max_sectors in queue settings is unsigned int. But
this value is stored to the local variable whose type is unsigned short
in bio_size_ok(). This can cause unexpected result when max_sectors >
0xffff.
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Doing an if statement to test some condition to know if we should
trigger a tracepoint is pointless when tracing is disabled. This just
adds overhead and wastes a branch prediction. This is why the
TRACE_EVENT_CONDITION() was created. It places the check inside the jump
label so that the branch does not happen unless tracing is enabled.
That is, instead of doing:
if (em)
trace_btrfs_get_extent(root, em);
Which is basically this:
if (em)
if (static_key(trace_btrfs_get_extent)) {
Using a TRACE_EVENT_CONDITION() we can just do:
trace_btrfs_get_extent(root, em);
And the condition trace event will do:
if (static_key(trace_btrfs_get_extent)) {
if (em) {
...
The static key is a non conditional jump (or nop) that is faster than
having to check if em is NULL or not.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
Commit b02441999e "Btrfs: don't wait for
the completion of all the ordered extents" introduced a bug that broke
the ordered root list:
WARNING: CPU: 1 PID: 7119 at lib/list_debug.c:59 __list_del_entry+0x5a/0x98()
It is because we forgot to return the roots in the splice list to the
ordered list of the fs. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
The page pointer information was useless. The bytenr is what you
want when you search for submitted write bios.
Additionally, a new bit in the print mask is added that allows
to selectively enable the check-int submit_bio verbose mode. Before,
the global verbose mode had to be enabled leading to many million
useless lines in the kernel log.
And a comment is added that explains that LOG_BUF_SHIFT needs to
be set to a really high value.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>