027a485d12 ("sysfs: use a separate locking class for open files
depending on mmap") assigned different lockdep key to
sysfs_open_file->mutex depending on whether the file implements mmap
or not in an attempt to avoid spurious lockdep warning caused by
merging of regular and bin file paths.
While this restored some of the original behavior of using different
locks (at least lockdep is concerned) for the different clases of
files. The restoration wasn't full because now the lockdep key
assignment depends on whether the file has mmap or not instead of
whether it's a regular file or not.
This means that bin files which don't implement mmap will get assigned
the same lockdep class as regular files. This is problematic because
file_operations for bin files still implements the mmap file operation
and checking whether the sysfs file actually implements mmap happens
in the file operation after grabbing @sysfs_open_file->mutex. We
still end up adding locking dependency from mmap locking to
sysfs_open_file->mutex to the regular file mutex which triggers
spurious circular locking warning.
Fix it by restoring the original behavior fully by differentiating
lockdep key by whether the file is regular or bin, instead of the
existence of mmap.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 54d71145a4.
The root cause of these "inverted" sysfs removals have now been found,
so there is no need for this patch. Keep this functionality around so
that this type of error doesn't show up in driver code again.
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The following two commits implemented mmap support in the regular file
path and merged bin file support into the regular path.
73d9714627 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
3124eb1679 ("sysfs: merge regular and bin file handling")
After the merge, the following commands trigger a spurious lockdep
warning. "test-mmap-read" simply mmaps the file and dumps the
content.
$ cat /sys/block/sda/trace/act_mask
$ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-work+ #378 Not tainted
-------------------------------------------------------
test-mmap-read/567 is trying to acquire lock:
(&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++++}:
...
-> #2 (sr_mutex){+.+.+.}:
...
-> #1 (&bdev->bd_mutex){+.+.+.}:
...
-> #0 (&of->mutex){+.+.+.}:
...
other info that might help us debug this:
Chain exists of:
&of->mutex --> sr_mutex --> &mm->mmap_sem
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_sem);
lock(sr_mutex);
lock(&mm->mmap_sem);
lock(&of->mutex);
*** DEADLOCK ***
1 lock held by test-mmap-read/567:
#0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
stack backtrace:
CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
Call Trace:
[<ffffffff81611ad2>] dump_stack+0x4e/0x7a
[<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
[<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
[<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
[<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
[<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
[<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
[<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
[<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
[<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
[<ffffffff81008282>] SyS_mmap+0x22/0x30
[<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
This happens because one file nests sr_mutex, which nests mm->mmap_sem
under it, under of->mutex while mmap implementation naturally nests
of->mutex under mm->mmap_sem. The warning is false positive as
of->mutex is per open-file and the two paths belong to two different
files. This warning didn't trigger before regular and bin file
supports were merged because only bin file supported mmap and the
other side of locking happened only on regular files which used
equivalent but separate locking.
It'd be best if we give separate locking classes per file but we can't
easily do that. Let's differentiate on ->mmap() for now. Later we'll
add explicit file operations struct and can add per-ops lockdep key
there.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit bcdde7e221 (sysfs: make __sysfs_remove_dir() recursive) changed
the behavior so that directory removals will be done recursively. This
means that the sysfs group might already be removed if its parent directory
has been removed.
The current code outputs warnings similar to following log snippet when it
detects that there is no group for the given kobject:
WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
Modules linked in:
CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
Call Trace:
[<ffffffff817daab1>] dump_stack+0x45/0x56
[<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
[<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
[<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
[<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
[<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
[<ffffffff8142a0d0>] device_del+0x40/0x1b0
[<ffffffff8142a24d>] device_unregister+0xd/0x20
[<ffffffff8144131a>] scsi_remove_host+0xba/0x110
[<ffffffff8145f526>] ata_host_detach+0xc6/0x100
[<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
[<ffffffff812e8f48>] pci_device_remove+0x28/0x60
[<ffffffff8142d854>] __device_release_driver+0x64/0xd0
[<ffffffff8142d8de>] device_release_driver+0x1e/0x30
[<ffffffff8142d257>] bus_remove_device+0xf7/0x140
[<ffffffff8142a1b1>] device_del+0x121/0x1b0
[<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
[<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
[<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
[<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
[<ffffffff812fd90d>] hotplug_event+0xcd/0x160
[<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
[<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
[<ffffffff8105cf3a>] process_one_work+0x17a/0x430
[<ffffffff8105db29>] worker_thread+0x119/0x390
[<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
[<ffffffff81063a5d>] kthread+0xcd/0xf0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
[<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
[<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
On this particular machine I see ~16 of these message during Thunderbolt
hot-unplug.
Fix this in similar way that was done for sysfs_remove_one() by checking
if the parent directory has already been removed and bailing out early.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit cb26a31157.
It mysteriously causes NetworkManager to not find the wireless device
for me. As far as I can tell, Tejun *meant* for this commit to not make
any semantic changes, but there clearly are some. So revert it, taking
into account some of the calling convention changes that happened in
this area in subsequent commits.
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs_assoc_lock is an odd piece of locking. In general, whoever owns
a kobject is responsible for synchronizing sysfs operations and sysfs
proper assumes that, for example, removal won't race with any other
operation; however, this doesn't work for symlinking because an entity
performing symlink doesn't usually own the target kobject and thus has
no control over its removal.
sysfs_assoc_lock synchronizes symlink operations against kobj->sd
disassociation so that symlink code doesn't end up dereferencing
already freed sysfs_dirent by racing with removal of the target
kobject.
This is quite obscure and the generic name of the lock and lack of
comments make it difficult to understand its role. Let's rename it to
sysfs_symlink_target_lock and add comments explaining what's going on.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
13c589d5b0 ("sysfs: use seq_file when reading regular files")
converted regular sysfs files to use seq_file. The commit substituted
generic_file_llseek() with seq_lseek() for llseek implementation.
Before the change, all regular sysfs files were allowed to seek to any
position in [0, PAGE_SIZE] as the file size is always PAGE_SIZE and
generic_file_llseek() allows any seeking inside the range under file
size; however, seq_lseek()'s behavior is different. It traverses the
output by repeatedly invoking ->show() until it reaches the target
offset or traversal indicates EOF. As seq_files are fully dynamic and
may not end at all, it doesn't support seeking from the end
(SEEK_END).
Apparently, there are userland tools which uses SEEK_END to discover
the buffer size to use and the switch to seq_lseek() disturbs them as
SEEK_END fails with -EINVAL.
The only benefits of using seq_lseek() instead of
generic_file_llseek() are
* Early failure. If traversing to certain file position should fail,
seq_lseek() will report such failures on lseek(2) instead of the
following read/write operations.
* EOF detection. While SEEK_END is not supported, SEEK_SET/CUR +
large offset can be used to detect eof - eof at the time of the seek
anyway as the file size may change dynamically.
Both aren't necessary for sysfs or prospect kernfs users. Revert to
genefic_file_llseek() and preserve the original behavior.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Link: https://lkml.kernel.org/r/20131031114358.GA5551@osiris
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Both POSIX.1-2008 and Linux Programmer's Manual have a dedicated return
error code for a case, when a file doesn't support mmap(), it's ENODEV.
This change replaces overloaded EINVAL with ENODEV in a situation
described above for sysfs binary files.
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Separate out sysfs_warn_dup() out of sysfs_add_one(). This will help
separating out the core sysfs functionalities into kernfs so that it
can be used by non-sysfs users too.
This doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Most removal related logic is implemented in fs/sysfs/dir.c. Move
sysfs_hash_and_remove() to fs/sysfs/dir.c so that __sysfs_remove()
doesn't have to be public.
This is pure relocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_get_dentry() has been gone for years now. Remove the left-over
prototype.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
ignore_lockdep is currently honored only for regular files. There's
no reason to ignore it for bin files. Update sysfs_ignore_lockdep()
so that bin_attr.attr.ignore_lockdep works too.
While this doesn't have any in-kernel user, this unifies the behaviors
between regular and bin files and will help later changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
3124eb1679 ("sysfs: merge regular and bin file handling") folded bin
file handling into regular file handling. Among other things, bin
file now shares the same open path including sysfs_open_dirent
association using sysfs_dirent->s_attr.open. This is buggy because
->s_bin_attr lives in the same union and doesn't have the field. This
bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active
field at the conflicting position. It does have a field "buffers" but
it isn't used anymore.
This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that
the bin_attr is accessed through ->s_attr.bin_attr which lives with
->s_attr.attr in an anonymous union. The code paths already assume
bin_attr contains attr as the first element, so this doesn't add any
more assumptions while making it explicit that the two types are
handled together.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before patch(sysfs: prepare path write for unified regular / bin
file handling), when size of bin file is zero, writting still can
continue, but this patch changes the behaviour.
The worse thing is that firmware loader is broken by this patch,
and user space application can't write to firmware bin file any more
because both firmware loader and drivers can't know at advance how
large the firmware file is and have to set its initialized size as
zero.
This patch fixes the problem and keeps behaviour of writting to bin
as before.
Reported-by: Lothar Waßmann <LW@karo-electronics.de>
Tested-by: Lothar Waßmann <LW@karo-electronics.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While looking at the code, I noticed that bin_attribute read() and write()
ops copy the inode size into an int for futher comparisons.
Some bin_attributes can be fairly large. For example, pci creates some for
BARs set to the BAR size and giant BARs are around the corner, so this is
going to break something somewhere eventually.
Let's use the right type.
[adjust for seqfile conversions, only needed for bin_read() - gkh]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
375b611e60 ("sysfs: remove sysfs_buffer->ops") introduced
sysfs_file_ops() which determines the associated file operation of a
given sysfs_dirent. As file ops access should be protected by an
active reference, the new function includes a lockdep assertion on the
sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep
flag into account and the lockdep assertion trips spuriously for files
which opt out from active reference lockdep checking.
# cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized
------------[ cut here ]------------
WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60()
Modules linked in:
CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000
ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60
ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50
Call Trace:
[<ffffffff81ca0131>] dump_stack+0x4e/0x82
[<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0
[<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20
[<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60
[<ffffffff8125a274>] sysfs_open_file+0x54/0x300
[<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280
[<ffffffff811df820>] finish_open+0x30/0x40
[<ffffffff811f0623>] do_last+0x503/0xd90
[<ffffffff811f0f6b>] path_openat+0xbb/0x6d0
[<ffffffff811f23ba>] do_filp_open+0x3a/0x90
[<ffffffff811e09a9>] do_sys_open+0x129/0x220
[<ffffffff811e0abe>] SyS_open+0x1e/0x20
[<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b
---[ end trace aa48096b111dafdb ]---
Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and
move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep
assertion if sysfs_ignore_lockdep() is true.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
With the previous changes, sysfs regular file code is ready to handle
bin files too. This patch makes bin files share the regular file
path.
* sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.
* sysfs_init_inode() is updated to use the new sysfs_bin_operations
instead of bin_fops for bin files.
* fs/sysfs/bin.c and the related pieces are removed.
This patch shouldn't introduce any behavior difference to bin file
accesses.
Overall, this unification reduces the amount of duplicate logic, makes
behaviors more consistent and paves the road for building simpler and
more versatile interface which will allow other subsystems to make use
of sysfs for their pseudo filesystems.
v2: Stale fs/sysfs/bin.c reference dropped from
Documentation/DocBook/filesystems.tmpl. Reported by kbuild test
robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the open path.
This patch updates sysfs_open_file() such that it can handle both
regular and bin files.
This is a preparation and the new bin file path isn't used yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch copies mmap support from bin so that fs/sysfs/file.c can
handle mmapping bin files.
The code is copied mostly verbatim with the following updates.
* ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
references are replaced with sysfs_open_file ones.
* Symbols are prefixed with sysfs_.
* sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
->files. Invocation of this function is added to
sysfs_addrm_finish().
* sysfs_bin_mmap() is added to sysfs_bin_operations.
This is a preparation and the new mmap path isn't used yet.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the read path.
Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use
sysfs_open_file instead of bin_buffer. The function is identical copy
except for the use of sysfs_open_file.
The new function is added to sysfs_bin_operations. This isn't used
yet but will eventually replace fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs bin file handling will be merged into the regular file support.
This patch prepares the write path.
bin file write is almost identical to regular file write except that
the write length is capped by the inode size and @off is passed to the
write method. This patch adds bin file handling to sysfs_write_file()
so that it can handle both regular and bin files.
A new file_operations struct sysfs_bin_operations is added, which
currently only hosts sysfs_write_file() and generic_file_llseek().
This isn't used yet but will eventually replace fs/sysfs/bin.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
read() is simple enough and fill_read() being in a separate function
doesn't add anything. Let's collapse it into read(). This will make
merging bin file handling with regular file.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After b31ca3f5df ("sysfs: fix deadlock"), bin read() first writes
data to bb->buffer and bounces it to a transient kernel buffer which
is then copied out to userland. The double bouncing doesn't add
anything. Let's just use the transient buffer directly.
While at it, rename @temp to @buf for clarity.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs read path implements its own buffering scheme between userland
and kernel callbacks, which essentially is a degenerate duplicate of
seq_file. This patch replaces the custom read buffering
implementation in sysfs with seq_file.
While the amount of code reduction is small, this reduces low level
hairiness and enables future development of a new versatile API based
on seq_file so that sysfs features can be shared with other
subsystems.
As write path was already converted to not use sysfs_open_file->page,
this patch makes ->page and ->count unused and removes them.
Userland behavior remains the same except for some extreme corner
cases - e.g. sysfs will now regenerate the content each time a file is
read after a non-contiguous seek whereas the original code would keep
using the same content. While this is a userland visible behavior
change, it is extremely unlikely to be noticeable and brings sysfs
behavior closer to that of procfs.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There isn't much to be gained by keeping around kernel buffer while a
file is open especially as the read path planned to be converted to
use seq_file and won't use the buffer. This patch makes
sysfs_write_file() use per-write transient buffer instead of
sysfs_open_file->page.
This simplifies the write path, enables removing sysfs_open_file->page
once read path is updated and will help merging bin file write path
which already requires the use of a transient buffer due to a locking
order issue.
As the function comments of flush_write_buffer() and
sysfs_write_buffer() are being updated anyway, reformat them so that
they're more conventional.
v2: Use min_t() instead of min() in sysfs_write_file() to avoid build
warning on arm. Reported by build test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs will be converted to use seq_file for read path, which will make
it difficult to pass around multiple pointers directly. This patch
adds sysfs_open_file->sd and ->file so that we can reach all the
necessary data structures from sysfs_open_file.
flush_write_buffer() is updated to drop @dentry which was used to
discover the sysfs_dirent as it's now available through
sysfs_open_file->sd.
This patch doesn't cause any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs read path will be converted to use seq_file which will handle
buffering making sysfs_buffer a misnomer. Rename sysfs_buffer to
sysfs_open_file, and sysfs_open_dirent->buffers to ->files.
This path is pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add a separate mutex to protect sysfs_open_dirent->buffers list. This
will allow performing sleepable operations while traversing
sysfs_buffers, which will be renamed to sysfs_open_file.
Note that currently sysfs_open_dirent->buffers list isn't being used
for anything and this patch doesn't make any functional difference.
It will be used to merge regular and bin file supports.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
sysfs_buffer->ops to be used while the file is open. This patch
removes the caching and makes each operation directly fetch sysfs_ops.
This patch doesn't introduce any behavior difference and is to prepare
for merging regular and bin file supports.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
->needs_read_fill is used to implement the following behaviors.
1. Ensure buffer filling on the first read.
2. Force buffer filling after a write.
3. Force buffer filling after a successful poll.
However, #2 and #3 don't really work as sysfs doesn't reset file
position. While the read buffer would be refilled, the next read
would continue from the position after the last read or write,
requiring an explicit seek to the start for it to be useful, which
makes ->needs_read_fill superflous as read buffer is always refilled
if f_pos == 0.
Update sysfs_read_file() to test buffer->page for #1 instead and
remove ->needs_read_fill. While this changes behavior in extreme
corner cases - e.g. re-reading a sysfs file after seeking to non-zero
position after a write or poll, it's highly unlikely to lead to actual
breakage. This change is to prepare for using seq_file in the read
path.
While at it, reformat a comment in fill_write_buffer().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Given a sysfs_dirent, there is no reason to have multiple versions of
removal functions. A function which removes the specified
sysfs_dirent and its descendants is enough.
This patch intorduces [__}sysfs_remove() which replaces all internal
variations of removal functions. This will be the only removal
function in the planned new sysfs_dirent based interface.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, sysfs directory removal is inconsistent in that it would
remove any files directly under it but wouldn't recurse into
directories. Thanks to group subdirectories, this doesn't even match
with kobject boundaries. sysfs is in the process of being separated
out so that it can be used by multiple subsystems and we want to have
a consistent behavior - either removal of a sysfs_dirent should remove
every descendant entries or none instead of something inbetween.
This patch implements proper recursive removal in
__sysfs_remove_dir(). The function now walks its subtree in a
post-order walk to remove all descendants.
This is a behavior change but kobject / driver layer, which currently
is the only consumer, has already been updated to handle duplicate
removal attempts, so nothing should be broken after this change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs currently has a rather weird behavior regarding removals. A
directory removal would delete all files directly under it but
wouldn't recurse into subdirectories, which, while a bit inconsistent,
seems to make sense at the first glance as each directory is
supposedly associated with a kobject and each kobject can take care of
the directory deletion; however, this doesn't really hold as we have
groups which can be directories without a kobject associated with it
and require explicit deletions.
We're in the process of separating out sysfs from kboject / driver
core and want a consistent behavior. A removal should delete either
only the specified node or everything under it. I think it is helpful
to support recursive atomic removal and later patches will implement
it.
Such change means that a sysfs_dirent associated with kobject may be
deleted before the kobject itself is removed if one of its ancestor
gets removed before it. As sysfs_remove_dir() puts the base ref, we
may end up with dangling pointer on descendants. This can be solved
by holding an extra reference on the sd from kobject.
Acquire an extra reference on the associated sysfs_dirent on directory
creation and put it after removal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs_addrm_start/finish() enclose sysfs_dirent additions and
deletions and sysfs_addrm_cxt is used to record information necessary
to finish the operations. Currently, sysfs_addrm_start() takes
@parent_sd, records it in sysfs_addrm_cxt, and assumes that all
operations in the block are performed under that @parent_sd.
This assumption has been fine until now but we want to make some
operations behave recursively and, while having @parent_sd recorded in
sysfs_addrm_cxt doesn't necessarily prevents that, it becomes
confusing.
This patch removes sysfs_addrm_cxt->parent_sd and makes
sysfs_add_one() take an explicit @parent_sd parameter. Note that
sysfs_remove_one() doesn't need the extra argument as its parent is
always known from the target @sd.
While at it, add __acquires/releases() notations to
sysfs_addrm_start/finish() respectively.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some internal sysfs functions which take explicit namespace argument
are weird in that they place the optional @ns in front of @name which
is contrary to the established convention. This is confusing and
error-prone especially as @ns and @name may be interchanged without
causing compilation warning.
Swap the positions of @name and @ns in the following internal
functions.
sysfs_find_dirent()
sysfs_rename()
sysfs_hash_and_remove()
sysfs_name_hash()
sysfs_name_compare()
create_dir()
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The pre-existing sysfs interfaces which take explicit namespace
argument are weird in that they place the optional @ns in front of
@name which is contrary to the established convention. For example,
we end up forcing vast majority of sysfs_get_dirent() users to do
sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
especially as @ns and @name may be interchanged without causing
compilation warning.
This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swap the
positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
around sysfs_get_dirent_ns(). This makes confusions a lot less
likely.
There are other interfaces which take @ns before @name. They'll be
updated by following patches.
This patch doesn't introduce any functional changes.
v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
error on module builds. Reported by build test robot. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The way namespace tags are implemented in sysfs is more complicated
than necessary. As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is or where namespace is enabled.
If multiple namespace types are needed, which currently aren't, we can
simply compare the tag to a set of allowed tags in the superblock
assuming that the tags, being pointers, won't have the same value
across multiple types. Also, whether to filter by namespace tag or
not can be trivially determined by whether the node has any tagged
children or not.
This patch rips out kobj_ns_type handling from sysfs. sysfs no longer
cares whether specific type of namespace is enabled or not. If a
sysfs_dirent has a non-NULL tag, the parent is marked as needing
namespace filtering and the value is tested against the allowed set of
tags for the superblock (currently only one but increasing this number
isn't difficult) and the sysfs_dirent is ignored if it doesn't match.
This removes most kobject namespace knowledge from sysfs proper which
will enable proper separation and layering of sysfs. The namespace
sanity checks in fs/sysfs/dir.c are replaced by the new sanity check
in kobject_namespace(). As this is the only place ktype->namespace()
is called for sysfs, this doesn't weaken the sanity check
significantly. I omitted converting the sanity check in
sysfs_do_create_link_sd(). While the check can be shifted to upper
layer, mistakes there are well contained and should be easily visible
anyway.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's no reason for sysfs to be calling ktype->namespace(). It is
backwards, obfuscates what's going on and unnecessarily tangles two
separate layers.
There are two places where symlink code calls ktype->namespace().
* sysfs_do_create_link_sd() calls it to find out the namespace tag of
the target directory. Unless symlinking races with cross-namespace
renaming, this equals @target_sd->s_ns.
* sysfs_rename_link() uses it to find out the new namespace to rename
to and the new namespace can be different from the existing one.
The function is renamed to sysfs_rename_link_ns() with an explicit
@ns argument and the ktype->namespace() invocation is shifted to the
device layer.
While this patch replaces ktype->namespace() invocation with the
recorded result in @target_sd, this shouldn't result in any behvior
difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For some unrecognizable reason, namespace information is communicated
to sysfs through ktype->namespace() callback when there's *nothing*
which needs the use of a callback. The whole sequence of operations
is completely synchronous and sysfs operations simply end up calling
back into the layer which just invoked it in order to find out the
namespace information, which is completely backwards, obfuscates
what's going on and unnecessarily tangles two separate layers.
This patch doesn't remove ktype->namespace() but shifts its handling
to kobject layer. We probably want to get rid of the callback in the
long term.
This patch adds an explicit param to sysfs_{create|rename|move}_dir()
and renames them to sysfs_{create|rename|move}_dir_ns(), respectively.
ktype->namespace() invocations are moved to the calling sites of the
above functions. A new helper kboject_namespace() is introduced which
directly tests kobj_ns_type_operations->type which should give the
same result as testing sysfs_fs_type(parent_sd) and returns @kobj's
namespace tag as necessary. kobject_namespace() is extern as it will
be used from another file in the following patches.
This patch should be an equivalent conversion without any functional
difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
sysfs ns (namespace) implementation became more convoluted than
necessary while trying to hide ns information from visible interface.
The relatively recent attr ns support is a good example.
* attr ns tag is determined by sysfs_ops->namespace() callback while
dir tag is determined by kobj_type->namespace(). The placement is
arbitrary.
* Instead of performing operations with explicit ns tag, the namespace
callback is routed through sysfs_attr_ns(), sysfs_ops->namespace(),
class_attr_namespace(), class_attr->namespace(). It's not simpler
in any sense. The only thing this convolution does is traversing
the whole stack backwards.
The namespace callbacks are unncessary because the operations involved
are inherently synchronous. The information can be provided in in
straight-forward top-down direction and reversing that direction is
unnecessary and against basic design principles.
This backward interface is unnecessarily convoluted and hinders
properly separating out sysfs from driver model / kobject for proper
layering. This patch updates attr ns support such that
* sysfs_ops->namespace() and class_attr->namespace() are dropped.
* sysfs_{create|remove}_file_ns(), which take explicit @ns param, are
added and sysfs_{create|remove}_file() are now simple wrappers
around the ns aware functions.
* ns handling is dropped from sysfs_chmod_file(). Nobody uses it at
this point. sysfs_chmod_file_ns() can be added later if necessary.
* Explicit @ns is propagated through class_{create|remove}_file_ns()
and netdev_class_{create|remove}_file_ns().
* driver/net/bonding which is currently the only user of attr
namespace is updated to use netdev_class_{create|remove}_file_ns()
with @bh->net as the ns tag instead of using the namespace callback.
This patch should be an equivalent conversion without any functional
difference. It makes the code easier to follow, reduces lines of code
a bit and helps proper separation and layering.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Kay Sievers <kay@vrfy.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The expansion of to_sysfs_dirent() contains an unncessary trailing
semicolon making it impossible to use in the middle of statements.
Drop it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull vfs pile 2 (of many) from Al Viro:
"Mostly Miklos' series this time"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify dcache.c inlined helpers where possible
fuse: drop dentry on failed revalidate
fuse: clean up return in fuse_dentry_revalidate()
fuse: use d_materialise_unique()
sysfs: use check_submounts_and_drop()
nfs: use check_submounts_and_drop()
gfs2: use check_submounts_and_drop()
afs: use check_submounts_and_drop()
vfs: check unlinked ancestors before mount
vfs: check submounts and drop atomically
vfs: add d_walk()
vfs: restructure d_genocide()
Pull namespace changes from Eric Biederman:
"This is an assorted mishmash of small cleanups, enhancements and bug
fixes.
The major theme is user namespace mount restrictions. nsown_capable
is killed as it encourages not thinking about details that need to be
considered. A very hard to hit pid namespace exiting bug was finally
tracked and fixed. A couple of cleanups to the basic namespace
infrastructure.
Finally there is an enhancement that makes per user namespace
capabilities usable as capabilities, and an enhancement that allows
the per userns root to nice other processes in the user namespace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Kill nsown_capable it makes the wrong thing easy
capabilities: allow nice if we are privileged
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Allow PR_CAPBSET_DROP in a user namespace.
namespaces: Simplify copy_namespaces so it is clear what is going on.
pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
sysfs: Restrict mounting sysfs
userns: Better restrictions on when proc and sysfs can be mounted
vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
kernel/nsproxy.c: Improving a snippet of code.
proc: Restrict mounting the proc filesystem
vfs: Lock in place mounts from more privileged users
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.
check_submounts_and_drop() can deal with negative dentries and
non-directories as well.
Non-directories can also be mounted on. And just like directories we don't
want these to disappear with invalidation.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't allow mounting sysfs unless the caller has CAP_SYS_ADMIN rights
over the net namespace. The principle here is if you create or have
capabilities over it you can mount it, otherwise you get to live with
what other people have mounted.
Instead of testing this with a straight forward ns_capable call,
perform this check the long and torturous way with kobject helpers,
this keeps direct knowledge of namespaces out of sysfs, and preserves
the existing sysfs abstractions.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Rely on the fact that another flavor of the filesystem is already
mounted and do not rely on state in the user namespace.
Verify that the mounted filesystem is not covered in any significant
way. I would love to verify that the previously mounted filesystem
has no mounts on top but there are at least the directories
/proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
for other filesystems to mount on top of.
Refactor the test into a function named fs_fully_visible and call that
function from the mount routines of proc and sysfs. This makes this
test local to the filesystems involved and the results current of when
the mounts take place, removing a weird threading of the user
namespace, the mount namespace and the filesystems themselves.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Fix up the wording of sysfs_create/remove_groups() a bit.
Reported-by: Anthony Foiani <tkil@scrye.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the coding style warnings in fs/sysfs/file.c for broken
strings across lines.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>