mirror of
https://github.com/torvalds/linux.git
synced 2024-11-24 21:21:41 +00:00
docs: fs: convert docs without extension to ReST
There are 3 remaining files without an extension inside the fs docs dir. Manually convert them to ReST.

In the case of the nfs/exporting.rst file, as the nfs docs aren't ported yet, I opted to convert it and add an :orphan: tag there, which should be removed when the file gets added into an nfs-specific part of the fs documentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent 5a5e045bb3
commit ec23eb54fb
@ -1,12 +1,17 @@
|
|||||||
Locking scheme used for directory operations is based on two
|
=================
|
||||||
|
Directory Locking
|
||||||
|
=================
|
||||||
|
|
||||||
|
|
||||||
|
Locking scheme used for directory operations is based on two
|
||||||
kinds of locks - per-inode (->i_rwsem) and per-filesystem
|
kinds of locks - per-inode (->i_rwsem) and per-filesystem
|
||||||
(->s_vfs_rename_mutex).
|
(->s_vfs_rename_mutex).
|
||||||
|
|
||||||
When taking the i_rwsem on multiple non-directory objects, we
|
When taking the i_rwsem on multiple non-directory objects, we
|
||||||
always acquire the locks in order by increasing address. We'll call
|
always acquire the locks in order by increasing address. We'll call
|
||||||
that "inode pointer" order in the following.
|
that "inode pointer" order in the following.
|
||||||
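A minimal sketch of "inode pointer" order, assuming a single-threaded toy model: ``struct toy_inode``, ``toy_inode_lock()`` and ``toy_lock_two()`` are hypothetical names used only for illustration, and a plain flag stands in for ->i_rwsem. It shows only the ordering rule, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for an inode; "locked" models holding ->i_rwsem exclusive. */
struct toy_inode {
	int locked;
};

/* Single-threaded model of taking the lock: it must not already be held. */
void toy_inode_lock(struct toy_inode *inode)
{
	assert(!inode->locked);
	inode->locked = 1;
}

/*
 * Lock two non-directory objects in "inode pointer" order, i.e. the one
 * at the lower address first. Both pointers must be non-NULL. Returns
 * the inode that was locked first so callers can check the order.
 */
struct toy_inode *toy_lock_two(struct toy_inode *a, struct toy_inode *b)
{
	if ((uintptr_t)a > (uintptr_t)b) {
		struct toy_inode *tmp = a;

		a = b;
		b = tmp;
	}
	toy_inode_lock(a);
	if (b != a)		/* same object passed twice: lock it once */
		toy_inode_lock(b);
	return a;
}
```

Because every caller sorts by address before locking, two callers that need the same pair of objects always contend on the lower-addressed one first, so neither can hold one lock while waiting for the other in the opposite order.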
|
|
||||||
For our purposes all operations fall in 5 classes:
|
For our purposes all operations fall in 5 classes:
|
||||||
|
|
||||||
1) read access. Locking rules: caller locks directory we are accessing.
|
1) read access. Locking rules: caller locks directory we are accessing.
|
||||||
The lock is taken shared.
|
The lock is taken shared.
|
||||||
@ -27,25 +32,29 @@ NB: we might get away with locking the source (and target in exchange
|
|||||||
case) shared.
|
case) shared.
|
||||||
|
|
||||||
5) link creation. Locking rules:
|
5) link creation. Locking rules:
|
||||||
|
|
||||||
* lock parent
|
* lock parent
|
||||||
* check that source is not a directory
|
* check that source is not a directory
|
||||||
* lock source
|
* lock source
|
||||||
* call the method.
|
* call the method.
|
||||||
|
|
||||||
All locks are exclusive.
|
All locks are exclusive.
|
||||||
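The link-creation steps can be sketched as follows. This is a toy model under stated assumptions: ``struct toy_inode`` and ``toy_link()`` are hypothetical, a flag stands in for an exclusively held ->i_rwsem, bumping ``nlink`` stands in for "call the method", and the -EPERM return for a directory source is an assumption of the sketch rather than something the rules above state.

```c
#include <assert.h>
#include <errno.h>

/* Toy inode: a flag models ->i_rwsem held exclusive. */
struct toy_inode {
	int is_dir;
	int locked;
	int nlink;
};

void toy_lock(struct toy_inode *inode)
{
	assert(!inode->locked);
	inode->locked = 1;
}

void toy_unlock(struct toy_inode *inode)
{
	assert(inode->locked);
	inode->locked = 0;
}

/* Follow the four steps of class 5 in order; returns 0 or a negative errno. */
int toy_link(struct toy_inode *parent, struct toy_inode *source)
{
	int err = 0;

	toy_lock(parent);		/* 1. lock parent */
	if (source->is_dir) {		/* 2. check that source is not a directory */
		err = -EPERM;		/* assumed error code for this sketch */
		goto out;
	}
	toy_lock(source);		/* 3. lock source */
	source->nlink++;		/* 4. "call the method" (toy effect) */
	toy_unlock(source);
out:
	toy_unlock(parent);
	return err;
}
```

Note that the directory check happens before the source is locked, so this class never holds a directory lock other than the parent's.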
|
|
||||||
6) cross-directory rename. The trickiest in the whole bunch. Locking
|
6) cross-directory rename. The trickiest in the whole bunch. Locking
|
||||||
rules:
|
rules:
|
||||||
|
|
||||||
* lock the filesystem
|
* lock the filesystem
|
||||||
* lock parents in "ancestors first" order.
|
* lock parents in "ancestors first" order.
|
||||||
* find source and target.
|
* find source and target.
|
||||||
* if old parent is equal to or is a descendent of target
|
* if old parent is equal to or is a descendent of target
|
||||||
fail with -ENOTEMPTY
|
fail with -ENOTEMPTY
|
||||||
* if new parent is equal to or is a descendent of source
|
* if new parent is equal to or is a descendent of source
|
||||||
fail with -ELOOP
|
fail with -ELOOP
|
||||||
* If it's an exchange, lock both the source and the target.
|
* If it's an exchange, lock both the source and the target.
|
||||||
* If the target exists, lock it. If the source is a non-directory,
|
* If the target exists, lock it. If the source is a non-directory,
|
||||||
lock it. If we need to lock both, do so in inode pointer order.
|
lock it. If we need to lock both, do so in inode pointer order.
|
||||||
* call the method.
|
* call the method.
|
||||||
|
|
||||||
All ->i_rwsem are taken exclusive. Again, we might get away with locking
|
All ->i_rwsem are taken exclusive. Again, we might get away with locking
|
||||||
the source (and target in exchange case) shared.
|
the source (and target in exchange case) shared.
|
||||||
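The two ancestry checks of the cross-directory rename can be illustrated with a toy tree of parent pointers. ``struct toy_node``, ``toy_is_subdir()`` and ``toy_rename_check()`` are hypothetical names for this sketch; it models only the two "fail with" rules and takes no locks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy directory-tree node; the root has parent == NULL. */
struct toy_node {
	struct toy_node *parent;
};

/* Return 1 if n is equal to anc or is a descendant of anc, else 0. */
int toy_is_subdir(const struct toy_node *n, const struct toy_node *anc)
{
	for (; n; n = n->parent)
		if (n == anc)
			return 1;
	return 0;
}

/*
 * Ancestry checks of a cross-directory rename. target may be NULL when
 * nothing exists at the destination name. Returns 0 when the rename may
 * proceed, or the negative errno named in the rules.
 */
int toy_rename_check(const struct toy_node *old_parent,
		     const struct toy_node *new_parent,
		     const struct toy_node *source,
		     const struct toy_node *target)
{
	if (target && toy_is_subdir(old_parent, target))
		return -ENOTEMPTY;
	if (toy_is_subdir(new_parent, source))
		return -ELOOP;
	return 0;
}
```

For example, with directories ``a`` and ``a/b``, moving ``a`` into ``a/b`` is rejected by the second check, since the new parent ``b`` is a descendant of the source ``a``.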
|
|
||||||
@ -54,10 +63,11 @@ read, modified or removed by method will be locked by caller.
|
|||||||
|
|
||||||
|
|
||||||
If no directory is its own ancestor, the scheme above is deadlock-free.
|
If no directory is its own ancestor, the scheme above is deadlock-free.
|
||||||
|
|
||||||
Proof:
|
Proof:
|
||||||
|
|
||||||
First of all, at any moment we have a partial ordering of the
|
First of all, at any moment we have a partial ordering of the
|
||||||
objects - A < B iff A is an ancestor of B.
|
objects - A < B iff A is an ancestor of B.
|
||||||
|
|
||||||
That ordering can change. However, the following is true:
|
That ordering can change. However, the following is true:
|
||||||
|
|
||||||
@ -77,32 +87,32 @@ objects - A < B iff A is an ancestor of B.
|
|||||||
non-directory object, except renames, which take locks on source and
|
non-directory object, except renames, which take locks on source and
|
||||||
target in inode pointer order in the case they are not directories.)
|
target in inode pointer order in the case they are not directories.)
|
||||||
|
|
||||||
Now consider the minimal deadlock. Each process is blocked on
|
Now consider the minimal deadlock. Each process is blocked on
|
||||||
attempt to acquire some lock and already holds at least one lock. Let's
|
attempt to acquire some lock and already holds at least one lock. Let's
|
||||||
consider the set of contended locks. First of all, filesystem lock is
|
consider the set of contended locks. First of all, filesystem lock is
|
||||||
not contended, since any process blocked on it is not holding any locks.
|
not contended, since any process blocked on it is not holding any locks.
|
||||||
Thus all processes are blocked on ->i_rwsem.
|
Thus all processes are blocked on ->i_rwsem.
|
||||||
|
|
||||||
By (3), any process holding a non-directory lock can only be
|
By (3), any process holding a non-directory lock can only be
|
||||||
waiting on another non-directory lock with a larger address. Therefore
|
waiting on another non-directory lock with a larger address. Therefore
|
||||||
the process holding the "largest" such lock can always make progress, and
|
the process holding the "largest" such lock can always make progress, and
|
||||||
non-directory objects are not included in the set of contended locks.
|
non-directory objects are not included in the set of contended locks.
|
||||||
|
|
||||||
Thus link creation can't be a part of deadlock - it can't be
|
Thus link creation can't be a part of deadlock - it can't be
|
||||||
blocked on source and it means that it doesn't hold any locks.
|
blocked on source and it means that it doesn't hold any locks.
|
||||||
|
|
||||||
Any contended object is either held by cross-directory rename or
|
Any contended object is either held by cross-directory rename or
|
||||||
has a child that is also contended. Indeed, suppose that it is held by
|
has a child that is also contended. Indeed, suppose that it is held by
|
||||||
operation other than cross-directory rename. Then the lock this operation
|
operation other than cross-directory rename. Then the lock this operation
|
||||||
is blocked on belongs to child of that object due to (1).
|
is blocked on belongs to child of that object due to (1).
|
||||||
|
|
||||||
It means that one of the operations is cross-directory rename.
|
It means that one of the operations is cross-directory rename.
|
||||||
Otherwise the set of contended objects would be infinite - each of them
|
Otherwise the set of contended objects would be infinite - each of them
|
||||||
would have a contended child and we had assumed that no object is its
|
would have a contended child and we had assumed that no object is its
|
||||||
own descendent. Moreover, there is exactly one cross-directory rename
|
own descendent. Moreover, there is exactly one cross-directory rename
|
||||||
(see above).
|
(see above).
|
||||||
|
|
||||||
Consider the object blocking the cross-directory rename. One
|
Consider the object blocking the cross-directory rename. One
|
||||||
of its descendents is locked by cross-directory rename (otherwise we
|
of its descendents is locked by cross-directory rename (otherwise we
|
||||||
would again have an infinite set of contended objects). But that
|
would again have an infinite set of contended objects). But that
|
||||||
means that cross-directory rename is taking locks out of order. Due
|
means that cross-directory rename is taking locks out of order. Due
|
||||||
@ -112,7 +122,7 @@ try to acquire lock on descendent before the lock on ancestor.
|
|||||||
Contradiction. I.e. deadlock is impossible. Q.E.D.
|
Contradiction. I.e. deadlock is impossible. Q.E.D.
|
||||||
|
|
||||||
|
|
||||||
These operations are guaranteed to avoid loop creation. Indeed,
|
These operations are guaranteed to avoid loop creation. Indeed,
|
||||||
the only operation that could introduce loops is cross-directory rename.
|
the only operation that could introduce loops is cross-directory rename.
|
||||||
Since the only new (parent, child) pair added by rename() is (new parent,
|
Since the only new (parent, child) pair added by rename() is (new parent,
|
||||||
source), such loop would have to contain these objects and the rest of it
|
source), such loop would have to contain these objects and the rest of it
|
||||||
@ -123,13 +133,13 @@ new parent had been equal to or a descendent of source since the moment when
|
|||||||
we had acquired filesystem lock and rename() would fail with -ELOOP in that
|
we had acquired filesystem lock and rename() would fail with -ELOOP in that
|
||||||
case.
|
case.
|
||||||
|
|
||||||
While this locking scheme works for arbitrary DAGs, it relies on
|
While this locking scheme works for arbitrary DAGs, it relies on
|
||||||
ability to check that directory is a descendent of another object. Current
|
ability to check that directory is a descendent of another object. Current
|
||||||
implementation assumes that directory graph is a tree. This assumption is
|
implementation assumes that directory graph is a tree. This assumption is
|
||||||
also preserved by all operations (cross-directory rename on a tree that would
|
also preserved by all operations (cross-directory rename on a tree that would
|
||||||
not introduce a cycle will leave it a tree and link() fails for directories).
|
not introduce a cycle will leave it a tree and link() fails for directories).
|
||||||
|
|
||||||
Notice that "directory" in the above == "anything that might have
|
Notice that "directory" in the above == "anything that might have
|
||||||
children", so if we are going to introduce hybrid objects we will need
|
children", so if we are going to introduce hybrid objects we will need
|
||||||
either to make sure that link(2) doesn't work for them or to make changes
|
either to make sure that link(2) doesn't work for them or to make changes
|
||||||
in is_subdir() that would make it work even in presence of such beasts.
|
in is_subdir() that would make it work even in presence of such beasts.
|
@ -20,6 +20,8 @@ algorithms work.
|
|||||||
path-lookup
|
path-lookup
|
||||||
api-summary
|
api-summary
|
||||||
splice
|
splice
|
||||||
|
locking
|
||||||
|
directory-locking
|
||||||
|
|
||||||
Filesystem support layers
|
Filesystem support layers
|
||||||
=========================
|
=========================
|
||||||
|
@ -1,14 +1,22 @@
|
|||||||
The text below describes the locking rules for VFS-related methods.
|
=======
|
||||||
|
Locking
|
||||||
|
=======
|
||||||
|
|
||||||
|
The text below describes the locking rules for VFS-related methods.
|
||||||
It is (believed to be) up-to-date. *Please*, if you change anything in
|
It is (believed to be) up-to-date. *Please*, if you change anything in
|
||||||
prototypes or locking protocols - update this file. And update the relevant
|
prototypes or locking protocols - update this file. And update the relevant
|
||||||
instances in the tree, don't leave that to maintainers of filesystems/devices/
|
instances in the tree, don't leave that to maintainers of filesystems/devices/
|
||||||
etc. At the very least, put the list of dubious cases in the end of this file.
|
etc. At the very least, put the list of dubious cases in the end of this file.
|
||||||
Don't turn it into log - maintainers of out-of-the-tree code are supposed to
|
Don't turn it into log - maintainers of out-of-the-tree code are supposed to
|
||||||
be able to use diff(1).
|
be able to use diff(1).
|
||||||
Thing currently missing here: socket operations. Alexey?
|
|
||||||
|
|
||||||
--------------------------- dentry_operations --------------------------
|
Thing currently missing here: socket operations. Alexey?
|
||||||
prototypes:
|
|
||||||
|
dentry_operations
|
||||||
|
=================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
int (*d_revalidate)(struct dentry *, unsigned int);
|
int (*d_revalidate)(struct dentry *, unsigned int);
|
||||||
int (*d_weak_revalidate)(struct dentry *, unsigned int);
|
int (*d_weak_revalidate)(struct dentry *, unsigned int);
|
||||||
int (*d_hash)(const struct dentry *, struct qstr *);
|
int (*d_hash)(const struct dentry *, struct qstr *);
|
||||||
@ -24,23 +32,30 @@ prototypes:
|
|||||||
struct dentry *(*d_real)(struct dentry *, const struct inode *);
|
struct dentry *(*d_real)(struct dentry *, const struct inode *);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
                   rename_lock ->d_lock may block      rcu-walk
|
|
||||||
d_revalidate:      no          no       yes (ref-walk) maybe
|
|
||||||
d_weak_revalidate: no          no       yes            no
|
|
||||||
d_hash             no          no       no             maybe
|
|
||||||
d_compare:         yes         no       no             maybe
|
|
||||||
d_delete:          no          yes      no             no
|
|
||||||
d_init:            no          no       yes            no
|
|
||||||
d_release:         no          no       yes            no
|
|
||||||
d_prune:           no          yes      no             no
|
|
||||||
d_iput:            no          no       yes            no
|
|
||||||
d_dname:           no          no       no             no
|
|
||||||
d_automount:       no          no       yes            no
|
|
||||||
d_manage:          no          no       yes (ref-walk) maybe
|
|
||||||
d_real             no          no       yes            no
|
|
||||||
|
|
||||||
--------------------------- inode_operations ---------------------------
|
================== =========== ======== ============== ========
|
||||||
prototypes:
|
ops                rename_lock ->d_lock may block      rcu-walk
|
||||||
|
================== =========== ======== ============== ========
|
||||||
|
d_revalidate:      no          no       yes (ref-walk) maybe
|
||||||
|
d_weak_revalidate: no          no       yes            no
|
||||||
|
d_hash             no          no       no             maybe
|
||||||
|
d_compare:         yes         no       no             maybe
|
||||||
|
d_delete:          no          yes      no             no
|
||||||
|
d_init:            no          no       yes            no
|
||||||
|
d_release:         no          no       yes            no
|
||||||
|
d_prune:           no          yes      no             no
|
||||||
|
d_iput:            no          no       yes            no
|
||||||
|
d_dname:           no          no       no             no
|
||||||
|
d_automount:       no          no       yes            no
|
||||||
|
d_manage:          no          no       yes (ref-walk) maybe
|
||||||
|
d_real             no          no       yes            no
|
||||||
|
================== =========== ======== ============== ========
|
||||||
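The "yes (ref-walk)" and "maybe" entries for d_revalidate can be read as: the method may block only in ref-walk mode, and in rcu-walk mode it must either answer without blocking or ask the caller to retry in ref-walk. A minimal sketch of that contract, assuming a toy dentry and a hypothetical flag ``TOY_LOOKUP_RCU`` (not the kernel's definition), with -ECHILD used as the "retry in ref-walk" answer:

```c
#include <assert.h>
#include <errno.h>

#define TOY_LOOKUP_RCU 0x1	/* hypothetical flag: caller is in rcu-walk */

/* Toy dentry state for the sketch. */
struct toy_dentry {
	int valid;		/* result of the last check */
	int needs_blocking_check;	/* answer requires work that may block */
};

/*
 * Returns 1 if the dentry is valid, 0 if it is not, or -ECHILD when the
 * answer would require blocking while the caller is in rcu-walk mode.
 */
int toy_d_revalidate(struct toy_dentry *dentry, unsigned int flags)
{
	if (dentry->needs_blocking_check) {
		if (flags & TOY_LOOKUP_RCU)
			return -ECHILD;	/* must not block in rcu-walk */
		/* ref-walk: blocking is allowed; pretend the check ran */
		dentry->needs_blocking_check = 0;
	}
	return dentry->valid;
}
```

A caller that receives -ECHILD under rcu-walk is expected to repeat the call without the flag, at which point the method is free to block.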
|
|
||||||
|
inode_operations
|
||||||
|
================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
int (*create) (struct inode *,struct dentry *,umode_t, bool);
|
int (*create) (struct inode *,struct dentry *,umode_t, bool);
|
||||||
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
|
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
|
||||||
int (*link) (struct dentry *,struct inode *,struct dentry *);
|
int (*link) (struct dentry *,struct inode *,struct dentry *);
|
||||||
@ -68,7 +83,10 @@ prototypes:
|
|||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
all may block
|
all may block
|
||||||
i_rwsem(inode)
|
|
||||||
|
============ =============================================
|
||||||
|
ops i_rwsem(inode)
|
||||||
|
============ =============================================
|
||||||
lookup: shared
|
lookup: shared
|
||||||
create: exclusive
|
create: exclusive
|
||||||
link: exclusive (both)
|
link: exclusive (both)
|
||||||
@ -89,17 +107,21 @@ fiemap: no
|
|||||||
update_time: no
|
update_time: no
|
||||||
atomic_open: exclusive
|
atomic_open: exclusive
|
||||||
tmpfile: no
|
tmpfile: no
|
||||||
|
============ =============================================
|
||||||
|
|
||||||
|
|
||||||
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
|
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
|
||||||
exclusive on victim.
|
exclusive on victim.
|
||||||
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
|
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
|
||||||
|
|
||||||
See Documentation/filesystems/directory-locking for more detailed discussion
|
See Documentation/filesystems/directory-locking.rst for more detailed discussion
|
||||||
of the locking scheme for directory operations.
|
of the locking scheme for directory operations.
|
||||||
|
|
||||||
----------------------- xattr_handler operations -----------------------
|
xattr_handler operations
|
||||||
prototypes:
|
========================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
bool (*list)(struct dentry *dentry);
|
bool (*list)(struct dentry *dentry);
|
||||||
int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
|
int (*get)(const struct xattr_handler *handler, struct dentry *dentry,
|
||||||
struct inode *inode, const char *name, void *buffer,
|
struct inode *inode, const char *name, void *buffer,
|
||||||
@ -110,13 +132,20 @@ prototypes:
|
|||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
all may block
|
all may block
|
||||||
i_rwsem(inode)
|
|
||||||
|
===== ==============
|
||||||
|
ops i_rwsem(inode)
|
||||||
|
===== ==============
|
||||||
list: no
|
list: no
|
||||||
get: no
|
get: no
|
||||||
set: exclusive
|
set: exclusive
|
||||||
|
===== ==============
|
||||||
|
|
||||||
|
super_operations
|
||||||
|
================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
--------------------------- super_operations ---------------------------
|
|
||||||
prototypes:
|
|
||||||
struct inode *(*alloc_inode)(struct super_block *sb);
|
struct inode *(*alloc_inode)(struct super_block *sb);
|
||||||
void (*free_inode)(struct inode *);
|
void (*free_inode)(struct inode *);
|
||||||
void (*destroy_inode)(struct inode *);
|
void (*destroy_inode)(struct inode *);
|
||||||
@ -138,7 +167,10 @@ prototypes:
|
|||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
All may block [not true, see below]
|
All may block [not true, see below]
|
||||||
s_umount
|
|
||||||
|
====================== ============ ========================
|
||||||
|
ops s_umount note
|
||||||
|
====================== ============ ========================
|
||||||
alloc_inode:
|
alloc_inode:
|
||||||
free_inode: called from RCU callback
|
free_inode: called from RCU callback
|
||||||
destroy_inode:
|
destroy_inode:
|
||||||
@ -157,6 +189,7 @@ show_options: no (namespace_sem)
|
|||||||
quota_read: no (see below)
|
quota_read: no (see below)
|
||||||
quota_write: no (see below)
|
quota_write: no (see below)
|
||||||
bdev_try_to_free_page: no (see below)
|
bdev_try_to_free_page: no (see below)
|
||||||
|
====================== ============ ========================
|
||||||
|
|
||||||
->statfs() has s_umount (shared) when called by ustat(2) (native or
|
->statfs() has s_umount (shared) when called by ustat(2) (native or
|
||||||
compat), but that's an accident of bad API; s_umount is used to pin
|
compat), but that's an accident of bad API; s_umount is used to pin
|
||||||
@ -164,31 +197,44 @@ the superblock down when we only have dev_t given us by userland to
|
|||||||
identify the superblock. Everything else (statfs(), fstatfs(), etc.)
|
identify the superblock. Everything else (statfs(), fstatfs(), etc.)
|
||||||
doesn't hold it when calling ->statfs() - superblock is pinned down
|
doesn't hold it when calling ->statfs() - superblock is pinned down
|
||||||
by resolving the pathname passed to syscall.
|
by resolving the pathname passed to syscall.
|
||||||
|
|
||||||
->quota_read() and ->quota_write() functions are both guaranteed to
|
->quota_read() and ->quota_write() functions are both guaranteed to
|
||||||
be the only ones operating on the quota file by the quota code (via
|
be the only ones operating on the quota file by the quota code (via
|
||||||
dqio_sem) (unless an admin really wants to screw up something and
|
dqio_sem) (unless an admin really wants to screw up something and
|
||||||
writes to quota files with quotas on). For other details about locking
|
writes to quota files with quotas on). For other details about locking
|
||||||
see also dquot_operations section.
|
see also dquot_operations section.
|
||||||
|
|
||||||
->bdev_try_to_free_page is called from the ->releasepage handler of
|
->bdev_try_to_free_page is called from the ->releasepage handler of
|
||||||
the block device inode. See there for more details.
|
the block device inode. See there for more details.
|
||||||
|
|
||||||
--------------------------- file_system_type ---------------------------
|
file_system_type
|
||||||
prototypes:
|
================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
struct dentry *(*mount) (struct file_system_type *, int,
|
struct dentry *(*mount) (struct file_system_type *, int,
|
||||||
const char *, void *);
|
const char *, void *);
|
||||||
void (*kill_sb) (struct super_block *);
|
void (*kill_sb) (struct super_block *);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
may block
|
|
||||||
|
======= =========
|
||||||
|
ops     may block
|
||||||
|
======= =========
|
||||||
mount   yes
|
mount   yes
|
||||||
kill_sb yes
|
kill_sb yes
|
||||||
|
======= =========
|
||||||
|
|
||||||
->mount() returns ERR_PTR or the root dentry; its superblock should be locked
|
->mount() returns ERR_PTR or the root dentry; its superblock should be locked
|
||||||
on return.
|
on return.
|
||||||
|
|
||||||
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
|
->kill_sb() takes a write-locked superblock, does all shutdown work on it,
|
||||||
unlocks and drops the reference.
|
unlocks and drops the reference.
|
||||||
|
|
||||||
--------------------------- address_space_operations --------------------------
|
address_space_operations
|
||||||
prototypes:
|
========================
|
||||||
|
prototypes::
|
||||||
|
|
||||||
int (*writepage)(struct page *page, struct writeback_control *wbc);
|
int (*writepage)(struct page *page, struct writeback_control *wbc);
|
||||||
int (*readpage)(struct file *, struct page *);
|
int (*readpage)(struct file *, struct page *);
|
||||||
int (*writepages)(struct address_space *, struct writeback_control *);
|
int (*writepages)(struct address_space *, struct writeback_control *);
|
||||||
@ -218,14 +264,16 @@ prototypes:
|
|||||||
locking rules:
|
locking rules:
|
||||||
All except set_page_dirty and freepage may block
|
All except set_page_dirty and freepage may block
|
||||||
|
|
||||||
PageLocked(page) i_rwsem
|
====================== ======================== =========
|
||||||
|
ops PageLocked(page) i_rwsem
|
||||||
|
====================== ======================== =========
|
||||||
writepage: yes, unlocks (see below)
|
writepage: yes, unlocks (see below)
|
||||||
readpage: yes, unlocks
|
readpage: yes, unlocks
|
||||||
writepages:
|
writepages:
|
||||||
set_page_dirty no
|
set_page_dirty no
|
||||||
readpages:
|
readpages:
|
||||||
write_begin: locks the page exclusive
|
write_begin: locks the page exclusive
|
||||||
write_end: yes, unlocks exclusive
|
write_end: yes, unlocks exclusive
|
||||||
bmap:
|
bmap:
|
||||||
invalidatepage: yes
|
invalidatepage: yes
|
||||||
releasepage: yes
|
releasepage: yes
|
||||||
@ -239,17 +287,18 @@ is_partially_uptodate: yes
|
|||||||
error_remove_page: yes
|
error_remove_page: yes
|
||||||
swap_activate: no
|
swap_activate: no
|
||||||
swap_deactivate: no
|
swap_deactivate: no
|
||||||
|
====================== ======================== =========
|
||||||
|
|
||||||
->write_begin(), ->write_end() and ->readpage() may be called from
|
->write_begin(), ->write_end() and ->readpage() may be called from
|
||||||
the request handler (/dev/loop).
|
the request handler (/dev/loop).
|
||||||
|
|
||||||
->readpage() unlocks the page, either synchronously or via I/O
|
->readpage() unlocks the page, either synchronously or via I/O
|
||||||
completion.
|
completion.
|
||||||
|
|
||||||
->readpages() populates the pagecache with the passed pages and starts
|
->readpages() populates the pagecache with the passed pages and starts
|
||||||
I/O against them. They come unlocked upon I/O completion.
|
I/O against them. They come unlocked upon I/O completion.
|
||||||
|
|
||||||
->writepage() is used for two purposes: for "memory cleansing" and for
|
->writepage() is used for two purposes: for "memory cleansing" and for
|
||||||
"sync". These are quite different operations and the behaviour may differ
|
"sync". These are quite different operations and the behaviour may differ
|
||||||
depending upon the mode.
|
depending upon the mode.
|
||||||
|
|
||||||
@ -297,70 +346,81 @@ will leave the page itself marked clean but it will be tagged as dirty in the
|
|||||||
radix tree. This incoherency can lead to all sorts of hard-to-debug problems
|
radix tree. This incoherency can lead to all sorts of hard-to-debug problems
|
||||||
in the filesystem like having dirty inodes at umount and losing written data.
|
in the filesystem like having dirty inodes at umount and losing written data.
|
||||||
|
|
||||||
->writepages() is used for periodic writeback and for syscall-initiated
|
->writepages() is used for periodic writeback and for syscall-initiated
|
||||||
sync operations. The address_space should start I/O against at least
|
sync operations. The address_space should start I/O against at least
|
||||||
*nr_to_write pages. *nr_to_write must be decremented for each page which is
|
``*nr_to_write`` pages. ``*nr_to_write`` must be decremented for each page
|
||||||
written. The address_space implementation may write more (or less) pages
|
which is written. The address_space implementation may write more (or less)
|
||||||
than *nr_to_write asks for, but it should try to be reasonably close. If
|
pages than ``*nr_to_write`` asks for, but it should try to be reasonably close.
|
||||||
nr_to_write is NULL, all dirty pages must be written.
|
If nr_to_write is NULL, all dirty pages must be written.
|
||||||
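The ``*nr_to_write`` accounting can be sketched with a toy page array. ``toy_writepages()`` is a hypothetical function for illustration only: an ``int`` per page models the dirty bit, and the budget is passed as a pointer so that NULL can mean "write everything", as the text above describes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write out dirty pages until the budget is used up. The budget is
 * decremented once per page actually written. A NULL budget means all
 * dirty pages must be written. Returns the number of pages written.
 */
size_t toy_writepages(int *dirty, size_t npages, long *nr_to_write)
{
	size_t written = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		if (nr_to_write && *nr_to_write <= 0)
			break;		/* budget exhausted */
		if (!dirty[i])
			continue;	/* clean pages cost nothing */
		dirty[i] = 0;		/* "start I/O" on this page */
		written++;
		if (nr_to_write)
			(*nr_to_write)--;
	}
	return written;
}
```

This sketch stops exactly at the budget; the text allows an implementation to write somewhat more or fewer pages as long as it stays reasonably close.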
|
|
||||||
writepages should _only_ write pages which are present on
|
writepages should _only_ write pages which are present on
|
||||||
mapping->io_pages.
|
mapping->io_pages.
|
||||||
|
|
||||||
->set_page_dirty() is called from various places in the kernel
|
->set_page_dirty() is called from various places in the kernel
|
||||||
when the target page is marked as needing writeback. It may be called
|
when the target page is marked as needing writeback. It may be called
|
||||||
under spinlock (it cannot block) and is sometimes called with the page
|
under spinlock (it cannot block) and is sometimes called with the page
|
||||||
not locked.
|
not locked.
|
||||||
|
|
||||||
->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
|
->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
|
||||||
filesystems and by the swapper. The latter will eventually go away. Please,
|
filesystems and by the swapper. The latter will eventually go away. Please,
|
||||||
keep it that way and don't breed new callers.
|
keep it that way and don't breed new callers.
|
||||||
|
|
||||||
->invalidatepage() is called when the filesystem must attempt to drop
|
->invalidatepage() is called when the filesystem must attempt to drop
|
||||||
some or all of the buffers from the page when it is being truncated. It
|
some or all of the buffers from the page when it is being truncated. It
|
||||||
returns zero on success. If ->invalidatepage is zero, the kernel uses
|
returns zero on success. If ->invalidatepage is zero, the kernel uses
|
||||||
block_invalidatepage() instead.
|
block_invalidatepage() instead.
|
||||||
|
|
||||||
->releasepage() is called when the kernel is about to try to drop the
|
->releasepage() is called when the kernel is about to try to drop the
|
||||||
buffers from the page in preparation for freeing it. It returns zero to
|
buffers from the page in preparation for freeing it. It returns zero to
|
||||||
indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
|
indicate that the buffers are (or may be) freeable. If ->releasepage is zero,
|
||||||
the kernel assumes that the fs has no private interest in the buffers.
|
the kernel assumes that the fs has no private interest in the buffers.
|
||||||
|
|
||||||
->freepage() is called when the kernel is done dropping the page
|
->freepage() is called when the kernel is done dropping the page
|
||||||
from the page cache.
|
from the page cache.
|
||||||
|
|
||||||
->launder_page() may be called prior to releasing a page if
|
->launder_page() may be called prior to releasing a page if
|
||||||
it is still found to be dirty. It returns zero if the page was successfully
|
it is still found to be dirty. It returns zero if the page was successfully
|
||||||
cleaned, or an error value if not. Note that in order to prevent the page
|
cleaned, or an error value if not. Note that in order to prevent the page
|
||||||
getting mapped back in and redirtied, it needs to be kept locked
|
getting mapped back in and redirtied, it needs to be kept locked
|
||||||
across the entire operation.
|
across the entire operation.
|
||||||
|
|
||||||
->swap_activate will be called with a non-zero argument on
|
->swap_activate will be called with a non-zero argument on
|
||||||
files backing (non block device backed) swapfiles. A return value
|
files backing (non block device backed) swapfiles. A return value
|
||||||
of zero indicates success, in which case this file can be used for
|
of zero indicates success, in which case this file can be used for
|
||||||
backing swapspace. The swapspace operations will be proxied to the
|
backing swapspace. The swapspace operations will be proxied to the
|
||||||
address space operations.
|
address space operations.
|
||||||
|
|
||||||
->swap_deactivate() will be called in the sys_swapoff()
|
->swap_deactivate() will be called in the sys_swapoff()
|
||||||
path after ->swap_activate() returned success.
|
path after ->swap_activate() returned success.
|
||||||
|
|
||||||
----------------------- file_lock_operations ------------------------------
|
file_lock_operations
|
||||||
prototypes:
|
====================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
|
void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
|
||||||
void (*fl_release_private)(struct file_lock *);
|
void (*fl_release_private)(struct file_lock *);
|
||||||
|
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
inode->i_lock may block
|
|
||||||
|
=================== ============= =========
|
||||||
|
ops                 inode->i_lock may block
|
||||||
|
=================== ============= =========
|
||||||
fl_copy_lock:       yes           no
|
fl_copy_lock:       yes           no
|
||||||
fl_release_private: maybe         maybe[1]
|
fl_release_private: maybe         maybe[1]_
|
||||||
|
=================== ============= =========
|
||||||
|
|
||||||
[1]: ->fl_release_private for flock or POSIX locks is currently allowed
|
.. [1]:
|
||||||
to block. Leases however can still be freed while the i_lock is held and
|
->fl_release_private for flock or POSIX locks is currently allowed
|
||||||
so fl_release_private called on a lease should not block.
|
to block. Leases however can still be freed while the i_lock is held and
|
||||||
|
so fl_release_private called on a lease should not block.
|
||||||
|
|
||||||
|
lock_manager_operations
|
||||||
|
=======================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
----------------------- lock_manager_operations ---------------------------
|
|
||||||
prototypes:
|
|
||||||
void (*lm_notify)(struct file_lock *); /* unblock callback */
|
void (*lm_notify)(struct file_lock *); /* unblock callback */
|
||||||
int (*lm_grant)(struct file_lock *, struct file_lock *, int);
|
int (*lm_grant)(struct file_lock *, struct file_lock *, int);
|
||||||
void (*lm_break)(struct file_lock *); /* break_lease callback */
|
void (*lm_break)(struct file_lock *); /* break_lease callback */
|
||||||
@ -368,24 +428,33 @@ prototypes:
|
|||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
|
|
||||||
inode->i_lock blocked_lock_lock may block
|
========== ============= ================= =========
|
||||||
|
ops inode->i_lock blocked_lock_lock may block
|
||||||
|
========== ============= ================= =========
|
||||||
lm_notify: yes yes no
|
lm_notify: yes yes no
|
||||||
lm_grant: no no no
|
lm_grant: no no no
|
||||||
lm_break: yes no no
|
lm_break: yes no no
|
||||||
lm_change yes no no
|
lm_change yes no no
|
||||||
|
========== ============= ================= =========
|
||||||
|
|
||||||
|
buffer_head
|
||||||
|
===========
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
--------------------------- buffer_head -----------------------------------
|
|
||||||
prototypes:
|
|
||||||
void (*b_end_io)(struct buffer_head *bh, int uptodate);
|
void (*b_end_io)(struct buffer_head *bh, int uptodate);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
called from interrupts. In other words, extreme care is needed here.
|
|
||||||
|
called from interrupts. In other words, extreme care is needed here.
|
||||||
bh is locked, but that's all warranties we have here. Currently only RAID1,
|
bh is locked, but that's all warranties we have here. Currently only RAID1,
|
||||||
highmem, fs/buffer.c, and fs/ntfs/aops.c are providing these. Block devices
|
highmem, fs/buffer.c, and fs/ntfs/aops.c are providing these. Block devices
|
||||||
call this method upon the IO completion.
|
call this method upon the IO completion.
|
||||||
|
|
||||||
--------------------------- block_device_operations -----------------------
|
block_device_operations
|
||||||
prototypes:
|
=======================
|
||||||
|
prototypes::
|
||||||
|
|
||||||
int (*open) (struct block_device *, fmode_t);
|
int (*open) (struct block_device *, fmode_t);
|
||||||
int (*release) (struct gendisk *, fmode_t);
|
int (*release) (struct gendisk *, fmode_t);
|
||||||
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
|
int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
|
||||||
@ -399,7 +468,10 @@ prototypes:
|
|||||||
void (*swap_slot_free_notify) (struct block_device *, unsigned long);
|
void (*swap_slot_free_notify) (struct block_device *, unsigned long);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
bd_mutex
|
|
||||||
|
======================= ===================
|
||||||
|
ops bd_mutex
|
||||||
|
======================= ===================
|
||||||
open: yes
|
open: yes
|
||||||
release: yes
|
release: yes
|
||||||
ioctl: no
|
ioctl: no
|
||||||
@ -410,6 +482,7 @@ unlock_native_capacity: no
|
|||||||
revalidate_disk: no
|
revalidate_disk: no
|
||||||
getgeo: no
|
getgeo: no
|
||||||
swap_slot_free_notify: no (see below)
|
swap_slot_free_notify: no (see below)
|
||||||
|
======================= ===================
|
||||||
|
|
||||||
media_changed, unlock_native_capacity and revalidate_disk are called only from
|
media_changed, unlock_native_capacity and revalidate_disk are called only from
|
||||||
check_disk_change().
|
check_disk_change().
|
||||||
@ -418,8 +491,11 @@ swap_slot_free_notify is called with swap_lock and sometimes the page lock
|
|||||||
held.
|
held.
|
||||||
|
|
||||||
|
|
||||||
--------------------------- file_operations -------------------------------
|
file_operations
|
||||||
prototypes:
|
===============
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
loff_t (*llseek) (struct file *, loff_t, int);
|
loff_t (*llseek) (struct file *, loff_t, int);
|
||||||
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
|
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
|
||||||
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
|
ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
|
||||||
@ -455,7 +531,6 @@ prototypes:
|
|||||||
size_t, unsigned int);
|
size_t, unsigned int);
|
||||||
int (*setlease)(struct file *, long, struct file_lock **, void **);
|
int (*setlease)(struct file *, long, struct file_lock **, void **);
|
||||||
long (*fallocate)(struct file *, int, loff_t, loff_t);
|
long (*fallocate)(struct file *, int, loff_t, loff_t);
|
||||||
};
|
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
All may block.
|
All may block.
|
||||||
@ -490,8 +565,11 @@ in sys_read() and friends.
|
|||||||
the lease within the individual filesystem to record the result of the
|
the lease within the individual filesystem to record the result of the
|
||||||
operation
|
operation
|
||||||
|
|
||||||
--------------------------- dquot_operations -------------------------------
|
dquot_operations
|
||||||
prototypes:
|
================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
int (*write_dquot) (struct dquot *);
|
int (*write_dquot) (struct dquot *);
|
||||||
int (*acquire_dquot) (struct dquot *);
|
int (*acquire_dquot) (struct dquot *);
|
||||||
int (*release_dquot) (struct dquot *);
|
int (*release_dquot) (struct dquot *);
|
||||||
@ -503,20 +581,26 @@ a proper locking wrt the filesystem and call the generic quota operations.
|
|||||||
|
|
||||||
What filesystem should expect from the generic quota functions:
|
What filesystem should expect from the generic quota functions:
|
||||||
|
|
||||||
FS recursion Held locks when called
|
============== ============ =========================
|
||||||
|
ops FS recursion Held locks when called
|
||||||
|
============== ============ =========================
|
||||||
write_dquot: yes dqonoff_sem or dqptr_sem
|
write_dquot: yes dqonoff_sem or dqptr_sem
|
||||||
acquire_dquot: yes dqonoff_sem or dqptr_sem
|
acquire_dquot: yes dqonoff_sem or dqptr_sem
|
||||||
release_dquot: yes dqonoff_sem or dqptr_sem
|
release_dquot: yes dqonoff_sem or dqptr_sem
|
||||||
mark_dirty: no -
|
mark_dirty: no -
|
||||||
write_info: yes dqonoff_sem
|
write_info: yes dqonoff_sem
|
||||||
|
============== ============ =========================
|
||||||
|
|
||||||
FS recursion means calling ->quota_read() and ->quota_write() from superblock
|
FS recursion means calling ->quota_read() and ->quota_write() from superblock
|
||||||
operations.
|
operations.
|
||||||
|
|
||||||
More details about quota locking can be found in fs/dquot.c.
|
More details about quota locking can be found in fs/dquot.c.
|
||||||
|
|
||||||
--------------------------- vm_operations_struct -----------------------------
|
vm_operations_struct
|
||||||
prototypes:
|
====================
|
||||||
|
|
||||||
|
prototypes::
|
||||||
|
|
||||||
void (*open)(struct vm_area_struct*);
|
void (*open)(struct vm_area_struct*);
|
||||||
void (*close)(struct vm_area_struct*);
|
void (*close)(struct vm_area_struct*);
|
||||||
vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *);
|
vm_fault_t (*fault)(struct vm_area_struct*, struct vm_fault *);
|
||||||
@ -525,7 +609,10 @@ prototypes:
|
|||||||
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
int (*access)(struct vm_area_struct *, unsigned long, void*, int, int);
|
||||||
|
|
||||||
locking rules:
|
locking rules:
|
||||||
mmap_sem PageLocked(page)
|
|
||||||
|
============= ======== ===========================
|
||||||
|
ops mmap_sem PageLocked(page)
|
||||||
|
============= ======== ===========================
|
||||||
open: yes
|
open: yes
|
||||||
close: yes
|
close: yes
|
||||||
fault: yes can return with page locked
|
fault: yes can return with page locked
|
||||||
@ -533,8 +620,9 @@ map_pages: yes
|
|||||||
page_mkwrite: yes can return with page locked
|
page_mkwrite: yes can return with page locked
|
||||||
pfn_mkwrite: yes
|
pfn_mkwrite: yes
|
||||||
access: yes
|
access: yes
|
||||||
|
============= ======== ===========================
|
||||||
|
|
||||||
->fault() is called when a previously not present pte is about
|
->fault() is called when a previously not present pte is about
|
||||||
to be faulted in. The filesystem must find and return the page associated
|
to be faulted in. The filesystem must find and return the page associated
|
||||||
with the passed in "pgoff" in the vm_fault structure. If it is possible that
|
with the passed in "pgoff" in the vm_fault structure. If it is possible that
|
||||||
the page may be truncated and/or invalidated, then the filesystem must lock
|
the page may be truncated and/or invalidated, then the filesystem must lock
|
||||||
@ -542,7 +630,7 @@ the page, then ensure it is not already truncated (the page lock will block
|
|||||||
subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
|
subsequent truncate), and then return with VM_FAULT_LOCKED, and the page
|
||||||
locked. The VM will unlock the page.
|
locked. The VM will unlock the page.
|
||||||
|
|
||||||
->map_pages() is called when VM asks to map easy accessible pages.
|
->map_pages() is called when VM asks to map easy accessible pages.
|
||||||
Filesystem should find and map pages associated with offsets from "start_pgoff"
|
Filesystem should find and map pages associated with offsets from "start_pgoff"
|
||||||
till "end_pgoff". ->map_pages() is called with page table locked and must
|
till "end_pgoff". ->map_pages() is called with page table locked and must
|
||||||
not block. If it's not possible to reach a page without blocking,
|
not block. If it's not possible to reach a page without blocking,
|
||||||
@ -551,25 +639,26 @@ page table entry. Pointer to entry associated with the page is passed in
|
|||||||
"pte" field in vm_fault structure. Pointers to entries for other offsets
|
"pte" field in vm_fault structure. Pointers to entries for other offsets
|
||||||
should be calculated relative to "pte".
|
should be calculated relative to "pte".
|
||||||
|
|
||||||
->page_mkwrite() is called when a previously read-only pte is
|
->page_mkwrite() is called when a previously read-only pte is
|
||||||
about to become writeable. The filesystem again must ensure that there are
|
about to become writeable. The filesystem again must ensure that there are
|
||||||
no truncate/invalidate races, and then return with the page locked. If
|
no truncate/invalidate races, and then return with the page locked. If
|
||||||
the page has been truncated, the filesystem should not look up a new page
|
the page has been truncated, the filesystem should not look up a new page
|
||||||
like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
|
like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which
|
||||||
will cause the VM to retry the fault.
|
will cause the VM to retry the fault.
|
||||||
|
|
||||||
->pfn_mkwrite() is the same as page_mkwrite but when the pte is
|
->pfn_mkwrite() is the same as page_mkwrite but when the pte is
|
||||||
VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is
|
VM_PFNMAP or VM_MIXEDMAP with a page-less entry. Expected return is
|
||||||
VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior
|
VM_FAULT_NOPAGE. Or one of the VM_FAULT_ERROR types. The default behavior
|
||||||
after this call is to make the pte read-write, unless pfn_mkwrite returns
|
after this call is to make the pte read-write, unless pfn_mkwrite returns
|
||||||
an error.
|
an error.
|
||||||
|
|
||||||
->access() is called when get_user_pages() fails in
|
->access() is called when get_user_pages() fails in
|
||||||
access_process_vm(), typically used to debug a process through
|
access_process_vm(), typically used to debug a process through
|
||||||
/proc/pid/mem or ptrace. This function is needed only for
|
/proc/pid/mem or ptrace. This function is needed only for
|
||||||
VM_IO | VM_PFNMAP VMAs.
|
VM_IO | VM_PFNMAP VMAs.
|
||||||
|
|
||||||
================================================================================
|
--------------------------------------------------------------------------------
|
||||||
|
|
||||||
Dubious stuff
|
Dubious stuff
|
||||||
|
|
||||||
(if you break something or notice that it is broken and do not fix it yourself
|
(if you break something or notice that it is broken and do not fix it yourself
|
@ -1,3 +1,4 @@
|
|||||||
|
:orphan:
|
||||||
|
|
||||||
Making Filesystems Exportable
|
Making Filesystems Exportable
|
||||||
=============================
|
=============================
|
||||||
@ -42,9 +43,9 @@ filehandle fragment, there is no automatic creation of a path prefix
|
|||||||
for the object. This leads to two related but distinct features of
|
for the object. This leads to two related but distinct features of
|
||||||
the dcache that are not needed for normal filesystem access.
|
the dcache that are not needed for normal filesystem access.
|
||||||
|
|
||||||
1/ The dcache must sometimes contain objects that are not part of the
|
1. The dcache must sometimes contain objects that are not part of the
|
||||||
proper prefix. i.e that are not connected to the root.
|
proper prefix. i.e that are not connected to the root.
|
||||||
2/ The dcache must be prepared for a newly found (via ->lookup) directory
|
2. The dcache must be prepared for a newly found (via ->lookup) directory
|
||||||
to already have a (non-connected) dentry, and must be able to move
|
to already have a (non-connected) dentry, and must be able to move
|
||||||
that dentry into place (based on the parent and name in the
|
that dentry into place (based on the parent and name in the
|
||||||
->lookup). This is particularly needed for directories as
|
->lookup). This is particularly needed for directories as
|
||||||
@ -52,7 +53,7 @@ the dcache that are not needed for normal filesystem access.
|
|||||||
|
|
||||||
To implement these features, the dcache has:
|
To implement these features, the dcache has:
|
||||||
|
|
||||||
a/ A dentry flag DCACHE_DISCONNECTED which is set on
|
a. A dentry flag DCACHE_DISCONNECTED which is set on
|
||||||
any dentry that might not be part of the proper prefix.
|
any dentry that might not be part of the proper prefix.
|
||||||
This is set when anonymous dentries are created, and cleared when a
|
This is set when anonymous dentries are created, and cleared when a
|
||||||
dentry is noticed to be a child of a dentry which is in the proper
|
dentry is noticed to be a child of a dentry which is in the proper
|
||||||
@ -71,48 +72,52 @@ a/ A dentry flag DCACHE_DISCONNECTED which is set on
|
|||||||
dentries. That guarantees that we won't need to hunt them down upon
|
dentries. That guarantees that we won't need to hunt them down upon
|
||||||
umount.
|
umount.
|
||||||
|
|
||||||
b/ A primitive for creation of secondary roots - d_obtain_root(inode).
|
b. A primitive for creation of secondary roots - d_obtain_root(inode).
|
||||||
Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the
|
Those do _not_ bear DCACHE_DISCONNECTED. They are placed on the
|
||||||
per-superblock list (->s_roots), so they can be located at umount
|
per-superblock list (->s_roots), so they can be located at umount
|
||||||
time for eviction purposes.
|
time for eviction purposes.
|
||||||
|
|
||||||
c/ Helper routines to allocate anonymous dentries, and to help attach
|
c. Helper routines to allocate anonymous dentries, and to help attach
|
||||||
loose directory dentries at lookup time. They are:
|
loose directory dentries at lookup time. They are:
|
||||||
|
|
||||||
d_obtain_alias(inode) will return a dentry for the given inode.
|
d_obtain_alias(inode) will return a dentry for the given inode.
|
||||||
If the inode already has a dentry, one of those is returned.
|
If the inode already has a dentry, one of those is returned.
|
||||||
|
|
||||||
If it doesn't, a new anonymous (IS_ROOT and
|
If it doesn't, a new anonymous (IS_ROOT and
|
||||||
DCACHE_DISCONNECTED) dentry is allocated and attached.
|
DCACHE_DISCONNECTED) dentry is allocated and attached.
|
||||||
|
|
||||||
In the case of a directory, care is taken that only one dentry
|
In the case of a directory, care is taken that only one dentry
|
||||||
can ever be attached.
|
can ever be attached.
|
||||||
|
|
||||||
d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
|
d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
|
||||||
either the passed-in dentry or a preexisting alias for the given inode
|
either the passed-in dentry or a preexisting alias for the given inode
|
||||||
(such as an anonymous one created by d_obtain_alias), if appropriate.
|
(such as an anonymous one created by d_obtain_alias), if appropriate.
|
||||||
It returns NULL when the passed-in dentry is used, following the calling
|
It returns NULL when the passed-in dentry is used, following the calling
|
||||||
convention of ->lookup.
|
convention of ->lookup.
|
||||||
|
|
||||||
Filesystem Issues
|
Filesystem Issues
|
||||||
-----------------
|
-----------------
|
||||||
|
|
||||||
For a filesystem to be exportable it must:
|
For a filesystem to be exportable it must:
|
||||||
|
|
||||||
1/ provide the filehandle fragment routines described below.
|
1. provide the filehandle fragment routines described below.
|
||||||
2/ make sure that d_splice_alias is used rather than d_add
|
2. make sure that d_splice_alias is used rather than d_add
|
||||||
when ->lookup finds an inode for a given parent and name.
|
when ->lookup finds an inode for a given parent and name.
|
||||||
|
|
||||||
If inode is NULL, d_splice_alias(inode, dentry) is equivalent to
|
If inode is NULL, d_splice_alias(inode, dentry) is equivalent to::
|
||||||
|
|
||||||
d_add(dentry, inode), NULL
|
d_add(dentry, inode), NULL
|
||||||
|
|
||||||
Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)
|
Similarly, d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err)
|
||||||
|
|
||||||
Typically the ->lookup routine will simply end with a:
|
Typically the ->lookup routine will simply end with a::
|
||||||
|
|
||||||
return d_splice_alias(inode, dentry);
|
return d_splice_alias(inode, dentry);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
A file system implementation declares that instances of the filesystem
|
A file system implementation declares that instances of the filesystem
|
||||||
are exportable by setting the s_export_op field in the struct
|
are exportable by setting the s_export_op field in the struct
|
||||||
super_block. This field must point to a "struct export_operations"
|
super_block. This field must point to a "struct export_operations"
|
||||||
struct which has the following members:
|
struct which has the following members:
|
@ -20,7 +20,7 @@ kernel which allows different filesystem implementations to coexist.
|
|||||||
|
|
||||||
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
|
VFS system calls open(2), stat(2), read(2), write(2), chmod(2) and so on
|
||||||
are called from a process context. Filesystem locking is described in
|
are called from a process context. Filesystem locking is described in
|
||||||
the document Documentation/filesystems/Locking.
|
the document Documentation/filesystems/locking.rst.
|
||||||
|
|
||||||
|
|
||||||
Directory Entry Cache (dcache)
|
Directory Entry Cache (dcache)
|
||||||
|
@ -24,7 +24,7 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* See Documentation/filesystems/nfs/Exporting
|
* See Documentation/filesystems/nfs/exporting.rst
|
||||||
* and examples in fs/exportfs
|
* and examples in fs/exportfs
|
||||||
*
|
*
|
||||||
* Since cifs is a network file system, an "fsid" must be included for
|
* Since cifs is a network file system, an "fsid" must be included for
|
||||||
|
@ -7,7 +7,7 @@
|
|||||||
* and for mapping back from file handles to dentries.
|
* and for mapping back from file handles to dentries.
|
||||||
*
|
*
|
||||||
* For details on why we do all the strange and hairy things in here
|
* For details on why we do all the strange and hairy things in here
|
||||||
* take a look at Documentation/filesystems/nfs/Exporting.
|
* take a look at Documentation/filesystems/nfs/exporting.rst.
|
||||||
*/
|
*/
|
||||||
#include <linux/exportfs.h>
|
#include <linux/exportfs.h>
|
||||||
#include <linux/fs.h>
|
#include <linux/fs.h>
|
||||||
|
@ -10,7 +10,7 @@
|
|||||||
*
|
*
|
||||||
* The following files are helpful:
|
* The following files are helpful:
|
||||||
*
|
*
|
||||||
* Documentation/filesystems/nfs/Exporting
|
* Documentation/filesystems/nfs/exporting.rst
|
||||||
* fs/exportfs/expfs.c.
|
* fs/exportfs/expfs.c.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
@ -555,7 +555,7 @@ static int orangefs_fsync(struct file *file,
|
|||||||
* Change the file pointer position for an instance of an open file.
|
* Change the file pointer position for an instance of an open file.
|
||||||
*
|
*
|
||||||
* \note If .llseek is overriden, we must acquire lock as described in
|
* \note If .llseek is overriden, we must acquire lock as described in
|
||||||
* Documentation/filesystems/Locking.
|
* Documentation/filesystems/locking.rst.
|
||||||
*
|
*
|
||||||
* Future upgrade could support SEEK_DATA and SEEK_HOLE but would
|
* Future upgrade could support SEEK_DATA and SEEK_HOLE but would
|
||||||
* require much changes to the FS
|
* require much changes to the FS
|
||||||
|
@ -151,7 +151,7 @@ struct dentry_operations {
|
|||||||
|
|
||||||
/*
|
/*
|
||||||
* Locking rules for dentry_operations callbacks are to be found in
|
* Locking rules for dentry_operations callbacks are to be found in
|
||||||
* Documentation/filesystems/Locking. Keep it updated!
|
* Documentation/filesystems/locking.rst. Keep it updated!
|
||||||
*
|
*
|
||||||
* FUrther descriptions are found in Documentation/filesystems/vfs.rst.
|
* FUrther descriptions are found in Documentation/filesystems/vfs.rst.
|
||||||
* Keep it updated too!
|
* Keep it updated too!
|
||||||
|
@ -139,7 +139,7 @@ struct fid {
|
|||||||
* @get_parent: find the parent of a given directory
|
* @get_parent: find the parent of a given directory
|
||||||
* @commit_metadata: commit metadata changes to stable storage
|
* @commit_metadata: commit metadata changes to stable storage
|
||||||
*
|
*
|
||||||
* See Documentation/filesystems/nfs/Exporting for details on how to use
|
* See Documentation/filesystems/nfs/exporting.rst for details on how to use
|
||||||
* this interface correctly.
|
* this interface correctly.
|
||||||
*
|
*
|
||||||
* encode_fh:
|
* encode_fh:
|
||||||
|