The list operations can never be called; they are even documented to be
unused.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pages looked up by __hfs_bnode_create() (called by hfs_bnode_create() and
hfs_bnode_find() for finding or creating pages corresponding to an inode)
are immediately kmap()'ed and used (both read and write) and kunmap()'ed,
and should not be page_cache_release()'ed until hfs_bnode_free().
This patch fixes a problem I first saw in July 2012: merely running "du"
on a large hfsplus-mounted directory a few times on a reasonably loaded
system would get the hfsplus driver all confused and complaining about
B-tree inconsistencies, and generates a "BUG: Bad page state". Most
recently, I can generate this problem on up-to-date Fedora 22 with shipped
kernel 4.0.5, by running "du /" (="/" + "/home" + "/mnt" + other smaller
mounts) and "du /mnt" simultaneously on two windows, where /mnt is a
lightly-used QEMU VM image of the full Mac OS X 10.9:
$ df -i / /home /mnt
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 551665 2725135 17% /
/dev/mapper/fedora-home 52879360 716221 52163139 2% /home
/dev/nbd0p2 4294967295 1387818 4293579477 1% /mnt
After applying the patch, I was able to run "du /" (60+ times) and "du
/mnt" (150+ times) continuously and simultaneously for 6+ hours.
There are many reports of the hfsplus driver getting confused under load
and generating "BUG: Bad page state" or other similar issues over the
years. [1]
The unpatched code [2] has always been wrong since it entered the kernel
tree. The only reason why it gets away with it is that the
kmap/memcpy/kunmap follow very quickly after the page_cache_release() so
the kernel has not had a chance to reuse the memory for something else,
most of the time.
The current RW driver appears to have followed the design and development
of the earlier read-only hfsplus driver [3], where-by version 0.1 (Dec
2001) had a B-tree node-centric approach to
read_cache_page()/page_cache_release() per bnode_get()/bnode_put(),
migrating towards version 0.2 (June 2002) of caching and releasing pages
per inode extents. When the current RW code first entered the kernel [2]
in 2005, there was an REF_PAGES conditional (and "//" commented out code)
to switch between B-node centric paging to inode-centric paging. There
was a mistake with the direction of one of the REF_PAGES conditionals in
__hfs_bnode_create(). In a subsequent "remove debug code" commit [4], the
read_cache_page()/page_cache_release() per bnode_get()/bnode_put() were
removed, but a page_cache_release() was mistakenly left in (propagating
the "REF_PAGES <-> !REF_PAGE" mistake), and the commented-out
page_cache_release() in bnode_release() (which should be spanned by
!REF_PAGES) was never enabled.
References:
[1]:
Michael Fox, Apr 2013
http://www.spinics.net/lists/linux-fsdevel/msg63807.html
("hfsplus volume suddenly inaccessable after 'hfs: recoff %d too large'")
Sasha Levin, Feb 2015
http://lkml.org/lkml/2015/2/20/85 ("use after free")
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/740814https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1027887https://bugzilla.kernel.org/show_bug.cgi?id=42342https://bugzilla.kernel.org/show_bug.cgi?id=63841https://bugzilla.kernel.org/show_bug.cgi?id=78761
[2]:
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfs/bnode.c?id=d1081202f1d0ee35ab0beb490da4b65d4bc763db
commit d1081202f1d0ee35ab0beb490da4b65d4bc763db
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:36 2004 -0800
[PATCH] HFS rewrite
http://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/\
fs/hfsplus/bnode.c?id=91556682e0bf004d98a529bf829d339abb98bbbd
commit 91556682e0bf004d98a529bf829d339abb98bbbd
Author: Andrew Morton <akpm@osdl.org>
Date: Wed Feb 25 16:17:48 2004 -0800
[PATCH] HFS+ support
[3]:
http://sourceforge.net/projects/linux-hfsplus/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.1/http://sourceforge.net/projects/linux-hfsplus/files/Linux%202.4.x%20patch/hfsplus%200.2/http://linux-hfsplus.cvs.sourceforge.net/viewvc/linux-hfsplus/linux/\
fs/hfsplus/bnode.c?r1=1.4&r2=1.5
Date: Thu Jun 6 09:45:14 2002 +0000
Use buffer cache instead of page cache in bnode.c. Cache inode extents.
[4]:
http://git.kernel.org/cgit/linux/kernel/git/\
stable/linux-stable.git/commit/?id=a5e3985fa014029eb6795664c704953720cc7f7d
commit a5e3985fa0
Author: Roman Zippel <zippel@linux-m68k.org>
Date: Tue Sep 6 15:18:47 2005 -0700
[PATCH] hfs: remove debug code
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g. new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else. This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.
Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use
of "sudo" is something more sneaky:
$ BASE="ovl"
$ MNT="$BASE/mnt"
$ LOW="$BASE/lower"
$ UP="$BASE/upper"
$ WORK="$BASE/work/ 0 0
none /proc fuse.pwn user_id=1000"
$ mkdir -p "$LOW" "$UP" "$WORK"
$ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
$ cat /proc/mounts
none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
none /proc fuse.pwn user_id=1000 0 0
$ fusermount -u /proc
$ cat /proc/mounts
cat: /proc/mounts: No such file or directory
This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed. Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With the planned cgroup writeback support, backing-dev related
declarations will be more widely used across block and cgroup;
unfortunately, including backing-dev.h from include/linux/blkdev.h
makes cyclic include dependency quite likely.
This patch separates out backing-dev-defs.h which only has the
essential definitions and updates blkdev.h to include it. c files
which need access to more backing-dev details now include
backing-dev.h directly. This takes backing-dev.h off the common
include dependency chain making it a lot easier to use it across block
and cgroup.
v2: fs/fat build failure fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
Merge third patchbomb from Andrew Morton:
- various misc things
- a couple of lib/ optimisations
- provide DIV_ROUND_CLOSEST_ULL()
- checkpatch updates
- rtc tree
- befs, nilfs2, hfs, hfsplus, fatfs, adfs, affs, bfs
- ptrace fixes
- fork() fixes
- seccomp cleanups
- more mmap_sem hold time reductions from Davidlohr
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (138 commits)
proc: show locks in /proc/pid/fdinfo/X
docs: add missing and new /proc/PID/status file entries, fix typos
drivers/rtc/rtc-at91rm9200.c: make IO endian agnostic
Documentation/spi/spidev_test.c: fix warning
drivers/rtc/rtc-s5m.c: allow usage on device type different than main MFD type
.gitignore: ignore *.tar
MAINTAINERS: add Mediatek SoC mailing list
tomoyo: reduce mmap_sem hold for mm->exe_file
powerpc/oprofile: reduce mmap_sem hold for exe_file
oprofile: reduce mmap_sem hold for mm->exe_file
mips: ip32: add platform data hooks to use DS1685 driver
lib/Kconfig: fix up HAVE_ARCH_BITREVERSE help text
x86: switch to using asm-generic for seccomp.h
sparc: switch to using asm-generic for seccomp.h
powerpc: switch to using asm-generic for seccomp.h
parisc: switch to using asm-generic for seccomp.h
mips: switch to using asm-generic for seccomp.h
microblaze: use asm-generic for seccomp.h
arm: use asm-generic for seccomp.h
seccomp: allow COMPAT sigreturn overrides
...
On Mac OS X, HFS+ extended attributes are not namespaced. Since we want
to be compatible with OS X filesystems and yet still support the Linux
namespacing system, the hfsplus driver implements a special "osx"
namespace that is reported for any attribute that is not namespaced
on-disk. However, the current code for getting and setting these
unprefixed attributes is broken.
hfsplus_osx_setattr() and hfsplus_osx_getattr() are passed names that have
already had their "osx." prefixes stripped by the generic functions. The
functions first, quite correctly, check those names to make sure that they
aren't prefixed with a known namespace, which would allow namespace access
restrictions to be bypassed. However, the functions then prepend "osx."
to the name they're given before passing it on to hfsplus_getattr() and
hfsplus_setattr(). Not only does this cause the "osx." prefix to be
stored on-disk, defeating its purpose, it also breaks the check for the
special "com.apple.FinderInfo" attribute, which is reported for all files,
and as a consequence makes some userspace applications (e.g. GNU patch)
fail even when extended attributes are not otherwise in use.
There are five commits which have touched this particular code:
127e5f5ae5 ("hfsplus: rework functionality of getting, setting and deleting of extended attributes")
b168fff721 ("hfsplus: use xattr handlers for removexattr")
bf29e886b2 ("hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes")
fcacbd95e121 ("fs/hfsplus: move xattr_name allocation in hfsplus_getxattr()")
ec1bbd346f18 ("fs/hfsplus: move xattr_name allocation in hfsplus_setxattr()")
The first commit creates the functions to begin with. The namespace is
prepended by the original code, which I believe was correct at the time,
since hfsplus_?etattr() stripped the prefix if found. The second commit
removes this behavior from hfsplus_?etattr() and appears to have been
intended to also remove the prefixing from hfsplus_osx_?etattr().
However, what it actually does is remove a necessary strncpy() call
completely, breaking the osx namespace entirely. The third commit re-adds
the strncpy() call as it was originally, but doesn't mention it in its
commit message. The final two commits refactor the code and don't affect
its functionality.
This commit does what b168fff attempted to do (prevent the prefix from
being added), but does it properly, instead of passing in an empty buffer
(which is what b168fff actually did).
Fixes: b168fff721 ("hfsplus: use xattr handlers for removexattr")
Signed-off-by: Thomas Hebb <tommyhebb@gmail.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sergei Antonov <saproj@gmail.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix a bug which is reproduced as follows. Create a file:
echo abc > test_file
Try to expand the file beyond available space:
truncate --size=<size exceeding available space> test_file
Since HFS+ does not support file size > allocated size, truncate should
fail. However, it ends successfully. The driver returns success despite
having been unable to allocate the requested space for the file. Also
filesystem check finds an error:
Checking catalog file.
Incorrect size for file test_file
(It should be 469094400 instead of 1000000000)
Add a piece of code analogous to code in the fat driver. Now a proper
error is returned and filesystem remains consistent.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case of memory allocation error, the return should be -ENOMEM, instead
of -ENOSPC.
Signed-off-by: Chengyu Song <csong84@gatech.edu>
Reviewed-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_known_namespace() only returns true/false. Also remove inline and let
compiler decide what to do with static functions.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
According to commit 5f16f3225b ("ext4: atomically set inode->i_flags in
ext4_set_inode_flags()").
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
security/trusted/user/osx setxattr did the same
xattr_name initialization. Move that operation in hfsplus_setxattr().
Tested with security/trusted/user getfattr/setfattr
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
security/trusted/user/osx getxattr did the same
xattr_name initialization. Move that operation in hfsplus_getxattr().
Tested with security/trusted/user getfattr/setfattr
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This doesn't change how the code works, but clearly the curly braces were
intended.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Most filesystems call through to these at some point, so we'll start
here.
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
trivial conflict in net/socket.c and non-trivial one in crypto -
that one had evaded aio_complete() removal.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert(). In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated. This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.
The resulting first leaf node is correct:
----------------------------------------------------
| key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
----------------------------------------------------
But the parent index key0 still contains the previous key CNID=123:
-----------------------
| key0.CNID=123 | ... |
-----------------------
A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().
Along the way, I removed duplicate code with unification of the if
condition. The resulting code is equivalent to the original code
because node is never 0.
Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value. However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch. brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
my @cocci = (
'@@',
'expression E;',
'@@',
'',
'- S_ISLNK(E->d_inode->i_mode)',
'+ d_is_symlink(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISDIR(E->d_inode->i_mode)',
'+ d_is_dir(E)',
'',
'@@',
'expression E;',
'@@',
'',
'- S_ISREG(E->d_inode->i_mode)',
'+ d_is_reg(E)' );
my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);
foreach my $file (@callers) {
chomp $file;
print "Processing ", $file, "\n";
system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
die "spatch failed";
}
[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Longname is not correctly handled by hfsplus driver. If an attempt to
create a longname(>255) file/directory is made, it succeeds by creating a
file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key. Thus
leaving the volume in an inconsistent state. This patch fixes this issue.
Although lookup is always called first to create a negative entry, so just
doing a check in lookup would probably fix this issue. I choose to
propagate error to other iops as well.
Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
hfsplus_cat_build_key, to avoid unncessary branching.
Thanks a lot.
TEST:
------
dir="TEST_DIR"
cdir=`pwd`
name255="_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_123456789_123456789\
_123456789_123456789_123456789_123456789_123456789_1234"
name256="${name255}5"
mkdir $dir
cd $dir
touch $name255
rm -f $name255
touch $name256
ls -la
cd $cdir
rm -rf $dir
RESULT:
-------
[sougata@ultrabook tmp]$ cdir=`pwd`
[sougata@ultrabook tmp]$
name255="_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_1234"
[sougata@ultrabook tmp]$ name256="${name255}5"
[sougata@ultrabook tmp]$
[sougata@ultrabook tmp]$ mkdir $dir
[sougata@ultrabook tmp]$ cd $dir
[sougata@ultrabook TEST_DIR]$ touch $name255
[sougata@ultrabook TEST_DIR]$ rm -f $name255
[sougata@ultrabook TEST_DIR]$ touch $name256
[sougata@ultrabook TEST_DIR]$ ls -la
ls: cannot access
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
No such file or directory
total 0
drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
drwxrwxrwx 1 root root 6 Feb 20 19:56 ..
-????????? ? ? ? ? ?
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
[sougata@ultrabook TEST_DIR]$ cd $cdir
[sougata@ultrabook tmp]$ rm -rf $dir
rm: cannot remove `TEST_DIR': Directory not empty
-ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
and incorrect keys, leaving the FS in an inconsistent state. This patch
fixes this issue.
Signed-off-by: Sougata Santra <sougata@tuxera.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
Commit a99b7069aa ("hfsplus: Fix undefined __divdi3 in
hfsplus_init_header_node()") introduced do_div() to xattr.c and the
warning below too.
As Geert remarked: "tmp" is "loff_t" which is "__kernel_loff_t", which
is "long long", i.e. signed, while include/asm-generic/div64.h compares
its type with "uint64_t". As inode sizes are positive, it should be
safe to change the type of "tmp" to "u64".
In file included from
arch/powerpc/include/asm/div64.h:1:0,
from include/linux/kernel.h:124,
from include/asm-generic/bug.h:13,
from arch/powerpc/include/asm/bug.h:127,
from include/linux/bug.h:4,
from include/linux/thread_info.h:11,
from include/asm-generic/preempt.h:4,
from arch/powerpc/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:18,
from include/linux/spinlock.h:50,
from include/linux/wait.h:8,
from include/linux/fs.h:6,
from fs/hfsplus/hfsplus_fs.h:19,
from fs/hfsplus/xattr.c:9:
fs/hfsplus/xattr.c: In function 'hfsplus_init_header_node':
include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default]
(void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
^
fs/hfsplus/xattr.c:86:2: note: in expansion of macro 'do_div'
do_div(tmp, node_size);
^
Signed-off-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Sergei Antonov <saproj@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some function declarations in hfsplus_fs.h were with argument names,
some without, and some were mixed. This patch adds argument names
everywhere, sorts function in order they go in .c files, and moves
hfs_part_find() to a proper section.
Auto-formatting and sorting was done with:
cfunctions *.c | indent -linux | sed "s| \* | \*|"
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace while blocksize;shift by ilog2
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zero newly allocated extents in the catalog tree if volume attributes
tell us to. Not doing so we risk getting the "unused node is not
erased" error. See kHFSUnusedNodeFix flag in Apple's source code for
reference.
There was a previous commit clearing the node when it is freed: commit
899bed05e9 ("hfsplus: fix issue with unzeroed unused b-tree nodes").
But it did not handle newly allocated extents (this patch fixes it).
And it zeroed nodes in all trees unconditionally which is an overkill.
This patch adds a condition and also switches to 'tree->node_size' as a
simpler method of getting the length to zero.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Kyle Laracey <kalaracey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Also add * before function comments (it was not detected by kernel-doc)
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace seq_printf where possible
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
hfsplus_readdir() incorrectly returned DT_REG for symbolic links and
special files. Return DT_REG, DT_LNK, DT_FIFO, DT_CHR, DT_BLK, DT_SOCK,
or DT_UNKNOWN according to mode field in catalog record. Programs
relying on information from readdir will now work correctly with HFS+.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The directory/file catalog b-tree equivalent, hfsplus_build_key_uni(),
is used by hfsplus_find_cat() for internal referencing between catalog
records. There is no corresponding usage for attributes - attribute
records do not refer to one another.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Sougata Santra <sougata@tuxera.com>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HFSPLUS_ATTR_MAX_STRLEN (=127) is the limit of attribute names for the
number of unicode character (UTF-16BE) storable in the HFS+ file system.
Almost all the current usage of it is wrong, in relation to NLS to
on-disk conversion.
Except for one use calling hfsplus_asc2uni (which should stay the same)
and its uses in calling hfsplus_uni2asc (which was corrected in the
earlier patch in this series concerning usage of hfsplus_uni2asc), all
the other uses are of the forms:
- char buffer[size]
- bound check: "if (namespace_adjusted_input_length > size) return failure;"
Conversion between on-disk unicode representation and NLS char strings
(in whichever direction) always needs to accommodate the worst-case NLS
conversion, so all char buffers of that size need to have a
NLS_MAX_CHARSET_SIZE x .
The bound checks are all wrong, since they compare nls_length derived
from strlen() to a unicode length limit.
It turns out that all the bound-checks do is to protect
hfsplus_asc2uni(), which can fail if the input is too large.
There is only one usage of it as far as attributes are concerned, in
hfsplus_attr_build_key(). It is in turn used by hfsplus_find_attr(),
hfsplus_create_attr(), hfsplus_delete_attr(). Thus making sure that
errors from hfsplus_asc2uni() is caught in hfsplus_attr_build_key() and
propagated is sufficient to replace all the bound checks.
Unpropagated errors from hfsplus_asc2uni() in the file catalog code was
addressed recently in an independent patch "hfsplus: fix longname
handling" by Sougata Santra.
Before this patch, trying to set a 55 CJK character (in a UTF-8 locale,
> 127/3=42) attribute plus user prefix fails with:
$ setfattr -n user.`cat testing-string` -v `cat testing-string` \
testing-string
setfattr: testing-string: Operation not supported
and retrieving a stored long attributes is particular ugly(!):
find /mnt/* -type f -exec getfattr -d {} \;
getfattr: /mnt/testing-string: Input/output error
with console log:
[268008.389781] hfsplus: unicode conversion failed
After the patch, both of the above works.
FYI, the test attribute string is prepared with:
echo -e -n \
"\xe9\x80\x99\xe6\x98\xaf\xe4\xb8\x80\xe5\x80\x8b\xe9\x9d\x9e\xe5" \
"\xb8\xb8\xe6\xbc\xab\xe9\x95\xb7\xe8\x80\x8c\xe6\xa5\xb5\xe5\x85" \
"\xb6\xe4\xb9\x8f\xe5\x91\xb3\xe5\x92\x8c\xe7\x9b\xb8\xe7\x95\xb6" \
"\xe7\x84\xa1\xe8\xb6\xa3\xe3\x80\x81\xe4\xbb\xa5\xe5\x8f\x8a\xe7" \
"\x84\xa1\xe7\x94\xa8\xe7\x9a\x84\xe3\x80\x81\xe5\x86\x8d\xe5\x8a" \
"\xa0\xe4\xb8\x8a\xe6\xaf\xab\xe7\x84\xa1\xe6\x84\x8f\xe7\xbe\xa9" \
"\xe7\x9a\x84\xe6\x93\xb4\xe5\xb1\x95\xe5\xb1\xac\xe6\x80\xa7\xef" \
"\xbc\x8c\xe8\x80\x8c\xe5\x85\xb6\xe5\x94\xaf\xe4\xb8\x80\xe5\x89" \
"\xb5\xe5\xbb\xba\xe7\x9b\xae\xe7\x9a\x84\xe5\x83\x85\xe6\x98\xaf" \
"\xe7\x82\xba\xe4\xba\x86\xe6\xb8\xac\xe8\xa9\xa6\xe4\xbd\x9c\xe7" \
"\x94\xa8\xe3\x80\x82" | tr -d ' '
(= "pointlessly long attribute for testing", elaborate Chinese in
UTF-8 enoding).
However, it is not possible to set double the size (110 + 5 is still
under 127) in a UTF-8 locale:
$setfattr -n user.`cat testing-string testing-string` -v \
`cat testing-string testing-string` testing-string
setfattr: testing-string: Numerical result out of range
110 CJK char in UTF-8 is 330 bytes - the generic get/set attribute
system call code in linux/fs/xattr.c imposes a 255 byte limit. One can
use a combination of iconv to encode content, changing terminal locale
for viewing, and an nls=cp932/cp936/cp949/cp950 mount option to fully
use 127-unicode attribute in a double-byte locale.
Also, as an additional information, it is possible to (mis-)use unicode
half-width/full-width forms (U+FFxx) to write attributes which looks
like english but not actually ascii.
Thanks Anton Altaparmakov for reviewing the earlier ideas behind this
change.
[akpm@linux-foundation.org: fix build]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a series of 3 patches which corrects issues in HFS+ concerning
the use of non-english file names and attributes. Names and attributes
are stored internally as UTF-16 units up to a fixed maximum size, and
convert to and from user-representation by NLS. The code incorrectly
assume that NLS string lengths are equal to unicode lengths, which is
only true for English ascii usage.
This patch (of 3):
The HFS Plus Volume Format specification (TN1150) states that file names
are stored internally as a maximum of 255 unicode characters, as defined
by The Unicode Standard, Version 2.0 [Unicode, Inc. ISBN
0-201-48345-9]. File names are converted by the NLS system on Linux
before presented to the user.
255 CJK characters converts to UTF-8 with 1 unicode character to up to 3
bytes, and to GB18030 with 1 unicode character to up to 4 bytes. Thus,
trying in a UTF-8 locale to list files with names of more than 85 CJK
characters results in:
$ ls /mnt
ls: reading directory /mnt: File name too long
The receiving buffer to hfsplus_uni2asc() needs to be 255 x
NLS_MAX_CHARSET_SIZE bytes, not 255 bytes as the code has always been.
Similar consideration applies to attributes, which are stored internally
as a maximum of 127 UTF-16BE units. See XNU source for an up-to-date
reference on attributes.
Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6 is not
attainable in the case of conversion to UTF-8, as going beyond 3 bytes
requires the use of surrogate pairs, i.e. consuming two input units.
Thanks Anton Altaparmakov for reviewing an earlier version of this
change.
This patch fixes all callers of hfsplus_uni2asc(), and also enables the
use of long non-English file names in HFS+. The getting and setting,
and general usage of long non-English attributes requires further
forthcoming work, in the following patches of this series.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y
Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A
ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI
kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd
bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15
a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs
rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa
ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/
+165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA
2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr
700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU
DrPKlXwIgva7zq5/S0Vr
=4s1Z
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Major changes for 3.14 include support for the newly added ZERO_RANGE
and COLLAPSE_RANGE fallocate operations, and scalability improvements
in the jbd2 layer and in xattr handling when the extended attributes
spill over into an external block.
Other than that, the usual clean ups and minor bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits)
ext4: fix premature freeing of partial clusters split across leaf blocks
ext4: remove unneeded test of ret variable
ext4: fix comment typo
ext4: make ext4_block_zero_page_range static
ext4: atomically set inode->i_flags in ext4_set_inode_flags()
ext4: optimize Hurd tests when reading/writing inodes
ext4: kill i_version support for Hurd-castrated file systems
ext4: each filesystem creates and uses its own mb_cache
fs/mbcache.c: doucple the locking of local from global data
fs/mbcache.c: change block and index hash chain to hlist_bl_node
ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate
ext4: refactor ext4_fallocate code
ext4: Update inode i_size after the preallocation
ext4: fix partial cluster handling for bigalloc file systems
ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents
ext4: only call sync_filesystm() when remounting read-only
fs: push sync_filesystem() down to the file system's remount_fs()
jbd2: improve error messages for inconsistent journal heads
jbd2: minimize region locked by j_list_lock in jbd2_journal_forget()
jbd2: minimize region locked by j_list_lock in journal_get_create_access()
...
hfsplus_create_attr_tree_cache is only called by __init init_hfsplus_fs
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Concurrent access to alloc_blocks in hfsplus_inode_info() is protected
by extents_lock mutex. This patch fixes two instances where
alloc_blocks modification was not protected with this lock.
This fixes possible allocation bitmap corruption in race conditions
while extending and truncating files.
[akpm@linux-foundation.org: take extents_lock before taking a copy of ->alloc_blocks]
[akpm@linux-foundation.org: remove now-unused label `out']
Signed-off-by: Sougata Santra <sougata@tuxera.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable is defined but not used. Generally it compiles away with
-O2 optimization hence it does not show a warning.
Signed-off-by: Sougata Santra <sougata@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reclaim will be leaving shadow entries in the page cache radix tree upon
evicting the real page. As those pages are found from the LRU, an
iput() can lead to the inode being freed concurrently. At this point,
reclaim must no longer install shadow pages because the inode freeing
code needs to ensure the page tree is really empty.
Add an address_space flag, AS_EXITING, that the inode freeing code sets
under the tree lock before doing the final truncate. Reclaim will check
for this flag before installing shadow pages.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Metin Doslu <metin@citusdata.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Ozgun Erdogan <ozgun@citusdata.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin <klamm@yandex-team.ru>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds support for HFSX 'HasFolderCount' flag and a corresponding
'folderCount' field in folder records. (For reference see
HFS_FOLDERCOUNT and kHFSHasFolderCountBit/kHFSHasFolderCountMask in
Apple's source code.)
Ignoring subfolder count leads to fs errors found by Mac:
...
Checking catalog hierarchy.
HasFolderCount flag needs to be set (id = 105)
(It should be 0x10 instead of 0)
Incorrect folder count in a directory (id = 2)
(It should be 7 instead of 6)
...
Steps to reproduce:
Format with "newfs_hfs -s /dev/diskXXX".
Mount in Linux.
Create a new directory in root.
Unmount.
Run "fsck_hfs /dev/diskXXX".
The patch handles directory creation, deletion, and rename.
Signed-off-by: Sergei Antonov <saproj@gmail.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current implementation of HFS+ driver has small issue with remount
option. Namely, for example, you are unable to remount from RO mode
into RW mode by means of command "mount -o remount,rw /dev/loop0
/mnt/hfsplus". Trying to execute sequence of commands results in an
error message:
mount /dev/loop0 /mnt/hfsplus
mount -o remount,ro /dev/loop0 /mnt/hfsplus
mount -o remount,rw /dev/loop0 /mnt/hfsplus
mount: you must specify the filesystem type
mount -t hfsplus -o remount,rw /dev/loop0 /mnt/hfsplus
mount: /mnt/hfsplus not mounted or bad option
The reason of such issue is failure of mount syscall:
mount("/dev/loop0", "/mnt/hfsplus", 0x2282a60, MS_MGC_VAL|MS_REMOUNT, NULL) = -1 EINVAL (Invalid argument)
Namely, hfsplus_parse_options_remount() method receives empty "input"
argument and return false in such case. As a result, hfsplus_remount()
returns -EINVAL error code.
This patch fixes the issue by means of return true for the case of empty
"input" argument in hfsplus_parse_options_remount() method.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs fixes from Al Viro:
"Several obvious fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Fix mountpoint reference leakage in linkat
hfsplus: use xattr handlers for removexattr
Typo in compat_sys_lseek() declaration
fs/super.c: sync ro remount after blocking writers
vfs: unexport the getname() symbol
hfsplus was already using the handlers for get and set operations,
and with the removal of can_set_xattr we've now allow operations that
wouldn't otherwise be allowed.
With this we can also centralize the special-casing of the osx.
attrs that don't have prefixes on disk in the osx xattr handlers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull core block IO changes from Jens Axboe:
"The major piece in here is the immutable bio_ve series from Kent, the
rest is fairly minor. It was supposed to go in last round, but
various issues pushed it to this release instead. The pull request
contains:
- Various smaller blk-mq fixes from different folks. Nothing major
here, just minor fixes and cleanups.
- Fix for a memory leak in the error path in the block ioctl code
from Christian Engelmayer.
- Header export fix from CaiZhiyong.
- Finally the immutable biovec changes from Kent Overstreet. This
enables some nice future work on making arbitrarily sized bios
possible, and splitting more efficient. Related fixes to immutable
bio_vecs:
- dm-cache immutable fixup from Mike Snitzer.
- btrfs immutable fixup from Muthu Kumar.
- bio-integrity fix from Nic Bellinger, which is also going to stable"
* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
xtensa: fixup simdisk driver to work with immutable bio_vecs
block/blk-mq-cpu.c: use hotcpu_notifier()
blk-mq: for_each_* macro correctness
block: Fix memory leak in rw_copy_check_uvector() handling
bio-integrity: Fix bio_integrity_verify segment start bug
block: remove unrelated header files and export symbol
blk-mq: uses page->list incorrectly
blk-mq: use __smp_call_function_single directly
btrfs: fix missing increment of bi_remaining
Revert "block: Warn and free bio if bi_end_io is not set"
block: Warn and free bio if bi_end_io is not set
blk-mq: fix initializing request's start time
block: blk-mq: don't export blk_mq_free_queue()
block: blk-mq: make blk_sync_queue support mq
block: blk-mq: support draining mq queue
dm cache: increment bi_remaining when bi_end_io is restored
block: fixup for generic bio chaining
block: Really silence spurious compiler warnings
block: Silence spurious compiler warnings
block: Kill bio_pair_split()
...
Pull vfs updates from Al Viro:
"Assorted stuff; the biggest pile here is Christoph's ACL series. Plus
assorted cleanups and fixes all over the place...
There will be another pile later this week"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits)
__dentry_path() fixes
vfs: Remove second variable named error in __dentry_path
vfs: Is mounted should be testing mnt_ns for NULL or error.
Fix race when checking i_size on direct i/o read
hfsplus: remove can_set_xattr
nfsd: use get_acl and ->set_acl
fs: remove generic_acl
nfs: use generic posix ACL infrastructure for v3 Posix ACLs
gfs2: use generic posix ACL infrastructure
jfs: use generic posix ACL infrastructure
xfs: use generic posix ACL infrastructure
reiserfs: use generic posix ACL infrastructure
ocfs2: use generic posix ACL infrastructure
jffs2: use generic posix ACL infrastructure
hfsplus: use generic posix ACL infrastructure
f2fs: use generic posix ACL infrastructure
ext2/3/4: use generic posix ACL infrastructure
btrfs: use generic posix ACL infrastructure
fs: make posix_acl_create more useful
fs: make posix_acl_chmod more useful
...
When using the per-superblock xattr handlers permission checking is
done by the generic code. hfsplus just needs to check for the magic
osx attribute not to leak into protected namespaces.
Also given that the code was obviously copied from JFS the proper
attribution was missing.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rename the current posix_acl_created to __posix_acl_create and add
a fully featured helper to set up the ACLs on file creation that
uses get_acl().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rename the current posix_acl_chmod to __posix_acl_chmod and add
a fully featured ACL chmod helper that uses the ->set_acl inode
operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
HFS+ resource fork lookup breaks opendir() library function. Since
opendir first calls open() with O_DIRECTORY flag set. O_DIRECTORY means
"refuse to open if not a directory". The open system call in the kernel
does a check for inode->i_op->lookup and returns -ENOTDIR. So if
hfsplus_file_lookup is set it allows opendir() for plain files.
Also resource fork lookup in HFS+ does not work. Since it is never
invoked after VFS permission checking. It will always return with
-EACCES.
When we call opendir() on a file, it does not return NULL. opendir()
library call is based on open with O_DIRECTORY flag passed and then
layered on top of getdents() system call. O_DIRECTORY means "refuse to
open if not a directory".
The open() system call in the kernel does a check for: do_sys_open()
-->..--> can_lookup() i.e it only checks inode->i_op->lookup and returns
ENOTDIR if this function pointer is not set.
In OSX, we can open "file/rsrc" to get the resource fork of "file". This
behavior is emulated inside hfsplus on Linux, which means that to some
degree every file acts like a directory. That is the reason lookup()
inode operations is supported for files, and it is possible to do a lookup
on this specific name. As a result of this open succeeds without
returning ENOTDIR for HFS+
Please see the LKML discussion thread on this issue:
http://marc.info/?l=linux-fsdevel&m=122823343730412&w=2
I tried to test file/rsrc lookup in HFS+ driver and the feature does not
work. From OSX:
$ touch test
$ echo "1234" > test/..namedfork/rsrc
$ ls -l test..namedfork/rsrc
--rw-r--r-- 1 tuxera staff 5 10 dec 12:59 test/..namedfork/rsrc
[sougata@ultrabook tmp]$ id
uid=1000(sougata) gid=1000(sougata) groups=1000(sougata),5(tty),18(dialout),1001(vboxusers)
[sougata@ultrabook tmp]$ mount
/dev/sdb1 on /mnt/tmp type hfsplus (rw,relatime,umask=0,uid=1000,gid=1000,nls=utf8)
[sougata@ultrabook tmp]$ ls -l test/rsrc
ls: cannot access test/rsrc: Permission denied
According to this LKML thread it is expected behavior.
http://marc.info/?t=121139033800008&r=1&w=4
I guess now that permission checking happens in vfs generic_permission() ?
So it turns out that even though the lookup() inode_operation exists for
HFS+ files. It cannot really get invoked ?. So if we can disable this
feature to make opendir() work for HFS+.
Signed-off-by: Sougata Santra <sougata@tuxera.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was being open coded in a few places.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It was being open coded in a few places.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Chris Mason <chris.mason@fusionio.com>
Acked-by: NeilBrown <neilb@suse.de>
ERROR: "__divdi3" [fs/hfsplus/hfsplus.ko] undefined!
Introduced by commit 099e9245e0 ("hfsplus: implement attributes file's
header node initialization code").
i_size_read() returns loff_t, which is long long, i.e. 64-bit. node_size
is size_t, which is either 32-bit or 64-bit. Hence
"i_size_read(attr_file) / node_size" is a 64-by-32 or 64-by-64 division,
causing (some versions of) gcc to emit a call to __divdi3().
Fortunately node_size is actually 16-bit, as the sole caller of
hfsplus_init_header_node() passes a u16. Hence change its type from
size_t to u16, and use do_div() to perform a 64-by-32 division.
Not seen in m68k/allmodconfig in -next, so it really depends on the
verion of gcc.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement functionality of creation AttributesFile metadata file on HFS+
volume in the case of absence of it.
It makes trying to open AttributesFile's B-tree during mount of HFS+
volume. If HFS+ volume hasn't AttributesFile then a pointer on
AttributesFile's B-tree keeps as NULL. Thereby, when it is discovered
absence of AttributesFile on HFS+ volume in the begin of xattr creation
operation then AttributesFile will be created.
The creation of AttributesFile will have success in the case of
availability (2 * clump) free blocks on HFS+ volume. Otherwise,
creation operation is ended with error (-ENOSPC).
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are situation when HFS+ volume had been created without
AttributesFile. Such situation can take place because of using old
mkfs.hfs utility or creation HFS+ volume without taking in mind
necessity to use xattrs. For example, Mac OS X 10.4 (Tiger) doesn't
create AttributesFile during mkfs phase. Also it is a very frequent
situation for the case of users that created HFS+ volumes under Linux.
As a result, xattrs and POSIX ACLs on HFS+ volume are unavailable for
such users.
This patchset implements functionality of AttributesFile creation on
HFS+ volume in the case of this metadata file absence during operation
of xattr creation.
This patch:
Add functionality of metadata file's clump size calculation. Operation
of AttributesFile creation needs in clump size setting. This value will
be used when AttributesFile will be extended.
This code is adopted from code of newfs_hfs utility of diskdev_cmds packet
http://opensource.apple.com/tarballs/diskdev_cmds/.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression"). Let's drop it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset implements POSIX ACLs support in hfsplus driver.
Mac OS X beginning with version 10.4 ("Tiger") support NFSv4 ACLs, which
are part of the NFSv4 standard. HFS+ stores ACLs in the form of
specially named extended attributes (com.apple.system.Security).
But this patchset doesn't use "com.apple.system.Security" extended
attributes. It implements support of POSIX ACLs in the form of extended
attributes with names "system.posix_acl_access" and
"system.posix_acl_default". These xattrs are treated only under Linux.
POSIX ACLs doesn't mean something under Mac OS X. Thereby, this patch
set provides opportunity to use POSIX ACLs under Linux on HFS+
filesystem.
This patch:
Add CONFIG_HFSPLUS_FS_POSIX_ACL kernel configuration option, DBG_ACL_MOD
debugging flag and acl.h file with declaration of essential functions
for support POSIX ACLs in hfsplus driver.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use a more current logging style.
Rename macro and uses.
Add do {} while (0) to macro.
Add DBG_ to macro.
Add and use hfs_dbg_cont variant where appropriate.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid':
(1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
(2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized]
[akpm@linux-foundation.org: make the workaround more explicit]
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.
A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.
Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.
Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.
This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.
This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.
After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The fsck_hfs (under MacOS X) complains about unzeroed unused b-tree nodes
after deletion of folders' tree under Linux.
SYMPTOMS:
Running Disk Utiltiy's "Verify Disk" on "test" gives the following:
Verifying volume “Test”
Checking file systemChecking Journaled HFS Plus volume.
Checking extents overflow file.
Checking catalog file.
Unused node is not erased (node = 3111)
Checking multi-linked files.
Checking catalog hierarchy.
Checking extended attributes file.
Checking volume bitmap.
Checking volume information.
The volume Test was found corrupt and needs to be repaired.
Error: This disk needs to be repaired. Click Repair Disk.
REPRODUCING PATH:
1. Prepare HFS+ (non-case sensitive) partition (for example, 5GB)
under MacOS X.
2. Copy linux kernel source tree (for example, 3.7-rc6 version) on
this partition under MacOS X.
3. Then switch to Linux and mount this prepared partition.
4. Execute `sudo rm -r` under prepared directory with linux kernel
source tree.
5. Unmount and boot back into OS X.
6. Open up Disk Utility and verify partition.
REPRODUCIBILITY: 100%
FIX:
It is added code of node clearing in hfs_bnode_put() method for the case
when node has flag HFS_BNODE_DELETED.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Kyle Laracey <kalaracey@gmail.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add support of manipulation by attributes file.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rework functionality of getting, setting and deleting of extended attributes.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add functionality of manipulating by records in attributes tree.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add all necessary on-disk layout declarations related to attributes file.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Reported-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge the rest of Andrew's patches for -rc1:
"A bunch of fixes and misc missed-out-on things.
That'll do for -rc1. I still have a batch of IPC patches which still
have a possible bug report which I'm chasing down."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits)
keys: use keyring_alloc() to create module signing keyring
keys: fix unreachable code
sendfile: allows bypassing of notifier events
SGI-XP: handle non-fatal traps
fat: fix incorrect function comment
Documentation: ABI: remove testing/sysfs-devices-node
proc: fix inconsistent lock state
linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
memcg: don't register hotcpu notifier from ->css_alloc()
checkpatch: warn on uapi #includes that #include <uapi/...
revert "rtc: recycle id when unloading a rtc driver"
mm: clean up transparent hugepage sysfs error messages
hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method
hfsplus: rework processing of hfs_btree_write() returned error
hfsplus: rework processing errors in hfsplus_free_extents()
hfsplus: avoid crash on failed block map free
kcmp: include linux/ptrace.h
drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h>
mm: cma: WARN if freed memory is still in use
exec: do not leave bprm->interp on stack
...
Add an error message for the case of failure of sync fs in
delayed_sync_fs() method.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add to hfs_btree_write() a return of -EIO on failure of b-tree node
searching. Also add logic ofor processing errors from hfs_btree_write()
in hfsplus_system_write_inode() with a message about b-tree writing
failure.
[akpm@linux-foundation.org: reduce scope of `err', print errno on error]
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, it doesn't process error codes from the hfsplus_block_free()
call in hfsplus_free_extents() method. Add some error code processing.
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the read fails we kmap an error code. This doesn't end well. Instead
print a critical error and pray. This mirrors the rest of the fs
behaviour with critical error cases.
Acked-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Use -ENOMEM return value instead of -EINVAL when kzalloc() fails.
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch makes hfsplus stop using the VFS '->write_super()' method along with
the 's_dirt' superblock flag, because they are on their way out.
The whole "superblock write-out" VFS infrastructure is served by the
'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
writes out all dirty superblocks using the '->write_super()' call-back. But the
problem with this thread is that it wastes power by waking up the system every
5 seconds, even if there are no diry superblocks, or there are no client
file-systems which would need this (e.g., btrfs does not use
'->write_super()'). So we want to kill it completely and thus, we need to make
file-systems to stop using the '->write_super()' VFS service, and then remove
it together with the kernel thread.
Tested using fsstress from the LTP project.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This check is useless because we always have 'sb->s_fs_info' to be non-NULL.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Print correct function name in the debugging print of the
'hfsplus_sync_fs()' function.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... because it is used only in fs/hfsplus/super.c.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
HFS+ doesn't really implement hard links - instead, hardlinks are indicated
by a magic file type which refers to an indirect node in a hidden
directory. The spec indicates that stat() should return the inode number
of the indirect node, but it turns out that this doesn't satisfy the
firmware when it's looking for a bootloader - it wants the catalog ID of
the hardlink file instead. Fix up this case.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The variable io_size was unsigned int, which caused the wrong sector number
to be calculated after aligning it. This then caused mount to fail with big
volumes, as backup volume header information was searched from a
wrong sector.
Signed-off-by: Janne Kalliomäki <janne@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPw2J/AAoJECvKgwp+S8Ja5jkP/3uMxkhf8XQpXCI3O1QVfaQr
uZFfM8sINqIPDVm1dtFjFj7f8Bw9mhE2KAnnJ1rKT8tQwqq9yAse1QPlhCG1ZqoP
+AnMDDXHtx7WmQZXhBvS9b+unpZ7Jr6r6pO5XrmTL2kRL3YJPUhZ2+xbTT5belTB
KoAu4WqORZRxfXoC76S7U8K+D4NcAGhAOxCClsIjmY+oocCiCag4FZOyzYIFViqc
ghUN/+rLQ3fqGGv2yO7Ylx1gUM7sxIwkZQ/h962jFAtxz9czImr2NmRoMliOaOkS
tvcnIf+E3u0n/zIjzFvzhxKgHJPP8PkcPMk60d3jKmFngBkqFTzNUeVTP8md7HrV
4DlXisWr+z7YVyWUCFaNcJLmjiWSwQ8DV/clRLobeBf9EJKan5F1PjFgl6PLJM5F
Qr1+LHMNaetdulBwMRTyveZTzYqw9RmDnD9dWMo4mX/kTpvtC4jTPVV7hkRD+Qlv
5vTRR+VXL3Q50yClLf0AQMSKTnH2gBuepM/b+7cShLGfsMln8DtUjmbigv+niL63
BibcCIbIlP2uWGnl37VhsC34AT+RKt3lggrBOpn/7XJMq/wKR7IRP/7V9TfYgaUN
NBa+wtnLDa1pZEn/X7izdcQP62PzDtmB+ObvYT0Yb40A4+2ud3qF/lB53c1A1ewF
/9c4zxxekjHZnn2oooEa
=oLXf
-----END PGP SIGNATURE-----
Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."
* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally
After we moved inode_sync_wait() from end_writeback() it doesn't make sense
to call the function end_writeback() anymore. Rename it to clear_inode()
which well says what the function really does - set I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Commit ec81aecb29 ("hfs: fix a potential buffer overflow") fixed a few
potential buffer overflows in the hfs filesystem. But as Timo Warns
pointed out, these changes also need to be made on the hfsplus
filesystem as well.
Reported-by: Timo Warns <warns@pre-sense.de>
Acked-by: WANG Cong <amwang@redhat.com>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Sage Weil <sage@newdream.net>
Cc: Eugene Teo <eteo@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: stable <stable@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Making an hfsplus partition bootable requires the ability to "bless" a
file by putting its inode number in the volume header. Doing this from
userspace on a mounted filesystem is impractical since the kernel will
write back the original values on unmount. Add an ioctl to allow userspace
to update the volume header information based on the target file.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The finder_info block in the hfsplus volume header is currently defined as
an array of 8 bit values, but TN1150 defines it as being an array of 32 bit
values. Fix for convenience.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The userflags field was being written to the filesystem without being
initialised. Make sure it's clear, since otherwise files end up with
garbage attributes.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_create() ignores everything outside of 16bit subset of its
mode argument; switching it to umode_t is obviously equivalent
and it's the only caller of the method
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
generic_check_addressable can't deal with hfsplus's larger than page
size allocation blocks, so simply opencode the checks that we actually
need in hfsplus_fill_super.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Reported-by: Pavel Ivanov <paivanof@gmail.com>
Tested-by: Pavel Ivanov <paivanof@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6596528e39 ("hfsplus: ensure bio requests are not smaller than
the hardware sectors") changed the pointers used for volume header
allocations but failed to free the correct pointers in the error path
path of hfsplus_fill_super() and hfsplus_read_wrapper.
The second hunk came from a separate patch by Pavel Ivanov.
Reported-by: Pavel Ivanov <paivanof@gmail.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Cc: <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (107 commits)
vfs: use ERR_CAST for err-ptr tossing in lookup_instantiate_filp
isofs: Remove global fs lock
jffs2: fix IN_DELETE_SELF on overwriting rename() killing a directory
fix IN_DELETE_SELF on overwriting rename() on ramfs et.al.
mm/truncate.c: fix build for CONFIG_BLOCK not enabled
fs:update the NOTE of the file_operations structure
Remove dead code in dget_parent()
AFS: Fix silly characters in a comment
switch d_add_ci() to d_splice_alias() in "found negative" case as well
simplify gfs2_lookup()
jfs_lookup(): don't bother with . or ..
get rid of useless dget_parent() in btrfs rename() and link()
get rid of useless dget_parent() in fs/btrfs/ioctl.c
fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers
drivers: fix up various ->llseek() implementations
fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek
Ext4: handle SEEK_HOLE/SEEK_DATA generically
Btrfs: implement our own ->llseek
fs: add SEEK_HOLE and SEEK_DATA flags
reiserfs: make reiserfs default to barrier=flush
...
Fix up trivial conflicts in fs/xfs/linux-2.6/xfs_super.c due to the new
shrinker callout for the inode cache, that clashed with the xfs code to
start the periodic workers later.
Currently all bio requests are 512 bytes, which may fail for media
whose physical sector size is larger than this. Ensure these
requests are not smaller than the block device logical block size.
BugLink: http://bugs.launchpad.net/bugs/734883
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
'recoff' is read from disk and used for an argument to memcpy, so if
the value read from disk is larger than the page size, it result to
"general protection fault". This patch add additional range check for
the value, so that disk fuzz won't cause such fault.
Signed-off-by: Naohiro Aota <naota@elisp.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push down taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
file systems can drop taking the i_mutex altogether it seems, like ext3 and
ocfs2. For correctness sake I just pushed everything down in all cases to make
sure that we keep the current behavior the same for everybody, and then each
individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Simple filesystems always pass inode->i_sb_bdev as the block device
argument, and never need a end_io handler. Let's simply things for
them and for my grepping activity by dropping these arguments. The
only thing not falling into that scheme is ext4, which passes and
end_io handler without needing special flags (yet), but given how
messy the direct I/O code there is use of __blockdev_direct_IO
in one instead of two out of three cases isn't going to make a large
difference anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Let filesystems handle waiting for direct I/O requests themselves instead
of doing it beforehand. This means filesystem-specific locks to prevent
new dio referenes from appearing can be held. This is important to allow
generalizing i_dio_count to non-DIO_LOCKING filesystems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Implement error propagation through the callers of
hfsplus_ext_write_extent_locked().
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
hfs_find_init() may fail with ENOMEM, but there are places, where
the returned value is not checked. The consequences can be very
unpleasant, e.g. kfree uninitialized pointer and
inappropriate mutex unlocking.
The patch adds checks for errors in hfs_find_init().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Replace the hardcoded 2TB limit with a dynamic limit based on the block
size now that we have fixed the few overflows preventing operation
with large volumes.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
For partitions larger than 2TB or at such an offset the hfs wrapper code
in hfsplus might overflow the range representable in a 32-bit
data type. Make sure we use a sector_t for the arithmetics leading to it.
I'm not sure this code can be readed at all as hfs itself never supported
such large volumes.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
For filesystems larger than 2TB the final sector number passed to
map_bh might overflow the range representable in a 32-bit data type.
Make sure we use a sector_t for it and the arithmetics calculating it.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Make assignments outside `if' conditions for unicode.c module
where the checkpatch.pl script reported this coding style error.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
There is a misprint in resource deallocation code on error path in
hfsplus_fill_super(): the sbi->alloc_file inode is iput twice,
while the root inode in not iput at all.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Christoph Hellwig <hch@lst.de>
hfsplus leaks bio objects by failing to call bio_put() on the bios
it allocates. Add the missing call to fix the leak.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Cc: <stable@kernel.org> # .38.x, .39.x
Signed-off-by: Christoph Hellwig <hch@lst.de>
hfsplus does not have problems with references to unlinked directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.
This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...
Fix up conflicts in fs/{aio.c,super.c}
And give it a kernel-doc comment.
[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Revert an incorrect hunk from commit b2837fcf49,
"hfsplus: %L-to-%ll, macro correction, and remove unneeded braces"
revert a pointless change of comparism operation argument order, which turned
out to not even be equivalent.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Currently the error handling in hfsplus_fill_super is a mess, and can
lead to accessing fields in the superblock that haven't been even set
up yet. Fix this by making sure we do not set up sb->s_root until we
have the mount fully set up, and before that do proper step by step
unwinding instead of using hfsplus_put_super as a big hammer.
Reported-by: Dan Williams <dcbw@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: %L-to-%ll, macro correction, and remove unneeded braces
hfsplus: spaces/indentation clean-up
hfsplus: C99 comments clean-up
hfsplus: over 80 character lines clean-up
hfsplus: fix an artifact in ioctl flag checking
hfsplus: flush disk caches in sync and fsync
hfsplus: optimize fsync
hfsplus: split up inode flags
hfsplus: write up fsync for directories
hfsplus: simplify fsync
hfsplus: avoid useless work in hfsplus_sync_fs
hfsplus: make sure sync writes out all metadata
hfsplus: use raw bio access for partition tables
hfsplus: use raw bio access for the volume headers
hfsplus: always use hfsplus_sync_fs to write the volume header
hfsplus: silence a few debug printks
hfsplus: fix option parsing during remount
Fix up conflicts due to VFS changes in fs/hfsplus/{hfsplus_fs.h,unicode.c}
Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.
Patched with:
git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
RCU free the struct inode. This will allow:
- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.
The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.
In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.
The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.
For in-tree filesystems, this is just a mechanical change.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.
For in-tree filesystems, this is just a mechanical change.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Clean-up based on checkpatch.pl report against unnecessary braces
(`{' and `}'), non-standard format option %Lu (%llu recommended)
as well as one trailing statement in a macro definition which
should have been on the next line.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Fix incorrect spaces and indentation reported by checkpatch.pl.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Match coding style restriction against C99 comments where
checkpatch.pl reported errors about their usage.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Match coding style line length limitation where checkpatch.pl
reported over-80-character-line warnings.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Fix a flag checking artifact in hfsplus_ioctl_getflags() routine
found while doing clean-up against assignments inside `if's.
Signed-off-by: Anton Salikhmetov <alexo@tuxera.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Flush the disk cache in fsync and sync to make sure data actually is
on disk on completion of these system calls. There is a nobarrier
mount option to disable this behaviour. It's slightly misnamed now
that barrier actually are gone, but it matches the name used by all
major filesystems.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Avoid doing unessecary work in fsync. Do nothing unless the inode
was marked dirty, and only write the various metadata inodes out if
they contain any dirty state from this inode. This is archived by
adding three new dirty bits to the hfsplus-specific inode which are
set in the correct places.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Split the flags field in the hfsplus inode into an extent_state
flag that is locked by the extent_lock, and a new flags field
that uses atomic bitops. The second will grow more flags in the
next patch.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
fsync is supposed to not just work on regular files, but also on
directories. Fortunately enough hfsplus_file_fsync works just fine
for directories, so we can just wire it up.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Remove lots of code we don't need from fsync, we just need to call
->write_inode on the inode if it's dirty, for which sync_inode_metadata
is a lot more efficient than write_inode_now, and we need to write
out the various metadata inodes, which we now do explicitly instead
of by calling ->sync_fs.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
There is no reason to write out the metadata inodes or volume headers
during a non-blocking sync, as we are almost guaranteed to dirty them
again during the inode writeouts.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
hfsplus stores all metadata except for the volume headers in special
inodes. While these are marked hashed and periodically written out
by the flusher threads, we can't rely on that for sync. For the case
of a data integrity sync the VM has life-lock avoidance code that
avoids writing inodes again that are redirtied during the sync,
which is something that can happen easily for hfsplus. So make sure
we explicitly write out the metadata inodes at the beginning of
hfsplus_sync_fs.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Switch the hfsplus partition table reding for cdroms to use our bio
helpers. Again we don't rely on any caching in the buffer_heads, and
this gets rid of the last buffer_head use in hfsplus.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
The hfsplus backup volume header is located two blocks from the end of
the device. In case of device sizes that are not 4k aligned this means
we can't access it using buffer_heads when using the default 4k block
size.
Switch to using raw bios to read/write all buffer headers. We were not
relying on any caching behaviour of the buffer heads anyway. Additionally
always read in the backup volume header during mount to verify that we
can actually read it.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Remove opencoded writing of the volume header in hfsplus_fill_super
and hfsplus_put_super and offload it to hfsplus_sync_fs. In the
put_super case this means we only write the superblock once instead
of twice.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Turn a few noisy debug printks that show up during xfstests into
complied out debug print statements.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
hfsplus only actually uses the force option during remount, but it uses
the full option parser with a fake superblock to do so. This means remount
will fail if any nls option is set (which happens frequently with older
mount tools), even if it is the same.
Fix this by adding a simpler version of the parser that only parses the force
option for remount.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/hfsplus:
hfsplus: free space correcly for files unlinked while open
hfsplus: fix double lock typo in ioctl
hfsplus_delete_inode only truncates away all block allocations if
i_nlink is zero. Make sure we properly drop the unlink count even
when doing the rename hack for open but unlinked files.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Make node look as if it was on hlist, with hlist_del()
working correctly. Usable without any locking...
Convert a couple of places where we want to do that to
inode->i_hash.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This was supposed to be a mutex_unlock() instead of a mutex_lock().
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
We need to support -EOPNOTSUPP for attributes that are not supported to
match other filesystems and allow userspace to detect if Posix ACLs
are supported or not. setxattr already gets this right.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Make sure the initial insertation of the catalog entry already contains
the device number by calling init_special_inode early and setting writing
out the dev field of the on-disk permission structure. The latter is
facilitated by sharing the almost identical hfsplus_set_perms helpers
between initial catalog entry creating and ->write_inode.
Unless we crashed just after mknod this bug was harmless as the inode
is marked dirty at the end of hfsplus_mknod, and hfsplus_write_inode
will update the catalog entry to contain the correct value.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
The rootflags field in hfsplus_inode_info only caches the immutable and
append-only flags in the VFS inode, so we can easily get rid of it.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
HFS implements hardlink by using indirect catalog entries that refer to a hidden
directly. The link target is cached in the dev field in the HFS+ specific
inode, which is also used for the device number for device files, and inside
for passing the nlink value of the indirect node from hfsplus_cat_write_inode
to a helper function. Now if we happen to write out the indirect node while
hfsplus_link is creating the catalog entry we'll get a link pointing to the
linkid of the current nlink value. This can easily be reproduced by a large
enough loop of local git-clone operations.
Stop abusing the dev field in the HFS+ inode for short term storage by
refactoring the way the permission structure in the catalog entry is
set up, and rename the dev field to linkid to avoid any confusion.
While we're at it also prevent creating hard links to special files, as
the HFS+ dev and linkid share the same space in the on-disk structure.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
hfs seems prone to bad things when it encounters on disk corruption. Many
values are read from disk, and used as lengths to memcpy, as an example.
This patch fixes up several of these problematic cases.
o sanity check the on-disk maximum key lengths on mount
(these are set to a defined value at mkfs time and shouldn't differ)
o check on-disk node keylens against the maximum key length for each tree
o fix hfs_btree_open so that going out via free_tree: doesn't wind
up in hfs_releasepage, which wants to follow the very pointer
we were trying to set up:
HFS_SB(sb)->cat_tree = hfs_btree_open()
.
failure gets to hfs_releasepage and tries to follow HFS_SB(sb)->cat_tree
Tested with the fsfuzzer; it survives more than it used to.
[hch: ported of commit cf05946250 from hfs]
[hch: added the fixes from 5581d018ed3493d226e7a4d645d9c8a5af6c36b]
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
oops and fs corruption; the latter can happen even on valid fs in case of oom.
[hch: port of commit 3d10a15d69 from hfs]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
A particular fsfuzzer run caused an hfs file system to crash on mount. This
is due to a corrupted MDB extent record causing a miscalculation of
HFSPLUS_I(inode)->first_blocks for the extent tree. If the extent records
are zereod out, then it won't trigger the first_blocks special case and
instead falls through to the extent code, which we're in the middle
of initializing.
This patch catches the 0 size extent records, reports the corruption,
and fails the mount.
[hch: ported of commit 47f365eb57 from hfs]
Reported-by: Ramon de Carvalho Valle <rcvalle@linux.vnet.ibm.com>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
When renaming over a directory we need to use hfsplus_rmdir instead of
hfsplus_unlink to evict the victim. This makes sure we properly error out
on non-empty directory as required by Posix (BZ #16571), and it also makes
sure we do the right thing in case i_nlink will every be set correctly for
directories on hfsplus.
Reported-by: Vlado Plaga <rechner@vlado-do.de>
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Most of the extent handling code already does proper SMP locking, but
hfsplus_write_inode was calling into hfsplus_ext_write_extent without
taking the extents_lock. Fix this by splitting hfsplus_ext_write_extent
into an internal helper that expects the lock, and a public interface
that first acquires it.
Also add a few locking asserts and document the locking rules in
hfsplus_fs.h.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
We already have i_mutex for readdir and the namespace operations that add
entries to open_dir_list, the only thing that was missing was the removal
in hfsplus_dir_release.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
The flags in the HFS+-specific superlock do get modified during runtime,
use atomic bitops to make the modifications SMP safe.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
Lock updates to the mutal fields in the volume header, and document the
locing in the hfsplus_sb_info structure.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
We never walk the list - the only reason for it is to make the resource fork
inodes appear hashed to the writeback code. Borrow a trick from JFS to do
that without needing a list head.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>
We never look at it, nor change the next_alloc field in the superblock. So
don't bother caching it or writing it out in hfsplus_sync_fs.
Signed-off-by: Christoph Hellwig <hch@tuxera.com>