Prepare the ground for the next "map_files" patch which needs a name of a
link file to analyse.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
journal_init() doesn't need the lock since no operation on the filesystem
is involved there. journal_read() and get_list_bitmap() have yet to be
reviewed carefully though before removing the lock there. Just keep the
it around these two calls for safety.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the mount path, transactions that are made before journal
initialization don't involve the filesystem. We can delay the reiserfs
lock until we play with the journal.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randomization of PIE load address is hard coded in binfmt_elf.c for X86
and ARM. Create a new Kconfig variable
(CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE) for this and use it instead. Thus
architecture specific policy is pushed out of the generic binfmt_elf.c and
into the architecture Kconfig files.
X86 and ARM Kconfigs are modified to select the new variable so there is
no change in behavior. A follow on patch will select it for MIPS too.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
oom_score_adj is used for guarding processes from OOM-Killer. One of
problem is that it's inherited at fork(). When a daemon set oom_score_adj
and make children, it's hard to know where the value is set.
This patch adds some tracepoints useful for debugging. This patch adds
3 trace points.
- creating new task
- renaming a task (exec)
- set oom_score_adj
To debug, users need to enable some trace pointer. Maybe filtering is useful as
# EVENT=/sys/kernel/debug/tracing/events/task/
# echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
# echo "oom_score_adj != 0" > $EVENT/task_rename/filter
# echo 1 > $EVENT/enable
# EVENT=/sys/kernel/debug/tracing/events/oom/
# echo 1 > $EVENT/enable
output will be like this.
# grep oom /sys/kernel/debug/tracing/trace
bash-7699 [007] d..3 5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
bash-7699 [007] ...1 5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
ls-7729 [003] ...2 5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
bash-7699 [002] ...1 5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
grep-7730 [007] ...2 5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tell the page allocator that pages allocated for a buffered write are
expected to become dirty soon.
Signed-off-by: Johannes Weiner <jweiner@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inode cache pruning indirectly reclaims page-cache by invalidating mapping
pages. Let's account them into reclaim-state to notice this progress in
memory reclaimer.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJPDIkfAAoJENNvdpvBGATwWhwQAJ/rVsDAVSH7rbELW7jjdRRw
9vsDRbuCkBKsAi+r1nsfpjGnYTux1BnVU8BMGO8DnL6e43r9ihnm5o78amM9m8aL
EDvCqogIY3hzi31gsuIs43DBPs5LwjPPQemgbeE+m/Zbpno/Ju+fDHL286Uvitj3
X22cC691/Ce7vXhnTPS2+p2fZtT17kvwmkxkVyTs3E3gXcKKtwI3WUrwyF1iwCXF
+1WlKt1/YLdUs+OXPNNBNvLN5nF3b4UYOU7duQXDiY6ubro666VFIbRrIWVMRDFx
o4bySo3RBie4AAOJq3IALPDFdRhnkk/qxpBkWvgwqDnMiusogJP6yWh9RdR8OB07
8nrkobxFwkkDnOFU4KaVjVmPtLmeq4ksiQfDFPnYRg/SQ2+BsdoxV8jpnZvugzKV
XN+ABWqNucBuM/HoeinJF/UgT+nxpTr2QvH5ueNdvS3Kb9YsbrgJ/fX1941DWzMv
r9P//ZUOueSGhqUB0IopvXSrjAbRMF1SjuAP9oLEJZkJ/+/dCPMohJTyusZ8qAaY
MPb1QrQQej5dOb2jqe7BUe0eNKPT9eaZxCW3+hF5QMtlcLlJbHhDL5ijyyWJvTAZ
GZh5RwdwANxOEfl0atXZcniBvxD2FA6hi2P/d4nBXGa3EkZRk5bDsCj12cPGPoGa
irKRs7QRBiZGLYi0+W2a
=3oE5
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Ext4 commits for 3.3 merge window
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
ext4: fix undefined behavior in ext4_fill_flex_info()
ext4: make more symbols static
ext4: make local symbol ext4_initxattrs static
jbd2: fix hung processes in jbd2_journal_lock_updates()
ext4: reserve new feature flag codepoints
ext4: Report max_batch_time option correctly
ext4: add missing ext4_resize_end on error paths
ext4: let ext4_group_add() use common code
ext4: let ext4_group_extend() use common code
ext4: add new online resize interface
ext4: add a new function which adds a flex group to a fs
ext4: add a new function which allocates bitmaps and inode tables
ext4: pass verify_reserved_gdb() the number of group decriptors
ext4: add a function which updates the super block during online resizing
ext4: add a function which sets up a block group descriptors of a flex bg
ext4: add a function which sets up group blocks of a flex bg
ext4: add a structure which will be used by 64bit-resize interface
ext4: add a function which adds a new group descriptors to a fs
ext4: add a function which extends a group without checking parameters
ext4: use proper little-endian bitops
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
fs/9p: We should not allocate a new inode when creating hardlines.
fs/9p: v9fs_stat2inode should update suid/sgid bits.
9p: Reduce object size with CONFIG_NET_9P_DEBUG
fs/9p: check schedule_timeout_interruptible return value
Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to
debug messages having changed to use p9_debug() on one hand, and the
changes for umode_t on the other.
* 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
NFSv4: Save the owner/group name string when doing open
NFS: Remove pNFS bloat from the generic write path
pnfs-obj: Must return layout on IO error
pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
NFS: Cache state owners after files are closed
NFS: Clean up nfs4_find_state_owners_locked()
NFSv4: include bitmap in nfsv4 get acl data
nfs: fix a minor do_div portability issue
NFSv4.1: cleanup comment and debug printk
NFSv4.1: change nfs4_free_slot parameters for dynamic slots
NFSv4.1: cleanup init and reset of session slot tables
NFSv4.1: fix backchannel slotid off-by-one bug
nfs: fix regression in handling of context= option in NFSv4
NFS - fix recent breakage to NFS error handling.
NFS: Retry mounting NFSROOT
SUNRPC: Clean up the RPCSEC_GSS service ticket requests
* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBI: fix use-after-free on error path
UBI: fix missing scrub when there is a bit-flip
UBIFS: Use kmemdup rather than duplicating its implementation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAk8Mq/MACgkQdwG7hYl686PeFACfZCgbdDWD9A/JL+i1RMfExVu6
Pi0An3Hmc3PTCp0yQ21KtcKhpF9CAMEu
=NfuL
-----END PGP SIGNATURE-----
Merge tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6
MTD pull for 3.3
* tag 'for-linus-3.3' of git://git.infradead.org/mtd-2.6: (113 commits)
mtd: Fix dependency for MTD_DOC200x
mtd: do not use mtd->block_markbad directly
logfs: do not use 'mtd->block_isbad' directly
mtd: introduce mtd_can_have_bb helper
mtd: do not use mtd->suspend and mtd->resume directly
mtd: do not use mtd->lock, unlock and is_locked directly
mtd: do not use mtd->sync directly
mtd: harmonize mtd_writev usage
mtd: do not use mtd->lock_user_prot_reg directly
mtd: mtd->write_user_prot_reg directly
mtd: do not use mtd->read_*_prot_reg directly
mtd: do not use mtd->get_*_prot_info directly
mtd: do not use mtd->read_oob directly
mtd: mtdoops: do not use mtd->panic_write directly
romfs: do not use mtd->get_unmapped_area directly
mtd: do not use mtd->get_unmapped_area directly
mtd: do use mtd->point directly
mtd: introduce mtd_has_oob helper
mtd: mtdcore: export symbols cleanup
mtd: clean-up the default_mtd_writev function
...
Fix up trivial edit/remove conflict in drivers/staging/spectra/lld_mtd.c
Two (or more) concurrent calls of shrink_dcache_parent() on the same dentry may
cause shrink_dcache_parent() to loop forever.
Here's what appears to happen:
1 - CPU0: select_parent(P) finds C and puts it on dispose list, returns 1
2 - CPU1: select_parent(P) locks P->d_lock
3 - CPU0: shrink_dentry_list() locks C->d_lock
dentry_kill(C) tries to lock P->d_lock but fails, unlocks C->d_lock
4 - CPU1: select_parent(P) locks C->d_lock,
moves C from dispose list being processed on CPU0 to the new
dispose list, returns 1
5 - CPU0: shrink_dentry_list() finds dispose list empty, returns
6 - Goto 2 with CPU0 and CPU1 switched
Basically select_parent() steals the dentry from shrink_dentry_list() and thinks
it found a new one, causing shrink_dentry_list() to think it's making progress
and loop over and over.
One way to trigger this is to make udev calls stat() on the sysfs file while it
is going away.
Having a file in /lib/udev/rules.d/ with only this one rule seems to the trick:
ATTR{vendor}=="0x8086", ATTR{device}=="0x10ca", ENV{PCI_SLOT_NAME}="%k", ENV{MATCHADDR}="$attr{address}", RUN+="/bin/true"
Then execute the following loop:
while true; do
echo -bond0 > /sys/class/net/bonding_masters
echo +bond0 > /sys/class/net/bonding_masters
echo -bond1 > /sys/class/net/bonding_masters
echo +bond1 > /sys/class/net/bonding_masters
done
One fix would be to check all callers and prevent concurrent calls to
shrink_dcache_parent(). But I think a better solution is to stop the
stealing behavior.
This patch adds a new dentry flag that is set when the dentry is added to the
dispose list. The flag is cleared in dentry_lru_del() in case the dentry gets a
new reference just before being pruned.
If the dentry has this flag, select_parent() will skip it and let
shrink_dentry_list() retry pruning it. With select_parent() skipping those
dentries there will not be the appearance of progress (new dentries found) when
there is none, hence shrink_dcache_parent() will not loop forever.
Set the flag is also set in prune_dcache_sb() for consistency as suggested by
Linus.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Nothing we do here sleeps, so just do it under d_lock and avoid the dget/
dput entirely.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Sage Weil <sage@newdream.net>
We now set d_fsdata unconditionally on all dentries prior to setting up
the d_ops, so all of these checks are unnecessary.
Signed-off-by: Sage Weil <sage@newdream.net>
Commit 503358ae01 ("ext4: avoid divide by
zero when trying to mount a corrupted file system") fixes CVE-2009-4307
by performing a sanity check on s_log_groups_per_flex, since it can be
set to a bogus value by an attacker.
sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex < 2) { ... }
This patch fixes two potential issues in the previous commit.
1) The sanity check might only work on architectures like PowerPC.
On x86, 5 bits are used for the shifting amount. That means, given a
large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
is essentially 1 << 4 = 16, rather than 0. This will bypass the check,
leaving s_log_groups_per_flex and groups_per_flex inconsistent.
2) The sanity check relies on undefined behavior, i.e., oversized shift.
A standard-confirming C compiler could rewrite the check in unexpected
ways. Consider the following equivalent form, assuming groups_per_flex
is unsigned for simplicity.
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex == 0 || groups_per_flex == 1) {
We compile the code snippet using Clang 3.0 and GCC 4.6. Clang will
completely optimize away the check groups_per_flex == 0, leaving the
patched code as vulnerable as the original. GCC keeps the check, but
there is no guarantee that future versions will do the same.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
lookup should fail with ENOMEM, not silently make dentry negative.
Switched to saner calling conventions, while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: new helper - d_make_root()
dcache: use a dispose list in select_parent
ceph: d_alloc_root() may fail
ext4: fix failure exits
isofs: inode leak on mount failure
select_parent currently abuses the dentry cache LRU to provide
cleanup features for child dentries that need to be freed. It moves
them to the tail of the LRU, then tells shrink_dcache_parent() to
calls __shrink_dcache_sb to unconditionally move them to a dispose
list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to
relock the dentries to move them off the LRU onto the dispose list,
but otherwise does not touch the dentries that select_parent() moved
to the tail of the LRU. It then passses the dispose list to
shrink_dentry_list() which tries to free the dentries.
IOWs, the use of __shrink_dcache_sb() is superfluous - we can build
exactly the same list of dentries for disposal directly in
select_parent() and call shrink_dentry_list() instead of calling
__shrink_dcache_sb() to do that. This means that we avoid long holds
on the lru lock walking the LRU moving dentries to the dispose list
We also avoid the need to relock each dentry just to move it off the
LRU, reducing the numebr of times we lock each dentry to dispose of
them in shrink_dcache_parent() from 3 to 2 times.
Further, we remove one of the two callers of __shrink_dcache_sb().
This also means that __shrink_dcache_sb can be moved into back into
prune_dcache_sb() and we no longer have to handle referenced
dentries conditionally, simplifying the code.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) leaking root dentry is bad
b) in case of failed ext4_mb_init() we don't want to do ext4_mb_release()
c) OTOH, in the same case we *do* want ext4_ext_release()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2/3/4: delete unneeded includes of module.h
ext{3,4}: Fix potential race when setversion ioctl updates inode
udf: Mark LVID buffer as uptodate before marking it dirty
ext3: Don't warn from writepage when readonly inode is spotted after error
jbd: Remove j_barrier mutex
reiserfs: Force inode evictions before umount to avoid crash
reiserfs: Fix quota mount option parsing
udf: Treat symlink component of type 2 as /
udf: Fix deadlock when converting file from in-ICB one to normal one
udf: Cleanup calling convention of inode_getblk()
ext2: Fix error handling on inode bitmap corruption
ext3: Fix error handling on inode bitmap corruption
ext3: replace ll_rw_block with other functions
ext3: NULL dereference in ext3_evict_inode()
jbd: clear revoked flag on buffers before a new transaction started
ext3: call ext3_mark_recovery_complete() when recovery is really needed
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
ore: Must support none-PAGE-aligned IO
ore: fix BUG_ON, too few sgs when reading
ore: Fix crash in case of an IO error.
ore: FIX breakage when MISC_FILESYSTEMS is not set
Now that the use of numeric uids/gids is officially sanctioned in
RFC3530bis, it is time to change the default here to 'enabled'.
By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
using the default AUTH_SYS authentication (i.e. when the client uses the
numeric uids/gids as authentication tokens), so that when new files are
created, they will appear to have the correct user/group.
It also fixes a number of backward compatibility issues when migrating
from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
mappings than the client.
Note also that this setting has been successfully tested against servers
that do not support numeric uids/gids at several Connectathon/Bakeathon
events at this point, and the fall back to using string names/groups has
been shown to work well in all those test cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead, use the new 'mtd_can_have_bb()', or just rely on 'mtd_block_markbad()'
return code, which will be -EOPNOTSUPP if bad blocks are not supported.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Instead, use the new 'mtd_can_have_bb()' helper.
Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch teaches 'mtd_sync()' to do nothing when the MTD driver does
not have the '->sync()' method, which allows us to remove all direct
'mtd->sync' accesses.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch makes the 'mtd_writev()' function more usable and logical. We first
teach it to fall-back to the 'default_mtd_writev()' function if the MTD driver
does not define its own '->writev()' method. Then we make block2mtd and JFFS2
just 'mtd_writev()' instead of 'default_mtd_writev()' function. This means we
can now stop exporting 'default_mtd_writev()' and instead, export
'mtd_writev()'. This is much cleaner and more logical, as well as allows us to
get read of another direct 'mtd->writev' access in JFFS2.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove direct usage of mtd->get_unmapped_area. Instead, just call
'mtd_get_unmapped_area()' which will return -EOPNOTSUPP if the function
is not implemented, and then test for this code.
We also translate -EOPNOTSUPP to -ENOSYS because this return code is
probably part of the kernel ABI which we do not want to break.
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove direct usage of the "mtd->point" function pointer. Instead,
test the mtd_point() return code for '-EOPNOTSUPP'.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Instead, use 'default_mtd_writev()' function which MTD provides.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch is part of a patch-set which changes the MTD interface
from 'mtd->func()' form to 'mtd_func()' form. We need this because
we want to add common code to to all drivers in the mtd core level,
which is impossible with the current interface when MTD clients
call driver functions like 'read()' or 'write()' directly.
At this point we just introduce a new inline wrapper function, but
later some of them are expected to gain more code. E.g., the input
parameters check should be moved to the wrappers rather than be
duplicated at many drivers.
This particular patch introduced the 'mtd_erase()' interface. The
following patches add all the other interfaces one by one.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We are going to re-work the MTD interface and change 'mtd->write()' to
'mtd_write()', 'mtd->read()' to 'mtd_read()' and so forth for all functions
in the 'struct mtd_info' structure.
However, logfs has its own 'mtd_read()', 'mtd_write()', etc functions
which collide with our changes. This patch renames these logfs functions
to 'logfs_mtd_read()', 'logfs_mtd_write()', etc.
Additionally, to make the 'fs/logfs/dev_mtd.c' file look consistent, rename
similarly all the other functions starting with 'mtd_'.
Cc: Jörn Engel <joern@logfs.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
after 250df6ed27
(fs: protect inode->i_state with inode->i_lock), insert_inode_locked()
no longer returns the inode with I_NEW set on failure. However,
the error handler still calls unlock_new_inode() on failure,
which does a WARN_ON if I_NEW is not set, so any failure spews
a lot of warnings.
We can just drop the unlock_new_inode() if insert_inode_locked()
fails here.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Delete any instances of include module.h that were not strictly
required. In the case of ext2, the declaration of MODULE_LICENSE
etc. were in inode.c but the module_init/exit were in super.c, so
relocate the MODULE_LICENCE/AUTHOR block to super.c which makes it
consistent with ext3 and ext4 at the same time.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jan Kara <jack@suse.cz>
The EXT{3,4}_IOC_SETVERSION ioctl() updates i_ctime and i_generation
without i_mutex. This can lead to a race with the other operations that
update i_ctime. This is not a big issue but let's make the ioctl consistent
with how we handle e.g. other timestamp updates and use i_mutex to protect
inode changes.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
When we hit EIO while writing LVID, the buffer uptodate bit is cleared.
This then results in an anoying warning from mark_buffer_dirty() when we
write the buffer again. So just set uptodate flag unconditionally.
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
WARN_ON_ONCE(IS_RDONLY(inode)) tends to trip when filesystem hits error and is
remounted read-only. This unnecessarily scares users (well, they should be
scared because of filesystem error, but the stack trace distracts them from the
right source of their fear ;-). We could as well just remove the WARN_ON but
it's not hard to fix it to not trip on filesystem with errors and not use more
cycles in the common case so that's what we do.
CC: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
j_barrier mutex is used for serializing different journal lock operations. The
problem with it is that e.g. FIFREEZE ioctl results in process leaving kernel
with j_barrier mutex held which makes lockdep freak out. Also hibernation code
wants to freeze filesystem but it cannot do so because it then cannot hibernate
the system because of mutex being locked.
So we remove j_barrier mutex and use direct wait on j_barrier_count instead.
Since locking journal is a rare operation we don't have to care about fairness
or such things.
CC: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Jan Kara <jack@suse.cz>
This patch fixes a crash in reiserfs_delete_xattrs during umount.
When shrink_dcache_for_umount clears the dcache from
generic_shutdown_super, delayed evictions are forced to disk. If an
evicted inode has extended attributes associated with it, it will
need to walk the xattr tree to locate and remove them.
But since shrink_dcache_for_umount will BUG if it encounters active
dentries, the xattr tree must be released before it's called or it will
crash during every umount.
This patch forces the evictions to occur before generic_shutdown_super
by calling shrink_dcache_sb first. The additional evictions caused
by the removal of each associated xattr file and dir will be automatically
handled as they're added to the LRU list.
CC: reiserfs-devel@vger.kernel.org
CC: stable@kernel.org
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When jqfmt mount option is not specified on remount, we mistakenly clear
s_jquota_fmt value stored in superblock. Fix the problem.
CC: stable@kernel.org
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Currently, we ignore symlink component of type 2. But mkisofs and other OS'
seem to treat it as / so do the same for compatibility.
Reported-by: "Gábor S." <otnaccess@hotmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
During BKL removal in 2.6.38, conversion of files from in-ICB format to normal
format got broken. We call ->writepage with i_data_sem held but udf_get_block()
also acquires i_data_sem thus creating A-A deadlock.
We fix the problem by dropping i_data_sem before calling ->writepage() which is
safe since i_mutex still protects us against any changes in the file. Also fix
pagelock - i_data_sem lock inversion in udf_expand_file_adinicb() by dropping
i_data_sem before calling find_or_create_page().
CC: stable@kernel.org
Reported-by: Matthias Matiak <netzpython@mail-on.us>
Tested-by: Matthias Matiak <netzpython@mail-on.us>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
inode_getblk() always returned NULL and passed results in its parameters.
Make the function return something useful - found block number.
Signed-off-by: Jan Kara <jack@suse.cz>
When insert_inode_locked() fails in ext2_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by informing about filesystem error and
jumping to fail: (instead of fail_drop:) which doesn't call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
ll_rw_block() is deprecated. Thus we replace it with other functions.
CC: Jan Kara <jack@suse.cz>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Both ext3 and ext4 put the half-created symlink inode into the orphan list
for a while (see the comment in ext[34]_symlink() for gory details). Then,
if everything went fine, they pull it out of the orphan list and bump the
link count back to 1. The thing is, inc_nlink() is going to complain about
seeing somebody changing i_nlink from 0 to 1. With a good reason, since
normally something like that is a bug. Explicit set_nlink(inode, 1) does
the same thing as inc_nlink() here, but it does *not* complain - exactly
because it should be usable in strange situations like this one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
We have already set ->s_root, so ->put_super() is going to be called.
Freeing ->s_fs_info is a bloody bad idea when it's going to be
dereferenced very shortly...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
devpts_kill_sb() is called even if devpts_fill_super() fails;
we should not do that kfree() in the latter, especially not
with ->s_fs_info left pointing to freed object. Double kfree()
is a Bad Thing(tm)...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...
Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
PM / Hibernate: Implement compat_ioctl for /dev/snapshot
PM / Freezer: fix return value of freezable_schedule_timeout_killable()
PM / shmobile: Allow the A4R domain to be turned off at run time
PM / input / touchscreen: Make st1232 use device PM QoS constraints
PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
PM / shmobile: Remove the stay_on flag from SH7372's PM domains
PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
PM: Drop generic_subsys_pm_ops
PM / Sleep: Remove forward-only callbacks from AMBA bus type
PM / Sleep: Remove forward-only callbacks from platform bus type
PM: Run the driver callback directly if the subsystem one is not there
PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
PM / Sleep: Merge internal functions in generic_ops.c
PM / Sleep: Simplify generic system suspend callbacks
PM / Hibernate: Remove deprecated hibernation snapshot ioctls
PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
ARM: S3C64XX: Implement basic power domain support
PM / shmobile: Use common always on power domain governor
...
Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
XBT_FORCE_SLEEP bit
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
GFS2: local functions should be static
GFS2: We only need one ACL getting function
GFS2: Fix multi-block allocation
GFS2: decouple quota allocations from block allocations
GFS2: split function rgblk_search
GFS2: Fix up "off by one" in the previous patch
GFS2: move toward a generic multi-block allocator
GFS2: O_(D)SYNC support for fallocate
GFS2: remove vestigial al_alloced
GFS2: combine gfs2_alloc_block and gfs2_alloc_di
GFS2: Add non-try locks back to get_local_rgrp
GFS2: f_ra is always valid in dir readahead function
GFS2: Fix very unlikley memory leak in ACL xattr code
GFS2: More automated code analysis fixes
GFS2: Add readahead to sequential directory traversal
GFS2: Fix up REQ flags
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (22 commits)
xfs: mark the xfssyncd workqueue as non-reentrant
xfs: simplify xfs_qm_detach_gdquots
xfs: fix acl count validation in xfs_acl_from_disk()
xfs: remove unused XBT_FORCE_SLEEP bit
xfs: remove XFS_QMOPT_DQSUSER
xfs: kill xfs_qm_idtodq
xfs: merge xfs_qm_dqinit_core into the only caller
xfs: add a xfs_dqhold helper
xfs: simplify xfs_qm_dqattach_grouphint
xfs: nest qm_dqfrlist_lock inside the dquot qlock
xfs: flatten the dquot lock ordering
xfs: implement lazy removal for the dquot freelist
xfs: remove XFS_DQ_INACTIVE
xfs: cleanup xfs_qm_dqlookup
xfs: cleanup dquot locking helpers
xfs: remove the sync_mode argument to xfs_qm_dqflush_all
xfs: remove xfs_qm_sync
xfs: make sure to really flush all dquots in xfs_qm_quotacheck
xfs: untangle SYNC_WAIT and SYNC_TRYLOCK meanings for xfs_qm_dqflush
xfs: remove the lid_size field in struct log_item_desc
...
Fix up trivial conflict in fs/xfs/xfs_sync.c
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...
NFS might send us offsets that are not PAGE aligned. So
we must read in the reminder of the first/last pages, in cases
we need it for Parity calculations.
We only add an sg segments to read the partial page. But
we don't mark it as read=true because it is a lock-for-write
page.
TODO: In some cases (IO spans a single unit) we can just
adjust the raid_unit offset/length, but this is left for
later Kernels.
[Bug in 3.2.0 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Fix compile error
fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
arm: fix up some samsung merge sysdev conversion problems
firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
Drivers:hv: Fix a bug in vmbus_driver_unregister()
driver core: remove __must_check from device_create_file
debugfs: add missing #ifdef HAS_IOMEM
arm: time.h: remove device.h #include
driver-core: remove sysdev.h usage.
clockevents: remove sysdev.h
arm: convert sysdev_class to a regular subsystem
arm: leds: convert sysdev_class to a regular subsystem
kobject: remove kset_find_obj_hinted()
m86k: gpio - convert sysdev_class to a regular subsystem
mips: txx9_sram - convert sysdev_class to a regular subsystem
mips: 7segled - convert sysdev_class to a regular subsystem
sh: dma - convert sysdev_class to a regular subsystem
sh: intc - convert sysdev_class to a regular subsystem
power: suspend - convert sysdev_class to a regular subsystem
power: qe_ic - convert sysdev_class to a regular subsystem
power: cmm - convert sysdev_class to a regular subsystem
s390: time - convert sysdev_class to a regular subsystem
...
Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
- arch/arm/mach-exynos/cpu.c
- arch/arm/mach-exynos/irq-eint.c
- arch/arm/mach-s3c64xx/common.c
- arch/arm/mach-s3c64xx/cpu.c
- arch/arm/mach-s5p64x0/cpu.c
- arch/arm/mach-s5pv210/common.c
- arch/arm/plat-samsung/include/plat/cpu.h
- arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Make reiserfs properly display mount options in /proc/mounts.
CC: reiserfs-devel@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If there are any inodes on the super block that have been unlinked
(i_nlink == 0) but have not yet been deleted then prevent the
remounting the super block read-only.
Reported-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new counter to the superblock that keeps track of unlinked but
not yet deleted inodes.
Do not WARN_ON if set_nlink is called with zero count, just do a
ratelimited printk. This happens on xfs and probably other
filesystems after an unclean shutdown when the filesystem reads inodes
which already have zero i_nlink. Reported by Christoph Hellwig.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Currently remouting superblock read-only is racy in a major way.
With the per mount read-only infrastructure it is now possible to
prevent most races, which this patch attempts.
Before starting the remount read-only, iterate through all mounts
belonging to the superblock and if none of them have any pending
writes, set sb->s_readonly_remount. This indicates that remount is in
progress and no further write requests are allowed. If the remount
succeeds set MS_RDONLY and reset s_readonly_remount.
If the remounting is unsuccessful just reset s_readonly_remount.
This can result in transient EROFS errors, despite the fact the
remount failed. Unfortunately hodling off writes is difficult as
remount itself may touch the filesystem (e.g. through load_nls())
which would deadlock.
A later patch deals with delayed writes due to nlink going to zero.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Keep track of vfsmounts belonging to a superblock. List is protected
by vfsmount_lock.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
net: pack skb_shared_info more efficiently
net_sched: red: split red_parms into parms and vars
net_sched: sfq: extend limits
cnic: Improve error recovery on bnx2x devices
cnic: Re-init dev->stats_addr after chip reset
net_sched: Bug in netem reordering
bna: fix sparse warnings/errors
bna: make ethtool_ops and strings const
xgmac: cleanups
net: make ethtool_ops const
vmxnet3" make ethtool ops const
xen-netback: make ops structs const
virtio_net: Pass gfp flags when allocating rx buffers.
ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
igb: reset PHY after recovering from PHY power down
igb: add basic runtime PM support
igb: Add support for byte queue limits.
e1000: cleanup CE4100 MDIO registers access
e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
...
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.
The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
sched/tracing: Add a new tracepoint for sleeptime
sched: Disable scheduler warnings during oopses
sched: Fix cgroup movement of waking process
sched: Fix cgroup movement of newly created process
sched: Fix cgroup movement of forking process
sched: Remove cfs bandwidth period check in tg_set_cfs_period()
sched: Fix load-balance lock-breaking
sched: Replace all_pinned with a generic flags field
sched: Only queue remote wakeups when crossing cache boundaries
sched: Add missing rcu_dereference() around ->real_parent usage
[S390] fix cputime overflow in uptime_proc_show
[S390] cputime: add sparse checking and cleanup
sched: Mark parent and real_parent as __rcu
sched, nohz: Fix missing RCU read lock
sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
sched, nohz: Fix the idle cpu check in nohz_idle_balance
sched: Use jump_labels for sched_feat
sched/accounting: Fix parameter passing in task_group_account_field
sched/accounting: Fix user/system tick double accounting
sched/accounting: Re-use scheduler statistics for the root cgroup
...
Fix up conflicts in
- arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
usecs_to_cputime64() vs the sparse cleanups
- kernel/sched/fair.c, kernel/time/tick-sched.c
scheduler changes in multiple branches