Commit Graph

29177 Commits

Author SHA1 Message Date
Jeff Layton
669abf4e55 vfs: make path_openat take a struct filename pointer
...and fix up the callers. For do_file_open_root, just declare a
struct filename on the stack and fill out the .name field. For
do_filp_open, make it also take a struct filename pointer, and fix up its
callers to call it appropriately.

For filp_open, add a variant that takes a struct filename pointer and turn
filp_open into a wrapper around it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 20:15:09 -04:00
Jeff Layton
873f1eedc1 vfs: turn do_path_lookup into wrapper around struct filename variant
...and make the user_path callers use that variant instead.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 20:15:08 -04:00
Jeff Layton
7ac86265dc audit: allow audit code to satisfy getname requests from its names_list
Currently, if we call getname() on a userland string more than once,
we'll get multiple copies of the string and multiple audit_names
records.

Add a function that will allow the audit_names code to satisfy getname
requests using info from the audit_names list, avoiding a new allocation
and audit_names records.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 20:15:08 -04:00
Jeff Layton
91a27b2a75 vfs: define struct filename and have getname() return it
getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 20:14:55 -04:00
Eric W. Biederman
e9069f4708 btrfs: Fix compilation with user namespace support enabled
When compiling with user namespace support btrfs fails like:

fs/btrfs/tree-log.c: In function ‘fill_inode_item’:
fs/btrfs/tree-log.c:2955:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_uid’
fs/btrfs/ctree.h:2026:1: note: expected ‘u32’ but argument is of type ‘kuid_t’
fs/btrfs/tree-log.c:2956:2: error: incompatible type for argument 3 of ‘btrfs_set_inode_gid’
fs/btrfs/ctree.h:2027:1: note: expected ‘u32’ but argument is of type ‘kgid_t’

Fix this by using i_uid_read and i_gid_read in

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12 15:01:42 -07:00
Eric W. Biederman
ea1fd7776e userns: Fix posix_acl_file_xattr_userns gid conversion
The code needs to be from_kgid(make_kgid(...)...) not
from_kuid(make_kgid(...)...). Doh!

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-10-12 13:16:48 -07:00
Jeff Layton
8e377d1507 vfs: unexport getname and putname symbols
I see no callers in module code.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 00:32:09 -04:00
Jeff Layton
4fa6b5ecbf audit: overhaul __audit_inode_child to accomodate retrying
In order to accomodate retrying path-based syscalls, we need to add a
new "type" argument to audit_inode_child. This will tell us whether
we're looking for a child entry that represents a create or a delete.

If we find a parent, don't automatically assume that we need to create a
new entry. Instead, use the information we have to try to find an
existing entry first. Update it if one is found and create a new one if
not.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 00:32:03 -04:00
Jeff Layton
bfcec70874 audit: set the name_len in audit_inode for parent lookups
Currently, this gets set mostly by happenstance when we call into
audit_inode_child. While that might be a little more efficient, it seems
wrong. If the syscall ends up failing before audit_inode_child ever gets
called, then you'll have an audit_names record that shows the full path
but has the parent inode info attached.

Fix this by passing in a parent flag when we call audit_inode that gets
set to the value of LOOKUP_PARENT. We can then fix up the pathname for
the audit entry correctly from the get-go.

While we're at it, clean up the no-op macro for audit_inode in the
!CONFIG_AUDITSYSCALL case.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 00:32:01 -04:00
Jeff Layton
c43a25abba audit: reverse arguments to audit_inode_child
Most of the callers get called with an inode and dentry in the reverse
order. The compiler then has to reshuffle the arg registers and/or
stack in order to pass them on to audit_inode_child.

Reverse those arguments for a micro-optimization.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 00:32:00 -04:00
Jeff Layton
f78570dd6a audit: remove unnecessary NULL ptr checks from do_path_lookup
As best I can tell, whenever retval == 0, nd->path.dentry and nd->inode
are also non-NULL. Eliminate those checks and the superfluous
audit_context check.

Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-12 00:31:59 -04:00
Linus Torvalds
79360ddd73 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull pile 2 of vfs updates from Al Viro:
 "Stuff in this one - assorted fixes, lglock tidy-up, death to
  lock_super().

  There'll be a VFS pile tomorrow (with patches from Jeff Layton,
  sanitizing getname() and related parts of audit and preparing for
  ESTALE fixes), but I'd rather push the stuff in this one ASAP - some
  of the bugs closed here are quite unpleasant."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: bogus warnings in fs/namei.c
  consitify do_mount() arguments
  lglock: add DEFINE_STATIC_LGLOCK()
  lglock: make the per_cpu locks static
  lglock: remove unused DEFINE_LGLOCK_LOCKDEP()
  MAX_LFS_FILESIZE definition for 64bit needs LL...
  tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking
  vfs: drop lock/unlock super
  ufs: drop lock/unlock super
  sysv: drop lock/unlock super
  hpfs: drop lock/unlock super
  fat: drop lock/unlock super
  ext3: drop lock/unlock super
  exofs: drop lock/unlock super
  dup3: Return an error when oldfd == newfd.
  fs: handle failed audit_log_start properly
  fs: prevent use after free in auditing when symlink following was denied
2012-10-12 10:52:03 +09:00
Linus Torvalds
40924754f2 Merge branch 'writeback-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux
Pull writeback fixes from Fengguang Wu:
 "Three trivial writeback fixes"

* 'writeback-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  CPU hotplug, writeback: Don't call writeback_set_ratelimit() too often during hotplug
  writeback: correct comment for move_expired_inodes()
  backing-dev: use kstrto* in preference to simple_strtoul
2012-10-12 10:46:03 +09:00
Linus Torvalds
940e3a8dd6 The following changes since commit 4cbe5a555f:
Linux 3.6-rc4 (2012-09-01 10:39:58 -0700)
 
 are available in the git repository at:
 
   git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs.git for-next
 
 for you to fetch changes up to 552aad02a283ee88406b102b4d6455eef7127196:
 
   9P: Fix race between p9_write_work() and p9_fd_request() (2012-09-17 14:54:11 -0500)
 
 ----------------------------------------------------------------
 Jeff Layton (1):
       9p: don't use __getname/__putname for uname/aname
 
 Jim Meyering (1):
       fs/9p: avoid debug OOPS when reading a long symlink
 
 Simon Derr (5):
       net/9p: Check errno validity
       9P: Fix race in p9_read_work()
       9P: fix test at the end of p9_write_work()
       9P: Fix race in p9_write_work()
       9P: Fix race between p9_write_work() and p9_fd_request()
 
  fs/9p/v9fs.c      |   30 +++++++++++++++++++-----------
  fs/9p/vfs_inode.c |    8 ++++----
  net/9p/client.c   |   18 ++++++++++++++++--
  net/9p/trans_fd.c |   38 ++++++++++++++++++++------------------
  4 files changed, 59 insertions(+), 35 deletions(-)
 
 Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
 Comment: GPGTools - http://gpgtools.org
 
 iQIcBAABAgAGBQJQdvxdAAoJEDZk62b0Tg6xru8QAL1I6YH+O6c+sHONQQnifkl/
 WciZDUSYx7Pd4Ffy48m6y5J6M2VNFUIzqpIKnm4xAiHUwct/E/+yuyfK2zAe1Jxc
 nqPxoU2iyFWXc1Hu5HQhrjMXMlqePPuF1kwTYd0vCXXbVgWfbhwYfjoRr3PGuVTD
 3SpQrBxIvQj1aWRMyyQTcnqnmTLPFr1kX0TRBgvipSfQETVFR5gCXK8sJUDvU+0S
 4kywmb3y31/EpcKdDs7CE1m5kCi6T2mguP5NR4dHtN8YT76IW4urIqyAw6069wQV
 AMmoqhJP2cJ6kyyh93ltZSgcMIUgfrDj2pIsGT3hILusTh9vBT10Db8iNT2ledy8
 W+TxjK0/H0h5rfitHYqD+XnCF4pKFRm5aOOYL8jg02Uh8jU9MzkAIw1/fmXUOZ7O
 rht+HttJht2QCFniV1C442hbzL0J5mYsGPwpWZ5j4dN7PBIi8SYh+Ik0la4rRa8I
 m9C04HHvPsc0gRXPAp1+Ptby4FnPS846a9Ffm4xrkNhFl3z916ef67MnoCGu3roM
 GU9FEOdWhSWJ+52qLcXZwqkrPvlUMOehwnSjlab3BCThPRVK0D7gdTzBN4NDQZWo
 AzhK5sNRFwEidnYo7gy0g2UsRWRgPP7fiUe/xtlWaBlm0DU1+jZc/uzjEn6/h77R
 fQfniKFcMRFIeksGts5e
 =hADE
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull v9fs update from Eric Van Hensbergen.

* tag 'for-linus-merge-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9P: Fix race between p9_write_work() and p9_fd_request()
  9P: Fix race in p9_write_work()
  9P: fix test at the end of p9_write_work()
  9P: Fix race in p9_read_work()
  9p: don't use __getname/__putname for uname/aname
  net/9p: Check errno validity
  fs/9p: avoid debug OOPS when reading a long symlink
2012-10-12 09:59:23 +09:00
Arnd Bergmann
98f6ef64b1 vfs: bogus warnings in fs/namei.c
The follow_link() function always initializes its *p argument,
or returns an error, but when building with 'gcc -s', the compiler
gets confused by the __always_inline attribute to the function
and can no longer detect where the cookie was initialized.

The solution is to always initialize the pointer from follow_link,
even in the error path. When building with -O2, this has zero impact
on generated code and adds a single instruction in the error path
for a -Os build on ARM.

Without this patch, building with gcc-4.6 through gcc-4.8 and
CONFIG_CC_OPTIMIZE_FOR_SIZE results in:

fs/namei.c: In function 'link_path_walk':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:1544:9: note: 'cookie' was declared here
fs/namei.c: In function 'path_lookupat':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:1934:10: note: 'cookie' was declared here
fs/namei.c: In function 'path_openat':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:2899:9: note: 'cookie' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-11 20:02:16 -04:00
Al Viro
808d4e3cfd consitify do_mount() arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-11 20:02:04 -04:00
J. Bruce Fields
a9ca4043d0 Merge Trond's bugfixes
Merge branch 'bugfixes' of git://linux-nfs.org/~trondmy/nfs-2.6 into
for-3.7-incoming.  Mainly needed for Bryan's "SUNRPC: Set alloc_slot for
backchannel tcp ops", without which the 4.1 server oopses.
2012-10-11 12:41:05 -04:00
Ian Kent
49999ab27e autofs4 - fix reset pending flag on mount fail
In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING
mount pending flag is not cleared.

One effect of this is when using the "browse" option, directory entry
attributes show up with all "?"s due to the incorrect callback and
subsequent failure return (when in fact no callback should be made).

Signed-off-by: Ian Kent <ikent@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-11 10:21:16 +09:00
Linus Torvalds
ce40be7a82 Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block
Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7.  Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton.  It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
  block: makes bio_split support bio without data
  scatterlist: refactor the sg_nents
  scatterlist: add sg_nents
  fs: fix include/percpu-rwsem.h export error
  percpu-rw-semaphore: fix documentation typos
  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
  blockdev: turn a rw semaphore into a percpu rw semaphore
  Fix a crash when block device is read and block size is changed at the same time
  block: fix request_queue->flags initialization
  block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
  block: ioctl to zero block ranges
  block: Make blkdev_issue_zeroout use WRITE SAME
  block: Implement support for WRITE SAME
  block: Consolidate command flag and queue limit checks for merges
  block: Clean up special command handling logic
  block/blk-tag.c: Remove useless kfree
  block: remove the duplicated setting for congestion_threshold
  block: reject invalid queue attribute values
  block: Add bio_clone_bioset(), bio_clone_kmalloc()
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  ...
2012-10-11 09:04:23 +09:00
Linus Torvalds
df632d3ce7 NFS client updates for Linux 3.7
Features include:
 
 - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
   Aside from the issues discussed at the LKS, distros are shipping
   NFSv4.1 with all the trimmings.
 - Fix fdatasync()/fsync() for the corner case of a server reboot.
 - NFSv4 OPEN access fix: finally distinguish correctly between
   open-for-read and open-for-execute permissions in all situations.
 - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
 - More idmapper bugfixes
 - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
   make the code easier to read.
 - In cases where a pNFS read or write fails, allow the client to
   resume trying layoutgets after two minutes of read/write-through-mds.
 - More net namespace fixes to the NFSv4 callback code.
 - More net namespace fixes to the NFSv3 locking code.
 - More NFSv4 migration preparatory patches.
   Including patches to detect network trunking in both NFSv4 and NFSv4.1
 - pNFS block updates to optimise LAYOUTGET calls.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQdMvBAAoJEGcL54qWCgDyV84P/0XvcEXj6kdMv9EiWfRczo7r
 iAwAIhiEmG1agtZa6v+Gso2MYRQbkGyJi0LKIwzGqNUi0BLQGQCoV93kB0ITVpiN
 g7poDTnPyoItW1oJCtC48/Mx0G5C1yrHSwFAJrXmtzDF1mwd/BIQReafYp6x+/TU
 Mvwm7au3Y2ySRBEDmY4zyBERHXGt//JmsZ9Ays6jewQg5ZOyjDQKoeHVYaaeJoF0
 A0tQGcBSNdySagI5dt4SlkuO7AClhzVHlilep2dsBu/TLS0F2pEdHXvM2W0koZmM
 uazaIpzd2F7TfokTYExgsyKsqpkzpDf1kebN4Y1+Ioi7Yy30dQrX6lNaUNcOmOJQ
 xx694HDHV90KdRBVSFhOIHMTBRcls68hBcWib3MXWHTKX6HVgnFMwhwxGH0MRezf
 3rmXoqn+CO1j5WeQmA3BqdVbHSZHi913TKEwE/qoW4pmOFhv5I2flXWQS/Rwvdng
 2xDCe6TlvhMS92IpyvNEIicXLRSm+DUAmoAfSqqlifZIAEM5R29e/wCAsmVprO3B
 LPHyUoIMO6SZ1PL6Rk20+6qQfvCK7U/ChULsUL/zb7R88Pc3sFE2BeAvZVATsvH3
 +FJWTz43fwUBoMhPsn8xSBLn/fq6az5C19syz6Fpu3DZ4X0EwyVWifiFk6HgcxZD
 J8ajEl+dNZeFE8rkwykX
 =uBk7
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Features include:

   - Remove CONFIG_EXPERIMENTAL dependency from NFSv4.1
     Aside from the issues discussed at the LKS, distros are shipping
     NFSv4.1 with all the trimmings.
   - Fix fdatasync()/fsync() for the corner case of a server reboot.
   - NFSv4 OPEN access fix: finally distinguish correctly between
     open-for-read and open-for-execute permissions in all situations.
   - Ensure that the TCP socket is closed when we're in CLOSE_WAIT
   - More idmapper bugfixes
   - Lots of pNFS bugfixes and cleanups to remove unnecessary state and
     make the code easier to read.
   - In cases where a pNFS read or write fails, allow the client to
     resume trying layoutgets after two minutes of read/write-
     through-mds.
   - More net namespace fixes to the NFSv4 callback code.
   - More net namespace fixes to the NFSv3 locking code.
   - More NFSv4 migration preparatory patches.
     Including patches to detect network trunking in both NFSv4 and
     NFSv4.1
   - pNFS block updates to optimise LAYOUTGET calls."

* tag 'nfs-for-3.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (113 commits)
  pnfsblock: cleanup nfs4_blkdev_get
  NFS41: send real read size in layoutget
  NFS41: send real write size in layoutget
  NFS: track direct IO left bytes
  NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
  NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
  NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
  NFSv4 set open access operation call flag in nfs4_init_opendata_res
  NFSv4.1: Remove the dependency on CONFIG_EXPERIMENTAL
  NFSv4 reduce attribute requests for open reclaim
  NFSv4: nfs4_open_done first must check that GETATTR decoded a file type
  NFSv4.1: Deal with wraparound when updating the layout "barrier" seqid
  NFSv4.1: Deal with wraparound issues when updating the layout stateid
  NFSv4.1: Always set the layout stateid if this is the first layoutget
  NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layout
  NFSv4: don't put ACCESS in OPEN compound if O_EXCL
  NFSv4: don't check MAY_WRITE access bit in OPEN
  NFS: Set key construction data for the legacy upcall
  NFSv4.1: don't do two EXCHANGE_IDs on mount
  NFS: nfs41_walk_client_list(): re-lock before iterating
  ...
2012-10-10 23:52:35 +09:00
Michal Hocko
7386cdbf2f nohz: Fix idle ticks in cpu summary line of /proc/stat
Git commit 09a1d34f85 "nohz: Make idle/iowait counter update
conditional" introduced a bug in regard to cpu hotplug. The effect is
that the number of idle ticks in the cpu summary line in /proc/stat is
still counting ticks for offline cpus.

Reproduction is easy, just start a workload that keeps all cpus busy,
switch off one or more cpus and then watch the idle field in top.
On a dual-core with one cpu 100% busy and one offline cpu you will get
something like this:

%Cpu(s): 48.7 us,  1.3 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi,  0.0 si,
%0.0 st

The problem is that an offline cpu still has ts->idle_active == 1.
To fix this we should make sure that the cpu is online when calling
get_cpu_idle_time_us and get_cpu_iowait_time_us.

[Srivatsa: Rebased to current mainline]

Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20121010061820.8999.57245.stgit@srivatsabhat.in.ibm.com
Cc: deepthi@linux.vnet.ibm.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-10 14:05:21 +02:00
Lai Jiangshan
4b2c551f77 lglock: add DEFINE_STATIC_LGLOCK()
When the lglock doesn't need to be exported we can use
DEFINE_STATIC_LGLOCK().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-10 01:15:44 -04:00
Theodore Ts'o
06db49e68a ext4: fix metadata checksum calculation for the superblock
The function ext4_handle_dirty_super() was calculating the superblock
on the wrong block data.  As a result, when the superblock is modified
while it is mounted (most commonly, when inodes are added or removed
from the orphan list), the superblock checksum would be wrong.  We
didn't notice because the superblock *was* being correctly calculated
in ext4_commit_super(), and this would get called when the file system
was unmounted.  So the problem only became obvious if the system
crashed while the file system was mounted.

Fix this by removing the poorly designed function signature for
ext4_superblock_csum_set(); if it only took a single argument, the
pointer to a struct superblock, the ambiguity which caused this
mistake would have been impossible.

Reported-by: George Spelvin <linux@horizon.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-10-10 01:06:58 -04:00
Dmitry Monakhov
dee1f973ca ext4: race-condition protection for ext4_convert_unwritten_extents_endio
We assumed that at the time we call ext4_convert_unwritten_extents_endio()
extent in question is fully inside [map.m_lblk, map->m_len] because
it was already split during submission.  But this may not be true due to
a race between writeback vs fallocate.

If extent in question is larger than requested we will split it again.
Special precautions should being done if zeroout required because
[map.m_lblk, map->m_len] already contains valid data.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2012-10-10 01:04:58 -04:00
Hugh Dickins
35c2a7f490 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
	u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
 [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
 [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
 [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
 [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:55 -04:00
Marco Stornelli
8e22cc88d6 vfs: drop lock/unlock super
Removed s_lock from super_block and removed lock/unlock super.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:39 -04:00
Marco Stornelli
b6963327e0 ufs: drop lock/unlock super
Removed lock/unlock super. Added a new private s_lock mutex.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:39 -04:00
Marco Stornelli
c07cb01c45 sysv: drop lock/unlock super
Removed lock/unlock super. Added a new private s_lock mutex.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:39 -04:00
Marco Stornelli
f6e12dc4fc hpfs: drop lock/unlock super
Removed lock/unlock super.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Marco Stornelli
e40b34c792 fat: drop lock/unlock super
Removed lock/unlock super. Added a new private s_lock mutex.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Marco Stornelli
67e2c19a3b ext3: drop lock/unlock super
Removed lock/unlock super.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Marco Stornelli
4f7754c889 exofs: drop lock/unlock super
Removed lock/unlock super.

Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Richard W.M. Jones
aed976475b dup3: Return an error when oldfd == newfd.
I have tested the attached patch to fix the dup3 regression.

Rich.

From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 9 Oct 2012 14:42:45 +0100
Subject: [PATCH] dup3: Return an error when oldfd == newfd.

The following commit:

  commit fe17f22d7f
  Author: Al Viro <viro@zeniv.linux.org.uk>
  Date:   Tue Aug 21 11:48:11 2012 -0400

    take purely descriptor-related stuff from fcntl.c to file.c

was supposed to be just code motion, but it dropped the following two
lines:

  if (unlikely(oldfd == newfd))
          return -EINVAL;

from the dup3 system call.  dup3 is not specified by POSIX, so Linux
can do what it likes.  However the POSIX proposal for dup3 [1] states
that it should return an error if oldfd == newfd.

[1] http://austingroupbugs.net/view.php?id=411

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:38 -04:00
Sasha Levin
ffd8d101a3 fs: prevent use after free in auditing when symlink following was denied
Commit "fs: add link restriction audit reporting" has added auditing of failed
attempts to follow symlinks. Unfortunately, the auditing was being done after
the struct path structure was released earlier.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-10-09 23:33:37 -04:00
Linus Torvalds
42859eea96 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull generic execve() changes from Al Viro:
 "This introduces the generic kernel_thread() and kernel_execve()
  functions, and switches x86, arm, alpha, um and s390 over to them."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (26 commits)
  s390: convert to generic kernel_execve()
  s390: switch to generic kernel_thread()
  s390: fold kernel_thread_helper() into ret_from_fork()
  s390: fold execve_tail() into start_thread(), convert to generic sys_execve()
  um: switch to generic kernel_thread()
  x86, um/x86: switch to generic sys_execve and kernel_execve
  x86: split ret_from_fork
  alpha: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  alpha: switch to generic kernel_thread()
  alpha: switch to generic sys_execve()
  arm: get rid of execve wrapper, switch to generic execve() implementation
  arm: optimized current_pt_regs()
  arm: introduce ret_from_kernel_execve(), switch to generic kernel_execve()
  arm: split ret_from_fork, simplify kernel_thread() [based on patch by rmk]
  generic sys_execve()
  generic kernel_execve()
  new helper: current_pt_regs()
  preparation for generic kernel_thread()
  um: kill thread->forking
  um: let signal_delivered() do SIGTRAP on singlestepping into handler
  ...
2012-10-10 12:02:25 +09:00
Linus Torvalds
f59b51fe3d Merge branch 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML changes from Richard Weinberger:
 "UML receives this time only cleanups.

  The most outstanding change is the 'include "foo.h"' do 'include
  <foo.h>' conversion done by Al Viro.

  It touches many files, that's why the diffstat is rather big."

* 'for-linus-37rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  typo in UserModeLinux-HOWTO
  hppfs: fix the return value of get_inode()
  hostfs: drop vmtruncate
  um: get rid of pointless include "..." where include <...> will do
  um: move sysrq.h out of include/shared
  um/x86: merge 32 and 64 bit variants of ptrace.h
  um/x86: merge 32 and 64bit variants of checksum.h
2012-10-10 11:15:20 +09:00
Linus Torvalds
10f39f04b2 MTD merge for 3.7
- Disable broken mtdchar mmap() on MMU systems
  - Additional ECC tests for NAND flash, and some test cleanups
  - New NAND and SPI chip support
  - Fixes/cleanup for SH FLCTL NAND controller driver
  - Improved hardware support for GPMI NAND controller
  - Conversions to device-tree support for various drivers
  - Removal of obsolete drivers (sbc8xxx, bcmring, etc.)
  - New LPC32xx drivers for MLC and SLC NAND
  - Further cleanup of NAND OOB/ECC handling
  - UAPI cleanup merge from David Howells (just moving files, since MTD
    headers were sorted out long ago to separate user-visible from kernel
    bits)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iEYEABECAAYFAlB0OosACgkQdwG7hYl686OSPACeKLrlHmyG8KXgAqcGZwAj1RM+
 X9YAoI2Kd6Sz8v6sLbJidnxUBr/oJVa8
 =/kFV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20121009' of git://git.infradead.org/mtd-2.6

Pull MTD updates from David Woodhouse:

 - Disable broken mtdchar mmap() on MMU systems
 - Additional ECC tests for NAND flash, and some test cleanups
 - New NAND and SPI chip support
 - Fixes/cleanup for SH FLCTL NAND controller driver
 - Improved hardware support for GPMI NAND controller
 - Conversions to device-tree support for various drivers
 - Removal of obsolete drivers (sbc8xxx, bcmring, etc.)
 - New LPC32xx drivers for MLC and SLC NAND
 - Further cleanup of NAND OOB/ECC handling
 - UAPI cleanup merge from David Howells (just moving files, since MTD
   headers were sorted out long ago to separate user-visible from kernel
   bits)

* tag 'for-linus-20121009' of git://git.infradead.org/mtd-2.6: (168 commits)
  mtd: Disable mtdchar mmap on MMU systems
  UAPI: (Scripted) Disintegrate include/mtd
  mtd: nand: detect Samsung K9GBG08U0A, K9GAG08U0F ID
  mtd: nand: decode Hynix MLC, 6-byte ID length
  mtd: nand: increase max OOB size to 640
  mtd: nand: add generic READ ID length calculation functions
  mtd: nand: split simple ID decode into its own function
  mtd: nand: split extended ID decoding into its own function
  mtd: nand: split BB marker options decoding into its own function
  mtd: nand: remove redundant ID read
  mtd: nand: remove unnecessary variable
  mtd: docg4: add missing HAS_IOMEM dependency
  mtd: gpmi: initialize the timing registers only one time
  mtd: gpmi: add EDO feature for imx6q
  mtd: gpmi: do not set the default values for the extra clocks
  mtd: gpmi: simplify the DLL setting code
  mtd: gpmi: add a new field for HW_GPMI_CTRL1
  mtd: gpmi: do not get the clock frequency in gpmi_begin()
  mtd: gpmi: add a new field for HW_GPMI_TIMING1
  mtd: add helpers to get the supportted ONFI timing mode
  ...
2012-10-10 10:51:35 +09:00
Linus Torvalds
72055425e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
 "This is a large pull, with the bulk of the updates coming from:

   - Hole punching

   - send/receive fixes

   - fsync performance

   - Disk format extension allowing more hardlinks inside a single
     directory (btrfs-progs patch required to enable the compat bit for
     this one)

  I'm cooking more unrelated RAID code, but I wanted to make sure this
  original batch makes it in.  The largest updates here are relatively
  old and have been in testing for some time."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits)
  btrfs: init ref_index to zero in add_inode_ref
  Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer
  Btrfs: fix page leakage
  Btrfs: do not warn_on when we cannot alloc a page for an extent buffer
  Btrfs: don't bug on enomem in readpage
  Btrfs: cleanup pages properly when ENOMEM in compression
  Btrfs: make filesystem read-only when submitting barrier fails
  Btrfs: detect corrupted filesystem after write I/O errors
  Btrfs: make compress and nodatacow mount options mutually exclusive
  btrfs: fix message printing
  Btrfs: don't bother committing delayed inode updates when fsyncing
  btrfs: move inline function code to header file
  Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
  btrfs: remove unused function btrfs_insert_some_items()
  Btrfs: don't commit instead of overcommitting
  Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
  Btrfs: be smarter about dropping things from the tree log
  Btrfs: don't lookup csums for prealloc extents
  Btrfs: cache extent state when writing out dirty metadata pages
  Btrfs: do not hold the file extent leaf locked when adding extent item
  ...
2012-10-10 10:49:20 +09:00
Linus Torvalds
fc81c038c2 Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French.

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: reinstate the forcegid option
  Convert properly UTF-8 to UTF-16
  [CIFS] WARN_ON_ONCE if kernel_sendmsg() returns -ENOSPC
2012-10-10 10:48:32 +09:00
J. Bruce Fields
f474af7051 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWxOxKuMESys7AQKN4w//XDwALfbf0MXIw+gwyRiUtJe9mGexvI6X
 1R4FWU9a3ImzEZP4cWnmPGT2wmC/x007DcIvx8cyvbdlSuqtR2i/DC+HbWabiLRn
 nJS7Eer1BJvLv5dn6NmXMEz7yB4Z46+frcmBs3WQeR0sqBMDm+rjQzCqECznO8Jc
 VtCbox+VR2DuWcM++YECTblYEH3Z+doDXUN2eBaD8L9x3klPbPXD7OcRyOnry8w+
 ynmUTKKyH4+hpxDakYrObPIg+vFCxb4QRck1mlgA4wbvb3eqjhM0oOCYJ8GvmILA
 vdFYztWCjkiuOl5djtXBlsClX8SAMOBYlRed+R1GvjNCSR+WCWrFJJ2F8qoQ1w87
 9ts2/8qrozS8luTB475SkT2uLdJkIUKX89Oh+dWeE8YkbPnRPj5lNAdtNY5QSyDq
 VaRpIo+YfmZygyvHJQlAXBuZ0mvzcPzArfcPgSVTD3B7xTEGVu/45V7SnQX5os/V
 v39ySPXMdGOIdvK51gw7OtZl64uqrEKu39PyYDX/GUADflp/CHD0J7PJrQePbsH9
 AQolVZDIxTfKqYQnUdL8+C8Zc24RowEzz3c2+aO89MSzwGqev3q8sXRVbW/Iqryg
 p+V3nHe+ipKcga5tOBlPr9KDtDd7j3xN2yaIwf5/QyO1OHBpjAZP1gjSVDcUcwpi
 svYy4kPn3PA=
 =etoL
 -----END PGP SIGNATURE-----

nfs: disintegrate UAPI for nfs

This is to complete part of the Userspace API (UAPI) disintegration for which
the preparatory patches were pulled recently.  After these patches, userspace
headers will be segregated into:

        include/uapi/linux/.../foo.h

for the userspace interface stuff, and:

        include/linux/.../foo.h

for the strictly kernel internal stuff.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-10-09 18:35:22 -04:00
Jan Kara
6c29c50fda quota: Silence warning about PRJQUOTA not being handled in need_print_warning()
PRJQUOTA value of quota type should never reach need_print_warning() since XFS
(which is the only fs which uses that type) doesn't use generic functions
calling this function. Anyway, add PRJQUOTA case to the switch to make gcc
happy.

Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:37:38 +02:00
Zhao Hongjiang
66415f6b50 ext3: fix return values on parse_options() failure
parse_options() in ext3 should return 0 when parse the mount options fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:24:03 +02:00
Zhao Hongjiang
ae2cf4284e ext2: fix return values on parse_options() failure
parse_options() in ext2 should return 0 when parse the mount options fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:23:53 +02:00
Carlos Maiolino
c3d59ad6ab ext3: ext3_bread usage audit
This is the ext3 version of the same patch applied to Ext4, where such goal is
to audit the usage of ext3_bread() due a possible misinterpretion of its return
value.

Focused on directory blocks, a NULL value returned from ext3_bread() means a
hole, which cannot exist into a directory inode. It can pass undetected after a
fix in an uninitialized error variable.

The (now) initialized variable into ext3_getblk() may lead to a zero'ed return
value of ext3_bread() to its callers, which can make the caller do not detect
the hole in the directory inode.

This patch creates a new wrapper function ext3_dir_bread() which checks for
holes properly, reports error, and returns EIO in that case.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:21:42 +02:00
Carlos Maiolino
aa9660196b ext3: fix possible non-initialized variable on htree_dirblock_to_tree()
This is a backport of ext4 commit 90b0a9732 which fixes a possible
non-initialized variable on htree_dirblock_to_tree().
Ext3 has the same non initialized variable, but, in any case it will be
initialized by ext3_get_blocks_handle(), which will avoid the bug to be
triggered, but, the non-initialized variable by htree_dirblock_to_tree() is
still a bug.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-10-09 23:21:42 +02:00
Wei Yongjun
91312c53af hppfs: fix the return value of get_inode()
In case of error, the function get_inode() returns ERR_PTR().
But the users hppfs_lookup() and hppfs_fill_super() use NULL
test for check the return value, not IS_ERR(), so we'd better
change the return value of get_inode() to NULL instead of
ERR_PTR().

dpatch engine is used to generated this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09 22:34:52 +02:00
Marco Stornelli
3be2be0a32 hostfs: drop vmtruncate
Removed vmtruncate.

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09 22:33:19 +02:00
Al Viro
37185b3324 um: get rid of pointless include "..." where include <...> will do
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
2012-10-09 22:28:45 +02:00
Chris Mason
f46dbe3dee btrfs: init ref_index to zero in add_inode_ref
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-10-09 11:17:20 -04:00
David Woodhouse
ffe3150125 UAPI Disintegration 2012-10-09
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIVAwUAUHPmWhOxKuMESys7AQLCZRAAsZAuAK0MxZ4iuq/+fmy7Uxb1jrzLOYSb
 3UgbTgXAjR0WAUHNegVZLX1Xc+12KxvMCj/8sO62Ai+wtgHeDAuUl2T0FbSZjlGK
 qqx/qQqTFHUfJRbm3Lu9iarZ2K49v1kTDk4C+nC8J9mEEW4WFlVPD10n90j+4hxr
 ZCEYril7qOQQV65oor3BT2V64+X1WDHriTLugH1o8RziRF9jh6Z2hgZAWnThcGxu
 lPsmXF2e7jDqGcM3gWtxZWu/yTBPxw549R+JUg4aVKho9WI5ClyjNAKnE7wtd3iW
 HyrylRH+ch2oeYFa5+xoyopRARUUPmujKaHU+ZI1o++eNzuw5JYiwuMlZBLyUc9I
 foWMSUw31U7695exyf66HiH7GEKI1PVpgJVNu41eJvl0iWSWCpKCB6Gs8Sw4xnp2
 auUCYSniXHNTFhFktjNdIUAn0+1X/b/SEfb/id4GvLp1K98QGOfe8dMCC8hEnXiF
 4iIViM8Sv1GB1us5huSjbMeRPbZ3x/loqEpApfgcaqcyrUR29FTE/lFQ4fj9xviL
 JjckPLMMZb4Ho5wrkCi5NtXJ16mx1qKzbBGDdqzmqaNdN+08rNF//kA9m9hCwgD8
 XfAV286DKDC0SllZIG+Uz7YLnSZjNAUhjvWN3ipV+SdT5DGybL3uSW5tYiSAzI2E
 3cayGTWINMg=
 =U9Qq
 -----END PGP SIGNATURE-----

Merge tag 'disintegrate-mtd-20121009' of git://git.infradead.org/users/dhowells/linux-headers

UAPI Disintegration 2012-10-09

Conflicts:
	MAINTAINERS
	arch/arm/configs/bcmring_defconfig
	arch/arm/mach-imx/clk-imx51-imx53.c
	drivers/mtd/nand/Kconfig
	drivers/mtd/nand/bcm_umi_nand.c
	drivers/mtd/nand/nand_bcm_umi.h
	drivers/mtd/nand/orion_nand.c
2012-10-09 15:04:25 +01:00
Wang Sheng-Hui
1037a5affc Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer
In csum_dirty_buffer, we first get eb from page->private.
Then we check if the page is the first page of eb. Later
we check it again. Remove the repeated check here.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
2012-10-09 09:37:30 -04:00
Josef Bacik
f60b1b49f6 Btrfs: fix page leakage
Alloc_dummy_extent_buffer will not free the first page in the eb array if we
fail to allocate a page, fix this.  Thanks,

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:20:56 -04:00
Josef Bacik
4804b38293 Btrfs: do not warn_on when we cannot alloc a page for an extent buffer
It's just annoying and the user will have gotten a nice OOM killer message
so they are already fully aware they are screwed :).  Thanks,

Reported-by: Jérôme Poulin <jeromepoulin@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:20:43 -04:00
Josef Bacik
edd33c99c4 Btrfs: don't bug on enomem in readpage
Get rid of the BUG_ON(ret == -ENOMEM) in __extent_read_full_page.  Thanks,

Reported-by: Jérôme Poulin <jeromepoulin@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:20:31 -04:00
Josef Bacik
15e3004a0e Btrfs: cleanup pages properly when ENOMEM in compression
We were freeing non-existent pages which was causing a panic for a user who
was suffering from ENOMEM.  This patch fixes the problem.  Thanks,

Reported-by: Jérôme Poulin <jeromepoulin@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:20:25 -04:00
Stefan Behrens
5af3e8cce8 Btrfs: make filesystem read-only when submitting barrier fails
So far the return code of barrier_all_devices() is ignored, which
means that errors are ignored. The result can be a corrupt
filesystem which is not consistent.
This commit adds code to evaluate the return code of
barrier_all_devices(). The normal btrfs_error() mechanism is used to
switch the filesystem into read-only mode when errors are detected.

In order to decide whether barrier_all_devices() should return
error or success, the number of disks that are allowed to fail the
barrier submission is calculated. This calculation accounts for the
worst RAID level of metadata, system and data. If single, dup or
RAID0 is in use, a single disk error is already considered to be
fatal. Otherwise a single disk error is tolerated.

The calculation of the number of disks that are tolerated to fail
the barrier operation is performed when the filesystem gets mounted,
when a balance operation is started and finished, and when devices
are added or removed.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-10-09 09:20:19 -04:00
Stefan Behrens
62856a9b73 Btrfs: detect corrupted filesystem after write I/O errors
In check-integrity, detect when a superblock is written that points
to blocks that have not been written to disk due to I/O write errors.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-10-09 09:20:10 -04:00
Andrei Popa
bedb2cca72 Btrfs: make compress and nodatacow mount options mutually exclusive
If a filesystem is mounted with compression and then remounted by adding nodatacow,
the compression is disabled but the compress flag is still visible.
Also, if a filesystem is mounted with nodatacow and then remounted with compression,
nodatacow flag is still present but it's not active.
This patch:
- removes compress flags and notifies that the compression has been disabled if the
  filesystem is mounted with nodatacow
- removes nodatacow and nodatasum flags if mounted with compress.

Signed-off-by: Andrei Popa <andrei.popa@i-neo.ro>
2012-10-09 09:20:03 -04:00
Daniel J Blueman
489406626c btrfs: fix message printing
Fix various messages to include newline and module prefix.

Signed-off-by: Daniel J Blueman <daniel@quora.org>
2012-10-09 09:19:57 -04:00
Josef Bacik
94edf4ae43 Btrfs: don't bother committing delayed inode updates when fsyncing
We can just copy the in memory inode into the tree log directly, no sense in
updating the fs tree so we can copy it into the tree log tree.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:19:50 -04:00
Robin Dong
479ed9abdb btrfs: move inline function code to header file
When building btrfs from kernel code, it will report:

	fs/btrfs/extent_io.h:281: warning: 'extent_buffer_page' declared inline after being called
	fs/btrfs/extent_io.h:281: warning: previous declaration of 'extent_buffer_page' was here
	fs/btrfs/extent_io.h:280: warning: 'num_extent_pages' declared inline after being called
	fs/btrfs/extent_io.h:280: warning: previous declaration of 'num_extent_pages' was here

because of the wrong declaration of inline functions.

Signed-off-by: Robin Dong <sanbai@taobao.com>
2012-10-09 09:15:43 -04:00
Tsutomu Itoh
7a2d6a6464 Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
Because the value of extent_map is only a correct value or NULL,
so IS_ERR is unnecessary.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-09 09:15:43 -04:00
Robin Dong
8d1a1317af btrfs: remove unused function btrfs_insert_some_items()
The function btrfs_insert_some_items() would not be called by any other functions,
so remove it.

Signed-off-by: Robin Dong <sanbai@taobao.com>
2012-10-09 09:15:43 -04:00
Josef Bacik
44734ed1ca Btrfs: don't commit instead of overcommitting
I don't think we have the same problem that this was supposed to fix
originally since we can allocate chunks in the enospc path now.  This code
is causing us to constantly commit the transaction as we get close to using
all of our available space in our currently allocated chunks, instead of
allocating another chunk and carrying on with life, which is not nice for
performance.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:42 -04:00
Tsutomu Itoh
f0bd95ea72 Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
We should confirm the value of extent_map before calling
trace_btrfs_get_extent() because the value of extent_map has the
possibility of NULL.

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-10-09 09:15:42 -04:00
Josef Bacik
18ec90d63f Btrfs: be smarter about dropping things from the tree log
When we truncate existing items in the tree log we've been searching for
each individual item and removing them.  This is unnecessary churn and
searching, just keep track of the slot we are on and how many items we need
to delete and delete them all at once.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:41 -04:00
Josef Bacik
6f1fed7753 Btrfs: don't lookup csums for prealloc extents
The tree logging stuff was looking up csums to copy over for prealloc
extents which is just work we don't need to be doing.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:41 -04:00
Josef Bacik
e6138876ad Btrfs: cache extent state when writing out dirty metadata pages
Everytime we write out dirty pages we search for an offset in the tree,
convert the bits in the state, and then when we wait we search for the
offset again and clear the bits.  So for every dirty range in the io tree we
are doing 4 rb searches, which is suboptimal.  With this patch we are only
doing 2 searches for every cycle (modulo weird things happening).  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:41 -04:00
Josef Bacik
ce19533256 Btrfs: do not hold the file extent leaf locked when adding extent item
For some reason we unlock everything except the leaf we are on, set the path
blocking and then add the extent item for the extent we just finished
writing.  I can't for the life of me figure out why we would want to do
this, and the history doesn't really indicate that there was a real reason
for it, so just remove it.  This will reduce our tree lock contention on
heavy writes.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:40 -04:00
Josef Bacik
de0022b9da Btrfs: do not async metadata csumming in certain situations
There are a coule scenarios where farming metadata csumming off to an async
thread doesn't help.  The first is if our processor supports crc32c, in
which case the csumming will be fast and so the overhead of the async model
is not worth the cost.  The other case is for our tree log.  We will be
making that stuff dirty and writing it out and waiting for it immediately.
Even with software crc32c this gives me a ~15% increase in speed with O_SYNC
workloads.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:40 -04:00
Zach Brown
221b831835 btrfs: fix min csum item size warnings in 32bit
commit 7ca4be45a0 limited csum items to
PAGE_CACHE_SIZE.  It used min() with incompatible types in 32bit which
generates warnings:

fs/btrfs/file-item.c: In function ‘btrfs_csum_file_blocks’:
fs/btrfs/file-item.c:717: warning: comparison of distinct pointer types lacks a cast

This uses min_t(u32,) to fix the warnings.  u32 seemed reasonable
because btrfs_root->leafsize is u32 and PAGE_CACHE_SIZE is unsigned
long.

Signed-off-by: Zach Brown <zab@zabbo.net>
2012-10-09 09:15:39 -04:00
Josef Bacik
67b0fd63d5 Btrfs: run delayed refs first when out of space
Running delayed refs is faster than running delalloc, so lets do that first
to try and reclaim space.  This makes my fs_mark test about 20% faster.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-09 09:15:39 -04:00
Miao Xie
354aa0fb6d Btrfs: fix orphan transaction on the freezed filesystem
With the following debug patch:

 static int btrfs_freeze(struct super_block *sb)
 {
+ 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+	struct btrfs_transaction *trans;
+
+	spin_lock(&fs_info->trans_lock);
+	trans = fs_info->running_transaction;
+	if (trans) {
+		printk("Transid %llu, use_count %d, num_writer %d\n",
+			trans->transid, atomic_read(&trans->use_count),
+			atomic_read(&trans->num_writers));
+	}
+	spin_unlock(&fs_info->trans_lock);
 	return 0;
 }

I found there was a orphan transaction after the freeze operation was done.

It is because the transaction may not be committed when the transaction handle
end even though it is the last handle of the current transaction. This design
avoid committing the transaction frequently, but also introduce the above
problem.

So I add btrfs_attach_transaction() which can catch the current transaction
and commit it. If there is no transaction, it will return ENOENT, and do not
anything.

This function also can be used to instead of btrfs_join_transaction_freeze()
because it don't increase the writer counter and don't start a new transaction,
so it also can fix the deadlock between sync and freeze.

Besides that, it is used to instead of btrfs_join_transaction() in
transaction_kthread(), because if there is no transaction, the transaction
kthread needn't anything.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:39 -04:00
Miao Xie
a698d0755a Btrfs: add a type field for the transaction handle
This patch add a type field into the transaction handle structure,
in this way, we needn't implement various end-transaction functions
and can make the code more simple and readable.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:38 -04:00
Miao Xie
e8830e606f Btrfs: fix memory leak in start_transaction()
This patch fixes memory leak of the transaction handle which happened
when starting transaction failed on a freezed fs.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-09 09:15:38 -04:00
Mark Fasheh
d24bec3ae5 btrfs: extended inode ref iteration
The iterate_irefs in backref.c is used to build path components from inode
refs. This patch adds code to iterate extended refs as well.

I had modify the callback function signature to abstract out some of the
differences between ref structures. iref_to_path() also needed similar
changes.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-09 09:15:01 -04:00
Mark Fasheh
f186373fef btrfs: extended inode refs
This patch adds basic support for extended inode refs. This includes support
for link and unlink of the refs, which basically gets us support for rename
as well.

Inode creation does not need changing - extended refs are only added after
the ref array is full.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-09 09:14:45 -04:00
Linus Torvalds
9e2d8656f5 Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches from Andrew Morton:
 "A few misc things and very nearly all of the MM tree.  A tremendous
  amount of stuff (again), including a significant rbtree library
  rework."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (160 commits)
  sparc64: Support transparent huge pages.
  mm: thp: Use more portable PMD clearing sequenece in zap_huge_pmd().
  mm: Add and use update_mmu_cache_pmd() in transparent huge page code.
  sparc64: Document PGD and PMD layout.
  sparc64: Eliminate PTE table memory wastage.
  sparc64: Halve the size of PTE tables
  sparc64: Only support 4MB huge pages and 8KB base pages.
  memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
  mm: memcg: clean up mm_match_cgroup() signature
  mm: document PageHuge somewhat
  mm: use %pK for /proc/vmallocinfo
  mm, thp: fix mlock statistics
  mm, thp: fix mapped pages avoiding unevictable list on mlock
  memory-hotplug: update memory block's state and notify userspace
  memory-hotplug: preparation to notify memory block's state at memory hot remove
  mm: avoid section mismatch warning for memblock_type_name
  make GFP_NOTRACK definition unconditional
  cma: decrease cc.nr_migratepages after reclaiming pagelist
  CMA: migrate mlocked pages
  kpageflags: fix wrong KPF_THP on non-huge compound pages
  ...
2012-10-09 16:23:15 +09:00
Naoya Horiguchi
7a71932d56 kpageflags: fix wrong KPF_THP on non-huge compound pages
KPF_THP can be set on non-huge compound pages (like slab pages or pages
allocated by drivers with __GFP_COMP) because PageTransCompound only
checks PG_head and PG_tail.  Obviously this is a bug and breaks user space
applications which look for thp via /proc/kpageflags.

This patch rules out setting KPF_THP wrongly by additionally checking
PageLRU on the head pages.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:00 +09:00
Yan Hong
cd8ed2a45a fs/fs-writeback.c: remove unneccesary parameter of __writeback_single_inode()
The parameter 'wb' is never used in this function.

Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:23:00 +09:00
Michel Lespinasse
38a76013ad mm: avoid taking rmap locks in move_ptes()
During mremap(), the destination VMA is generally placed after the
original vma in rmap traversal order: in move_vma(), we always have
new_pgoff >= vma->vm_pgoff, and as a result new_vma->vm_pgoff >=
vma->vm_pgoff unless vma_merge() merged the new vma with an adjacent one.

When the destination VMA is placed after the original in rmap traversal
order, we can avoid taking the rmap locks in move_ptes().

Essentially, this reintroduces the optimization that had been disabled in
"mm anon rmap: remove anon_vma_moveto_tail".  The difference is that we
don't try to impose the rmap traversal order; instead we just rely on
things being in the desired order in the common case and fall back to
taking locks in the uncommon case.  Also we skip the i_mmap_mutex in
addition to the anon_vma lock: in both cases, the vmas are traversed in
increasing vm_pgoff order with ties resolved in tree insertion order.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:42 +09:00
Michel Lespinasse
6b2dbba8b6 mm: replace vma prio_tree with an interval tree
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:39 +09:00
Michel Lespinasse
bf7ad8eeab rbtree: move some implementation details from rbtree.h to rbtree.c
rbtree users must use the documented APIs to manipulate the tree
structure.  Low-level helpers to manipulate node colors and parenthood are
not part of that API, so move them to lib/rbtree.c

[dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Michel Lespinasse
ea5272f5c9 rbtree: fix incorrect rbtree node insertion in fs/proc/proc_sysctl.c
The recently added code to use rbtrees in sysctl did not follow the proper
rbtree interface on insertion - it was calling rb_link_node() which
inserts a new node into the binary tree, but missed the call to
rb_insert_color() which properly balances the rbtree and establishes all
expected rbtree invariants.

I found out about this only because faulty commit also used
rb_init_node(), which I am removing within this patchset.  But I think
it's an easy mistake to make, and it makes me wonder if we should change
the rbtree API so that insertions would be done with a single rb_insert()
call (even if its implementation could still inline the rb_link_node()
part and call a private __rb_insert_color function to do the rebalancing).

Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Michel Lespinasse
4c199a93a2 rbtree: empty nodes have no color
Empty nodes have no color.  We can make use of this property to simplify
the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
we can get rid of the rb_init_node function which had been introduced by
commit 88d19cf379 ("timers: Add rb_init_node() to allow for stack
allocated rb nodes") to avoid some issue with the empty node's color not
being initialized.

I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
doing there, though.  axboe introduced them in commit 10fd48f237
("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
see it, the 'empty node' abstraction is only used by rbtree users to
flag nodes that they haven't inserted in any rbtree, so asking the
predecessor or successor of such nodes doesn't make any sense.

One final rb_init_node() caller was recently added in sysctl code to
implement faster sysctl name lookups.  This code doesn't make use of
RB_EMPTY_NODE at all, and from what I could see it only called
rb_init_node() under the mistaken assumption that such initialization was
required before node insertion.

[sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Daniel Santos <daniel.santos@pobox.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:32 +09:00
Davidlohr Bueso
01dc52ebdf oom: remove deprecated oom_adj
The deprecated /proc/<pid>/oom_adj is scheduled for removal this month.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:24 +09:00
Konstantin Khlebnikov
314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Konstantin Khlebnikov
0103bd16fb mm: prepare VM_DONTDUMP for using in drivers
Rename VM_NODUMP into VM_DONTDUMP: this name matches other negative flags:
VM_DONTEXPAND, VM_DONTCOPY.  Currently this flag used only for
sys_madvise.  The next patch will use it for replacing the outdated flag
VM_RESERVED.

Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL
(VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:18 +09:00
Konstantin Khlebnikov
0b173bc4da mm: kill vma flag VM_CAN_NONLINEAR
Move actual pte filling for non-linear file mappings into the new special
vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:17 +09:00
Linus Torvalds
8711798772 Merge branch 'linux-next' of git://git.open-osd.org/linux-open-osd
Pull exofs update from Boaz Harrosh:
 "Just three one liners"

* 'linux-next' of git://git.open-osd.org/linux-open-osd:
  pnfs_osd_xdr: Remove unused #include from pnfs_osd_xdr.h
  ore: signedness bug in _sp2d_min_pg()
  exofs: check for allocation failure in uri_store()
2012-10-09 15:54:27 +09:00
Al Viro
121977187c Fix F_DUPFD_CLOEXEC breakage
Fix a braino in F_DUPFD_CLOEXEC; f_dupfd() expects flags for alloc_fd(),
get_unused_fd() etc and there clone-on-exec if O_CLOEXEC, not
FD_CLOEXEC.

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 15:52:31 +09:00
Jan Schmidt
5a1d7843ca btrfs: improved readablity for add_inode_ref
Moved part of the code into a sub function and replaced most of the gotos
by ifs, hoping that it will be easier to read now.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
2012-10-08 20:09:02 -04:00
Josef Bacik
0aa4a17d82 Btrfs: handle not finding the extent exactly when logging changed extents
I started hitting warnings when running xfstest 68 in a loop because there
were EM's that were not lined up properly with the physical extents.  This
is ok, if we do something like punch a hole or write to a preallocated space
or something like that we can have an EM that doesn't cover the entire
physical extent.  So fix the tree logging stuff to cope with this case so we
don't just commit the transaction.  With this patch I no longer see the
warnings from the tree logging code.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-08 20:09:02 -04:00
David Sterba
005d6427ac btrfs: move transaction aborts to the point of failure
Call btrfs_abort_transaction as early as possible when an error
condition is detected, that way the line number reported is useful
and we're not clueless anymore which error path led to the abort.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-08 20:09:02 -04:00
Miao Xie
8732d44f80 Btrfs: fix the missing error information in create_pending_snapshot()
The macro btrfs_abort_transaction() can get the line number of the code
where the problem happens, so we should invoke it in the place that the
error occurs, or we will lose the line number.

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-08 20:07:33 -04:00
Liu Bo
aa42ffd918 Btrfs: fix off-by-one in file clone
Btrfs uses inclusive range end for lock_extent(), unlock_extent() and
related functions, so we made off-by-one errors in file clone.

This fixes it and also fixes some style problems.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
2012-10-08 20:07:32 -04:00
Peng Tao
af283885b7 pnfsblock: cleanup nfs4_blkdev_get
It is not needed at all and it is messing with return values...

Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:40 -04:00
Peng Tao
1fd937bd75 NFS41: send real read size in layoutget
For buffer read, use offst-to-isize.

For direct read, use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:34 -04:00
Peng Tao
6296556f0b NFS41: send real write size in layoutget
For buffer write, block layout client scan inode mapping to find
next hole and use offset-to-hole as layoutget length. Object
layout client uses offset-to-isize as layoutget length.

For direct write, both block layout and object layout use dreq->bytes_left.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:32:22 -04:00
Peng Tao
35754bc00e NFS: track direct IO left bytes
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08 19:31:55 -04:00
Oleg Nesterov
d5bbd43d5f exec: make de_thread() killable
Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
waiting for other threads.  The only complication is that we should
clear ->group_exit_task and ->notify_count before we return, and we
should do this under tasklist_lock.  -EAGAIN is used to match the
initial signal_group_exit() check/return, it doesn't really matter.

This fixes the (unlikely) race with coredump.  de_thread() checks
signal_group_exit() before it starts to kill the subthreads, but this
can't help if another CLONE_VM (but non CLONE_THREAD) task starts the
coredumping after de_thread() unlocks ->siglock.  In this case the
killed sub-thread can block in exit_mm() waiting for coredump_finish(),
execing thread waits for that sub-thead, and the coredumping thread
waits for execing thread.  Deadlock.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 06:53:20 +09:00
David Howells
cf7f601c06 KEYS: Add payload preparsing opportunity prior to key instantiate or update
Give the key type the opportunity to preparse the payload prior to the
instantiation and update routines being called.  This is done with the
provision of two new key type operations:

	int (*preparse)(struct key_preparsed_payload *prep);
	void (*free_preparse)(struct key_preparsed_payload *prep);

If the first operation is present, then it is called before key creation (in
the add/update case) or before the key semaphore is taken (in the update and
instantiate cases).  The second operation is called to clean up if the first
was called.

preparse() is given the opportunity to fill in the following structure:

	struct key_preparsed_payload {
		char		*description;
		void		*type_data[2];
		void		*payload;
		const void	*data;
		size_t		datalen;
		size_t		quotalen;
	};

Before the preparser is called, the first three fields will have been cleared,
the payload pointer and size will be stored in data and datalen and the default
quota size from the key_type struct will be stored into quotalen.

The preparser may parse the payload in any way it likes and may store data in
the type_data[] and payload fields for use by the instantiate() and update()
ops.

The preparser may also propose a description for the key by attaching it as a
string to the description field.  This can be used by passing a NULL or ""
description to the add_key() system call or the key_create_or_update()
function.  This cannot work with request_key() as that required the description
to tell the upcall about the key to be created.

This, for example permits keys that store PGP public keys to generate their own
name from the user ID and public key fingerprint in the key.

The instantiate() and update() operations are then modified to look like this:

	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
	int (*update)(struct key *key, struct key_preparsed_payload *prep);

and the new payload data is passed in *prep, whether or not it was preparsed.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-10-08 13:49:48 +10:30
Jeff Layton
72bd481f86 cifs: reinstate the forcegid option
Apparently this was lost when we converted to the standard option
parser in 8830d7e07a

Cc: Sachin Prabhu <sprabhu@redhat.com>
Cc: stable@vger.kernel.org # v3.4+
Reported-by: Gregory Lee Bartholomew <gregory.lee.bartholomew@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:05:47 -05:00
Frediano Ziglio
fd3ba42c76 Convert properly UTF-8 to UTF-16
wchar_t is currently 16bit so converting a utf8 encoded characters not
in plane 0 (>= 0x10000) to wchar_t (that is calling char2uni) lead to a
-EINVAL return. This patch detect utf8 in cifs_strtoUTF16 and add special
code calling utf8s_to_utf16s.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:04:53 -05:00
Steve French
b7a10626c8 [CIFS] WARN_ON_ONCE if kernel_sendmsg() returns -ENOSPC
kernel_sendmsg() is less likely to return -ENOSPC and it might be
a bug to do so. However, in the past there might have been cases
where a -ENOSPC was returned from a low level driver.

Add a WARN_ON_ONCE() to ensure that it is safe to assume that -ENOSPC
is no longer returned. This -ENOSPC specific handling will be removed
once we are sure it is no longer returned.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-10-07 20:00:47 -05:00
Linus Torvalds
7035cdf36d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
 "The bulk of this pull is a series from Alex that refactors and cleans
  up the RBD code to lay the groundwork for supporting the new image
  format and evolving feature set.  There are also some cleanups in
  libceph, and for ceph there's fixed validation of file striping
  layouts and a bugfix in the code handling a shrinking MDS cluster."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
  ceph: avoid 32-bit page index overflow
  ceph: return EIO on invalid layout on GET_DATALOC ioctl
  rbd: BUG on invalid layout
  ceph: propagate layout error on osd request creation
  libceph: check for invalid mapping
  ceph: convert to use le32_add_cpu()
  ceph: Fix oops when handling mdsmap that decreases max_mds
  rbd: update remaining header fields for v2
  rbd: get snapshot name for a v2 image
  rbd: get the snapshot context for a v2 image
  rbd: get image features for a v2 image
  rbd: get the object prefix for a v2 rbd image
  rbd: add code to get the size of a v2 rbd image
  rbd: lay out header probe infrastructure
  rbd: encapsulate code that gets snapshot info
  rbd: add an rbd features field
  rbd: don't use index in __rbd_add_snap_dev()
  rbd: kill create_snap sysfs entry
  rbd: define rbd_dev_image_id()
  rbd: define some new format constants
  ...
2012-10-08 06:38:18 +09:00
Linus Torvalds
6432f21284 The big new feature added this time is supporting online resizing
using the meta_bg feature.  This allows us to resize file systems
 which are greater than 16TB.  In addition, the speed of online
 resizing has been improved in general.
 
 We also fix a number of races, some of which could lead to deadlocks,
 in ext4's Asynchronous I/O and online defrag support, thanks to good
 work by Dmitry Monakhov.
 
 There are also a large number of more minor bug fixes and cleanups
 from a number of other ext4 contributors, quite of few of which have
 submitted fixes for the first time.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJQbxMXAAoJENNvdpvBGATwlg4QAJZ4mHNSL2eaaxjRtTbL1pAz
 +FVXpJ3lhw1lSfE9hJGqPVE8EfU2fWjIqxEI7dgh95Tukc5pUnPAQ2/hBz8ZA0qq
 o0AFMk3mRnvCEh6HsZfumsV83eqpR3k/zEy4uFH+KtxBskPe2sEKy3B7qOxvgdKW
 Gh8B2WqF2BpIj9WIT1P9G6xsxZW64EMHTbWcgRhuoRD7bakDNnwQ3kElz/TJQU5q
 bM/5wE7pqKwU2J1L0Ho0mxDi0f/BbXeJdA9k1tQy2KM1pZwHtpj4Ls0qmfoi49GE
 KyZqQOXlFbAz/9tidPDceY5KoRRQm1MwZ+1MimQX1P+40cs/w3pNu3yiibcaXIru
 UZ63AQMCj5JHMcFNVi20sVCwjU/ibNtEO75cfDD4bzPgHJvfCj73EbHTLl21nbTu
 izIMffhJEHmRnmRXiiortYVuI4b19oIfnXg7eclrJoUWSuGwKKsJOc5nMjDqidG4
 B7Gq4TD89sGkIYzx+50E+ll2ispcBN0BQnGqp4k2BzgDyEHhuFYk7VuVQvJgCGTi
 eobzQJj7JUXPWxyemcAVkQTtUq4vVbkm/IwS+/GA9b9Z80X8hR8x6EVHUW5lX3qC
 YHoBSCU4XKZXXWqzx0fIVCXyKKFiBzM+OXcgHOKH90vK8k6kPmPODhNCxvV3pITU
 jfl9q+X1dY4SpybZjLt5
 =iYeV
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "The big new feature added this time is supporting online resizing
  using the meta_bg feature.  This allows us to resize file systems
  which are greater than 16TB.  In addition, the speed of online
  resizing has been improved in general.

  We also fix a number of races, some of which could lead to deadlocks,
  in ext4's Asynchronous I/O and online defrag support, thanks to good
  work by Dmitry Monakhov.

  There are also a large number of more minor bug fixes and cleanups
  from a number of other ext4 contributors, quite of few of which have
  submitted fixes for the first time."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
  ext4: fix ext4_flush_completed_IO wait semantics
  ext4: fix mtime update in nodelalloc mode
  ext4: fix ext_remove_space for punch_hole case
  ext4: punch_hole should wait for DIO writers
  ext4: serialize truncate with owerwrite DIO workers
  ext4: endless truncate due to nonlocked dio readers
  ext4: serialize unlocked dio reads with truncate
  ext4: serialize dio nonlocked reads with defrag workers
  ext4: completed_io locking cleanup
  ext4: fix unwritten counter leakage
  ext4: give i_aiodio_unwritten a more appropriate name
  ext4: ext4_inode_info diet
  ext4: convert to use leXX_add_cpu()
  ext4: ext4_bread usage audit
  fs: reserve fallocate flag codepoint
  ext4: remove redundant offset check in mext_check_arguments()
  ext4: don't clear orphan list on ro mount with errors
  jbd2: fix assertion failure in commit code due to lacking transaction credits
  ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
  ext4: enable FITRIM ioctl on bigalloc file system
  ...
2012-10-08 06:36:39 +09:00
Linus Torvalds
7f60ba388f 1. We no longer ad-hoc to the function tracer "high level" infrastructure
and no longer use its debugfs knobs. The change slightly touches
    kernel/trace directory, but it got the needed ack from Steven Rostedt:
    http://lkml.org/lkml/2012/8/21/688
 2. Added maintainers entry;
 3. A bunch of fixes, nothing special.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJQbjoFAAoJEGgI9fZJve1bZgUP/A/ZwFGfdnochDgRhK5p7ljY
 baRZpgSh2B+BxIDTEPLfVh6HbOivmYJ8WF0unD9kKTzCCS71ZUMiLB25G/bV4lnZ
 fawAOhGfOLG3rXmldxf6nJllHr9JpoSmVEHypvjFNcbjYZ04zhe7jM+YsaWmBw68
 eHXQkOSdfPPpKXZ2B0Eef/EoGWhORW0kTD7xFlorsxYAkksSheY0PC0nYgCFhvCZ
 168y9pi4T4lucr4s44x8AJ/r/5BQ1jEQAY/A2qUE/iBfRFP4XyE1Oao4OHtVDYdU
 KjVPA1VYmwkKSfnkiVFrpb/94IyrKslblgR8nX0kK3L/ccFYjQix4nd9jR+n857s
 xfAuj9nfhUO6fI5qoaVSOBufxKyPp1S7X8INEAJ7WQ0c9VoMv00biK9M77ifDGZg
 ll/Ecq1CADtcbOnQXf6qwGwRKmpR+qgPkIzpNXcuGMuM4AEPwtckOhCyXFr37Txk
 6ZoGM8IIaBJ0yXxHkfpUA7l9ZF0gXR+qHMQCwpUS8tIMx35On+IbybEaKbniKEi1
 AURgQ7ZimVYAHPi0Y0L00+EKI3IPVQJvCFH7SG+wUfLWcbEtNbTv3MAer5o3DANJ
 GMnWBwNw9ClTydWKI0GMNmnWpFukWhd4OXleyl2+q4qRJi3HhNacrok3s/2r+CnT
 QRg8i/0SDvxGuXazrTZT
 =1HAE
 -----END PGP SIGNATURE-----

Merge tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore

Pull pstore changes from Anton Vorontsov:

 1) We no longer ad-hoc to the function tracer "high level"
    infrastructure and no longer use its debugfs knobs.  The change
    slightly touches kernel/trace directory, but it got the needed ack
    from Steven Rostedt:

      http://lkml.org/lkml/2012/8/21/688

 2) Added maintainers entry;

 3) A bunch of fixes, nothing special.

* tag 'for-v3.7' of git://git.infradead.org/users/cbou/linux-pstore:
  pstore: Avoid recursive spinlocks in the oops_in_progress case
  pstore/ftrace: Convert to its own enable/disable debugfs knob
  pstore/ram: Add missing platform_device_unregister
  MAINTAINERS: Add pstore maintainers
  pstore/ram: Mark ramoops_pstore_write_buf() as notrace
  pstore/ram: Fix printk format warning
  pstore/ram: Fix possible NULL dereference
2012-10-07 17:30:50 +09:00
Trond Myklebust
19c54abab7 NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()
Split it into two functions, one which checks if layoutgets are blocked,
and one which checks if the layout stateid has expired.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-05 16:56:58 -07:00
Wei Yongjun
c99b6841d7 omfs: convert to use beXX_add_cpu()
Convert cpu_to_beXX(beXX_to_cpu(E1) + E2) to use beXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:31 +09:00
Sachin Kamat
9fb8844210 fs/proc/root.c: use NULL instead of 0 for pointer
This cleanup also fixes the following sparse warning:

  fs/proc/root.c:64:45: warning: Using plain integer as NULL pointer

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:19 +09:00
Prasad Joshi
ab4a1f2470 proc_sysctl.c: use BUG_ON instead of BUG
The use of if (!head) BUG(); can be replaced with the single line
BUG_ON(!head).

Signed-off-by: Prasad Joshi <prasadjoshi.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
17baa2a2c4 proc: use kzalloc instead of kmalloc and memset
Part of the memory will be written twice after this change, but that
should be negligible.

[akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
Signed-off-by: yan <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
0e06936057 proc: no need to initialize proc_inode->fd in proc_get_inode()
proc_get_inode() obtains the inode via a call to iget_locked().
iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
clears proc_inode.fd, so there is no need to clear this field in
proc_get_inode().

If iget_locked() instead found the inode via find_inode_fast(), that inode
will not have I_NEW set so this change has no effect.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:18 +09:00
yan
620727506d proc: return -ENOMEM when inode allocation failed
If proc_get_inode() returns NULL then presumably it encountered memory
exhaustion.  proc_lookup_de() should return -ENOMEM in this case, not
-EINVAL.

Signed-off-by: yan <clouds.yan@gmail.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:17 +09:00
Denys Vlasenko
2aa362c49c coredump: extend core dump note section to contain file names of mapped files
This note has the following format:

long count     -- how many files are mapped
long page_size -- units for file_ofs
array of [COUNT] elements of
   long start
   long end
   long file_ofs
followed by COUNT filenames in ASCII: "FILE1" NUL "FILE2" NUL...

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:17 +09:00
Denys Vlasenko
49ae4d4b11 coredump: add a new elf note with siginfo of the signal
Existing PRSTATUS note contains only si_signo, si_code, si_errno fields
from the siginfo of the signal which caused core to be dumped.

There are tools which try to analyze crashes for possible security
implications, and they want to use, among other data, si_addr field from
the SIGSEGV.

This patch adds a new elf note, NT_SIGINFO, which contains the complete
siginfo_t of the signal which killed the process.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Denys Vlasenko
5ab1c309b3 coredump: pass siginfo_t* to do_coredump() and below, not merely signr
This is a preparatory patch for the introduction of NT_SIGINFO elf note.

With this patch we pass "siginfo_t *siginfo" instead of "int signr" to
do_coredump() and put it into coredump_params.  It will be used by the
next patch.  Most changes are simple s/signr/siginfo->si_signo/.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Amerigo Wang <amwang@redhat.com>
Cc: "Jonathan M. Foote" <jmfoote@cert.org>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Oleg Nesterov
0f4cfb2e4e coredump: use SUID_DUMPABLE_ENABLED rather than hardcoded 1
Cosmetic. Change setup_new_exec() and task_dumpable() to use
SUID_DUMPABLE_ENABLED for /bin/grep.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:16 +09:00
Oleg Nesterov
12a2b4b224 coredump: add support for %d=__get_dumpable() in core name
Some coredump handlers want to create a core file in a way compatible with
standard behavior.  Standard behavior with fs.suid_dumpable = 2 is to
create core file with uid=gid=0.  However, there was no way for coredump
handler to know that the process being dumped was suid'ed.

This patch adds the new %d specifier for format_corename() which simply
reports __get_dumpable(mm->flags), this is compatible with
/proc/sys/fs/suid_dumpable we already have.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=787135

Developed during a discussion with Denys Vlasenko.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Alex Kelly <alex.page.kelly@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Jiri Moskovcak <jmoskovc@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Alex Kelly
179899fd5d coredump: update coredump-related headers
Create a new header file, fs/coredump.h, which contains functions only
used by the new coredump.c.  It also moves do_coredump to the
include/linux/coredump.h header file, for consistency.

Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Alex Kelly
046d662f48 coredump: make core dump functionality optional
Adds an expert Kconfig option, CONFIG_COREDUMP, which allows disabling of
core dump.  This saves approximately 2.6k in the compiled kernel, and
complements CONFIG_ELF_CORE, which now depends on it.

CONFIG_COREDUMP also disables coredump-related sysctls, except for
suid_dumpable and related functions, which are necessary for ptrace.

[akpm@linux-foundation.org: fix binfmt_aout.c build]
Signed-off-by: Alex Kelly <alex.page.kelly@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:15 +09:00
Namjae Jeon
14864655c0 fat: simplify writeback_inode()
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:12 +09:00
Namjae Jeon
126ac0518c fat: no need to reset EOF in ent_put for FAT32
#define FAT_ENT_EOF(EOF_FAT32)

there is no need to reset value of 'new' for FAT32 as the values is
already correct

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:12 +09:00
Cruz Julian Bishop
441dff34aa fs/fat: fix checkpatch issues in fatent.c
1: Stop any lines going over 80 characters
2: Remove a blank line before EXPORT_SYMBOL_GPL

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
f08b4988f2 fs/fat: fix all other checkpatch issues in dir.c
1: Import linux/uaccess.h instead of asm.uaccess.h
2: Stop any lines going over 80 characters
3: Stopped setting any variables in if statements
4: Stopped splitting quoted strings
5: Removed unneeded parentheses

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
3f36f6100a fs/fat: fix some small checkpatch issues in dir.c
Simply remove the spacing between function definitions and
EXPORT_SYMBOL_GPL calls, which were previously generating warnings.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:11 +09:00
Cruz Julian Bishop
c90518290e fs/fat: fix two checkpatch issues in cache.c
This does the following:
	1: Splits the arguments of a function call to stop it
		from exceeding 80 characters
	2: Re-indents the arguments of another function call
		to prevent the splitting of a quoted string.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
4a3aeb13b7 fs/fat: chang indentation of some comments in fat.h
The comments were not lined up properly, so I just re-indented them.

This also fixes a stupid checkpatch issue unknowingly

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
d5a4a3867f fs/fat: fix some checkpatch issues in fat.h
Mainly fix spacing issues such as "foo * bar" and "foo= bar"

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:10 +09:00
Cruz Julian Bishop
85cb9bf535 fs/fat: fix a checkpatch issue in namei_msdos.c
Add a space before an equals sign/operator in line 410.

Signed-off-by: Cruz Julian Bishop <cruzjbishop@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Steven J. Magnani
7669e8fb09 fat (exportfs): fix dentry reconnection
Maintain an index of directory inodes by starting cluster, so that
fat_get_parent() can return the proper cached inode rather than inventing
one that cannot be traced back to the filesystem root.

Add a new msdos/vfat binary mount option "nfs" so that FAT filesystems
that are _not_ exported via NFS are not saddled with maintenance of an
index they will never use.

Finally, simplify NFS file handle generation and lookups.  An
ext2-congruent implementation is adequate for FAT needs.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Steven J. Magnani
21b6633d51 fat (exportfs): move NFS support code
Under memory pressure, the system may evict dentries from cache.  When the
FAT driver receives a NFS request involving an evicted dentry, it is
unable to reconnect it to the filesystem root.  This causes the request to
fail, often with ENOENT.

This is partially due to ineffectiveness of the current FAT NFS
implementation, and partially due to an unimplemented fh_to_parent method.
 The latter can cause file accesses to fail on shares exported with
subtree_check.

This patch set provides the FAT driver with the ability to
reconnect dentries.  NFS file handle generation and lookups are simplified
and made congruent with ext2.

Testing has involved a memory-starved virtual machine running 3.5-rc5 that
exports a ~2 GB vfat filesystem containing a kernel tree (~770 MB, ~40000
files, 9 levels).  Both 'cp -r' and 'ls -lR' operations were performed
from a client, some overlapping, some consecutive.  Exports with
'subtree_check' and 'no_subtree_check' have been tested.

Note that while this patch set improves FAT's NFS support, it does not
eliminate ESTALE errors completely.

The following should be considered for NFS clients who are sensitive to ESTALE:

* Mounting with lookupcache=none
  Unfortunately this can degrade performance severely, particularly for deep
  filesystems.

* Incorporating VFS patches to retry ESTALE failures on the client-side,
  such as https://lkml.org/lkml/2012/6/29/381

* Handling ESTALE errors in client application code

This patch:

Move NFS-related code into its own C file.  No functional changes.

Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:09 +09:00
Namjae Jeon
4b63709861 fat: use accessor function for msdos_dir_entry 'start'
Use accessor function for msdos_dir_entry 'start'

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:08 +09:00
Wei Yongjun
32daab969c hpfs: convert to use leXX_add_cpu()
Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:08 +09:00
Alan Cox
6eec482f47 binfmt_elf: Uninitialized variable
load_elf_interp() has interp_map_addr carefully described as
"uninitialized_var" and marked so as to avoid a warning.  However if you
trace the code it is passed into load_elf_interp and then this value is
checked against NULL.

As this return value isn't used this is actually safe but it freaks
various analysis tools that see un-initialized memory addresses being read
before their value is ever defined.

Set it to NULL as a matter of programming good taste if nothing else

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:00 +09:00
Paton J. Lewis
03a7beb55b epoll: support for disabling items, and a self-test app
Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
item.  If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the epoll item in a multi-threaded environment.  Also added a new
test_epoll self- test app to both demonstrate the need for this feature
and test it.

Signed-off-by: Paton J. Lewis <palewis@adobe.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Paul Holland <pholland@adobe.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:05:00 +09:00
Fengguang Wu
125c4c706b idr: rename MAX_LEVEL to MAX_IDR_LEVEL
To avoid name conflicts:

  drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:56 +09:00
Dmitry Monakhov
60d4616f3d ext4: serialize fallocate with ext4_convert_unwritten_extents
Fallocate should wait for pended ext4_convert_unwritten_extents()
otherwise following race may happen:

ftruncate( ,12288);
fallocate( ,0, 4096)
io_sibmit( ,0, 4096); /* Write to fallocated area, split extent if needed */
fallocate( ,0, 8192); /* Grow extent and broke assumption about extent */

Later kwork completion will do:
 ->ext4_convert_unwritten_extents (0, 4096)
   ->ext4_map_blocks(handle, inode, &map, EXT4_GET_BLOCKS_IO_CONVERT_EXT);
    ->ext4_ext_map_blocks() /* Will find new extent:  ex = [0,2] !!!!!! */
      ->ext4_ext_handle_uninitialized_extents()
        ->ext4_convert_unwritten_extents_endio()
        /* convert [0,2] extent to initialized, but only[0,1] was written */

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-10-05 11:32:02 -04:00
Dmitry Monakhov
c278531d39 ext4: fix ext4_flush_completed_IO wait semantics
BUG #1) All places where we call ext4_flush_completed_IO are broken
    because buffered io and DIO/AIO goes through three stages
    1) submitted io,
    2) completed io (in i_completed_io_list) conversion pended
    3) finished  io (conversion done)
    And by calling ext4_flush_completed_IO we will flush only
    requests which were in (2) stage, which is wrong because:
     1) punch_hole and truncate _must_ wait for all outstanding unwritten io
      regardless to it's state.
     2) fsync and nolock_dio_read should also wait because there is
        a time window between end_page_writeback() and ext4_add_complete_io()
        As result integrity fsync is broken in case of buffered write
        to fallocated region:
        fsync                                      blkdev_completion
	 ->filemap_write_and_wait_range
                                                   ->ext4_end_bio
                                                     ->end_page_writeback
          <-- filemap_write_and_wait_range return
	 ->ext4_flush_completed_IO
   	 sees empty i_completed_io_list but pended
   	 conversion still exist
                                                     ->ext4_add_complete_io

BUG #2) Race window becomes wider due to the 'ext4: completed_io
locking cleanup V4' patch series

This patch make following changes:
1) ext4_flush_completed_io() now first try to flush completed io and when
   wait for any outstanding unwritten io via ext4_unwritten_wait()
2) Rename function to more appropriate name.
3) Assert that all callers of ext4_flush_unwritten_io should hold i_mutex to
   prevent endless wait

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2012-10-05 11:31:55 -04:00
Trond Myklebust
22aaf71495 NFSv4.1: Ensure that the layout sequence id stays 'close' to the current
Clamp the layout barrier sequence id to the current sequence id
minus the maximum number of outstanding layoutget requests.

Also ensure that we correctly initialise lo->plh_barrier if there are
no layout segments associated to this layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:57:48 -07:00
Trond Myklebust
0f35ad6f68 NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04 16:28:17 -07:00
Linus Torvalds
e1cc485262 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 & udf fixes from Jan Kara:
 "Shortlog pretty much says it all.

  The interesting bits are UDF support for direct IO and ext3 fix for a
  long standing oops in data=journal mode."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix assertion failure in commit code due to lacking transaction credits
  UDF: Add support for O_DIRECT
  ext3: Replace 0 with NULL for pointer in super.c file
  udf: add writepages support for udf
  ext3: don't clear orphan list on ro mount with errors
  reiserfs: Make reiserfs_xattr_handlers static
2012-10-04 09:14:01 -07:00
David Sterba
7e97b8daf6 btrfs: allow setting NOCOW for a zero sized file via ioctl
Hi,

the patch si simple, but it has user visible impact and I'm not quite sure how
to resolve it.

In short, $subj says it, chattr -C supports it and we want to use it.

The conditions that acutally allow to change the NOCOW flag are clear. What if
I try to set the flag on a file that is not empty? Options:

1) whole ioctl will fail, EINVAL
2.1) ioctl will succeed, the NOCOW flag will be silently removed, but the file
     will stay COW-ed and checksummed
2.2) ioctl will succeed, flag will not be removed and a syslog message will
     warn that the COW flag has not been changed
2.2.1) dtto, no syslog message

Man page of chattr states that

 "If it is set on a file which already has data blocks, it is undefined when
 the blocks assigned to the file will be fully stable."

Yes, it's undefined and with current implementation it'll never happen. So from
this end, the user cannot expect anything. I'm trying to find a reasonable
behaviour, so that a command like 'chattr -R -aijS +C' to tweak a broad set of
flags in a deep directory does not fail unnecessarily and does not pollute the
log.

My personal preference is 2.2.1, but my dev's oppinion is skewed, not counting
the fact that I know the code and otherwise would look there before consulting
the documentation.

The patch implements 2.2.1.

david

-------------8<-------------------
From: David Sterba <dsterba@suse.cz>

It's safe to turn off checksums for a zero sized file.

http://thread.gmane.org/gmane.comp.file-systems.btrfs/18030

"We cannot switch on NODATASUM for a file that already has extents that
are checksummed. The invariant here is that either all the extents or
none are checksummed.

Theoretically it's possible to add/remove all checksums from a given
file, but it's a potentially longtime operation, the file has to be in
some intermediate state where the checksums partially exist but have to
be ignored (for the csum->nocsum) until the file is fully converted,
this brings more special cases to extent handling, it has to survive
power failure and remain consistent, and probably needs to be restarted
after next mount."

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-10-04 09:40:00 -04:00
Josef Bacik
c3308f84c1 Btrfs: fix punch hole when no extent exists
I saw the warning in btrfs_drop_extent_cache where our end is less than our
start while running xfstests 68 in a loop.  This is because we
unconditionally do drop_end = min(end, extent_end) in
__btrfs_drop_extents(), even though we may not have found an extent in the
range we were looking to drop.  So keep track of wether or not we found
something, and if we didn't just use our end.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:40:00 -04:00
Josef Bacik
926ced123b Btrfs: don't do anything in our ->freeze_fs and ->unfreeze_fs
We do not need to do anything special to freeze or unfreeze, it's all taken
care of by the generic work, and what we currently have is wrong anyway
since we shouldn't be returnning to userspace with mutexes held anyway.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:40:00 -04:00
Josef Bacik
892951a92e Btrfs: remove unused write cache pages hook
The btree inode has it's own write cache pages so we can remove this write
cache pages hook as it's not used.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:59 -04:00
Josef Bacik
b5bae2612a Btrfs: fix race when getting the eb out of page->private
We can race when checking wether PagePrivate is set on a page and we
actually have an eb saved in the pages private pointer.  We could have
easily written out this page and released it in the time that we did the
pagevec lookup and actually got around to looking at this page.  So use
mapping->private_lock to ensure we get a consistent view of the
page->private pointer.  This is inline with the alloc and releasepage paths
which use private_lock when manipulating page->private.  Thanks,

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:59 -04:00
Josef Bacik
ff44c6e36d Btrfs: do not hold the write_lock on the extent tree while logging
Dave Sterba pointed out a sleeping while atomic bug while doing fsync.  This
is because I'm an idiot and didn't realize that rwlock's were spin locks, so
we've been holding this thing while doing allocations and such which is not
good.  This patch fixes this by dropping the write lock before we do
anything heavy and re-acquire it when it is done.  We also need to take a
ref on the em's in case their corresponding pages are evicted and mark them
as being logged so that releasepage does not remove them and doesn't remove
them from our local list.  Thanks,

Reported-by: Dave Sterba <dave@jikos.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:58 -04:00
Josef Bacik
98114659e0 Btrfs: fix race with freeze and free space inodes
So we start our freeze, somebody comes in and does an fsync() on a file
where we have to commit a transaction for whatever reason, and we will
deadlock because the freeze is waiting on FS_FREEZE people to stop writing
to the file system, but the transaction is waiting for its free space inodes
to be written out, which are in turn waiting on sb_start_intwrite while
trying to write the file extents.  To fix this we'll just skip the
sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a
transaction commit so we're safe wrt to freeze and this will keep us from
deadlocking.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-10-04 09:39:58 -04:00