Commit Graph

542017 Commits

Author SHA1 Message Date
Guenter Roeck
aacfbe6a97 kernel/watchdog: move NMI function header declarations from watchdog.h to nmi.h
The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
Its header declarations should be in linux/nmi.h, not linux/watchdog.h.

The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
not configured, one in the include file and one in kernel/watchdog.c.
Remove the dummy functions from kernel/watchdog.c and use those from the
include file.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Frederic Weisbecker
314b08ff52 watchdog: simplify housekeeping affinity with the appropriate mask
housekeeping_mask gathers all the CPUs that aren't part of the nohz_full
set.  This is exactly what we want the watchdog to be affine to without
the need to use complicated cpumask operations.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Frederic Weisbecker
230ec93909 smpboot: allow passing the cpumask on per-cpu thread registration
It makes the registration cheaper and simpler for the smpboot per-cpu
kthread users that don't need to always update the cpumask after threads
creation.

[sfr@canb.auug.org.au: fix for allow passing the cpumask on per-cpu thread registration]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Frederic Weisbecker
3dd08c0c91 smpboot: make cleanup to mirror setup
The per-cpu kthread cleanup() callback is the mirror of the setup()
callback.  When the per-cpu kthread is started, it first calls setup()
to initialize the resources which are then released by cleanup() when
the kthread exits.

Now since the introduction of a per-cpu kthread cpumask, the kthreads
excluded by the cpumask on boot may happen to be parked immediately
after their creation without taking the setup() stage, waiting to be
asked to unpark to do so.  Then when smpboot_unregister_percpu_thread()
is later called, the kthread is stopped without having ever called
setup().

But this triggers a bug as the kthread unconditionally calls cleanup()
on exit but this doesn't mirror any setup().  Thus the kernel crashes
because we try to free resources that haven't been initialized, as in
the watchdog case:

    WATCHDOG disable 0
    WATCHDOG disable 1
    WATCHDOG disable 2
    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: hrtimer_active+0x26/0x60
    [...]
    Call Trace:
      hrtimer_try_to_cancel+0x1c/0x280
      hrtimer_cancel+0x1d/0x30
      watchdog_disable+0x56/0x70
      watchdog_cleanup+0xe/0x10
      smpboot_thread_fn+0x23c/0x2c0
      kthread+0xf8/0x110
      ret_from_fork+0x3f/0x70

This bug is currently masked with explicit kthread unparking before
kthread_stop() on smpboot_destroy_threads(). This forces a call to
setup() and then unpark().

We could fix this by unconditionally calling setup() on kthread entry.
But setup() isn't always cheap.  In the case of the watchdog it launches
an hrtimer, perf events, etc.  So we would rather skip it when there is a
chance the kthread will never be used, as with a reduced cpumask.

So let's simply do a state machine check before calling cleanup() that
makes sure setup() has been called before mirroring it.

And remove the nasty hack workaround.
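
A minimal userspace sketch of the state check described above.  The
names (thread_status, percpu_thread, thread_activate, thread_exit and
the counting callbacks) are hypothetical and only illustrate the idea;
they are not the identifiers used in kernel/smpboot.c:

```c
#include <stddef.h>

/* Hypothetical states for illustration; not the kernel's identifiers. */
enum thread_status { THREAD_NONE, THREAD_ACTIVE };

struct percpu_thread {
    enum thread_status status;
    void (*setup)(void *data);
    void (*cleanup)(void *data);
    void *data;
};

/* Example resource: counts how often each callback ran. */
struct counters { int setups; int cleanups; };

void count_setup(void *data)   { ((struct counters *)data)->setups++; }
void count_cleanup(void *data) { ((struct counters *)data)->cleanups++; }

/* First real activation: run setup() once and record that it happened. */
void thread_activate(struct percpu_thread *t)
{
    if (t->status == THREAD_NONE) {
        if (t->setup)
            t->setup(t->data);
        t->status = THREAD_ACTIVE;
    }
}

/* Exit path: mirror setup() only if setup() actually ran. */
void thread_exit(struct percpu_thread *t)
{
    if (t->status != THREAD_NONE && t->cleanup)
        t->cleanup(t->data);
    t->status = THREAD_NONE;
}
```

A thread that is stopped while still in THREAD_NONE never reaches
cleanup(), so nothing uninitialized is released.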

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Frederic Weisbecker
5869b5064b smpboot: fix memory leak on error handling
The cpumask is allocated before threads get created. If the latter step
fails, we need to free the cpumask.
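
The pattern can be sketched in plain C as follows.  This is only an
illustration: struct registration, create_threads() and
register_threads() are hypothetical stand-ins, and malloc-family calls
replace the kernel's cpumask allocation helpers:

```c
#include <stdlib.h>

struct registration {
    unsigned long *mask;   /* stands in for the allocated cpumask */
    int nthreads;
};

/* Hypothetical stand-in for thread creation; fails when `fail` is set. */
int create_threads(struct registration *r, int fail)
{
    if (fail)
        return -1;
    r->nthreads = 1;
    return 0;
}

int register_threads(struct registration *r, int fail)
{
    int ret;

    r->mask = calloc(1, sizeof(*r->mask));
    if (!r->mask)
        return -1;

    ret = create_threads(r, fail);
    if (ret) {
        /* The later step failed: release what the earlier step allocated. */
        free(r->mask);
        r->mask = NULL;
    }
    return ret;
}
```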

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Kees Cook
a068acf2ee fs: create and use seq_show_option for escaping
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.
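
To illustrate the kind of escaping involved, here is a userspace sketch
that rewrites selected bytes as three-digit octal escapes.  The function
escape_option() and the escape set used in the usage note are
assumptions for illustration, not the kernel helpers themselves:

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, replacing every byte that appears in `esc` with a
 * backslash followed by three octal digits (e.g. '\n' becomes "\012").
 * Returns the number of bytes written, not counting the terminating
 * NUL, or -1 if dst is too small.
 */
int escape_option(char *dst, size_t dstlen, const char *src, const char *esc)
{
    size_t used = 0;

    if (dstlen == 0)
        return -1;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (strchr(esc, c)) {
            if (dstlen - used < 5)      /* 4 escape bytes + NUL */
                return -1;
            snprintf(dst + used, 5, "\\%03o", (unsigned int)c);
            used += 4;
        } else {
            if (dstlen - used < 2)      /* 1 byte + NUL */
                return -1;
            dst[used++] = (char)c;
        }
    }
    dst[used] = '\0';
    return (int)used;
}
```

With an escape set of ", \t\n\\", the workdir value from the example
above can no longer start a second line in the output, because the
embedded newline is emitted as the four characters \012.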

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
46359295a3 ocfs2: clean up redundant NULL checks before kfree
NULL check before kfree is redundant and so clean them up.
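
The same holds for free() in userspace C, which makes for a compact
illustration.  struct slot and slot_release() are hypothetical names
used only for this sketch:

```c
#include <stdlib.h>

struct slot {
    char *name;
};

/*
 * No "if (s->name)" guard is needed: free(NULL) is defined to do
 * nothing, just as kfree(NULL) is a no-op in the kernel.
 */
void slot_release(struct slot *s)
{
    free(s->name);
    s->name = NULL;
}
```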

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joe Perches
7ecef14ab1 ocfs2: neaten do_error, ocfs2_error and ocfs2_abort
These uses sometimes do and sometimes don't have '\n' terminations.  Make
the uses consistently use '\n' terminations and remove the newline from
the functions.

Miscellanea:

o Coalesce formats
o Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Xue jiufei
d0c97d52f5 ocfs2: do not set fs read-only if rec[0] is empty while committing truncate
While appending an extent to a file, it will call these functions:
ocfs2_insert_extent

  -> call ocfs2_grow_tree() if there's no free rec
     -> ocfs2_add_branch add a new branch to extent tree,
        now rec[0] in the leaf of rightmost path is empty
  -> ocfs2_do_insert_extent
     -> ocfs2_rotate_tree_right
       -> ocfs2_extend_rotate_transaction
          -> jbd2_journal_restart if jbd2_journal_extend fail
     -> ocfs2_insert_path
        -> ocfs2_extend_trans
          -> jbd2_journal_restart if jbd2_journal_extend fail
        -> ocfs2_insert_at_leaf
     -> ocfs2_et_update_clusters
jbd2_journal_restart() may be called along this path, and it may happen
that the buffers dirtied in ocfs2_add_branch() are committed while the
buffers dirtied in ocfs2_insert_at_leaf() and ocfs2_et_update_clusters()
are not.  An empty rec[0] is then left in the rightmost path, which
causes the filesystem to be set read-only when ocfs2_commit_truncate()
is called, with the error message: "Inode %lu has an empty extent record".

This is not a serious problem, so remove the rightmost path when
ocfs2_commit_truncate() is called.

Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
yangwenfang
7f27ec978b ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()
1. After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
   jbd2_journal_restart() may also be called.  That function decrements
   transaction A's t_updates and obtains a new transaction B.  If
   jbd2_journal_commit_transaction() happens to be committing transaction
   A, then once t_updates == 0 it goes on to complete the commit and
   unfile the buffer.

   So when jbd2_journal_dirty_metadata() is called, the handle points to
   the new transaction B and the buffer head's journal head is already
   freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL), so
   it returns EINVAL, which triggers the BUG_ON(status).

thread 1                                          jbd2
ocfs2_write_begin                     jbd2_journal_commit_transaction
ocfs2_write_begin_nolock
  ocfs2_start_trans
    jbd2__journal_start(t_updates+1,
                       transaction A)
    ocfs2_journal_access_di
    ocfs2_write_cluster_by_desc
      ocfs2_mark_extent_written
        ocfs2_change_extent_flag
          ocfs2_split_extent
            ocfs2_extend_rotate_transaction
              jbd2_journal_restart
              (t_updates-1,transaction B) t_updates==0
                                        __jbd2_journal_refile_buffer
                                        (jh->b_transaction = NULL)
ocfs2_write_end
ocfs2_write_end_nolock
    ocfs2_journal_dirty
        jbd2_journal_dirty_metadata(bug)
   ocfs2_commit_trans

2. In ext4, I found that jbd2_journal_get_write_access() is called by
   ext4_write_end.

ext4_write_begin
    ext4_journal_start
        __ext4_journal_start_sb
            ext4_journal_check_start
            jbd2__journal_start

ext4_write_end
    ext4_mark_inode_dirty
        ext4_reserve_inode_write
            ext4_journal_get_write_access
                jbd2_journal_get_write_access
        ext4_mark_iloc_dirty
            ext4_do_update_inode
                ext4_handle_dirty_metadata
                    jbd2_journal_dirty_metadata

3. So I think we should put ocfs2_journal_access_di before
   ocfs2_journal_dirty in ocfs2_write_end.  It works well after my
   modification.

Signed-off-by: vicky <vicky.yangwenfang@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Zhangguanghui <zhang.guanghui@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Tina Ruchandani
40476b8294 ocfs2: use 64bit variables to track heartbeat time
o2hb_elapsed_msecs computes the time taken for a disk heartbeat.
'struct timeval' variables are used to store start and end times.  On
32-bit systems, the 'tv_sec' component of 'struct timeval' will overflow
in year 2038 and beyond.

This patch solves the overflow with the following:

1. Replace o2hb_elapsed_msecs using 'ktime_t' values to measure start
   and end time, and built-in function 'ktime_ms_delta' to compute the
   elapsed time.  ktime_get_real() is used since the code prints out the
   wallclock time.

2. Change the format string to print time as a single 64-bit nanoseconds
   value ("%lld") instead of seconds and microseconds.  This simplifies
   the code since converting ktime_t to that format would need expensive
   computation.  However, the debug log string is less readable than the
   previous format.
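
The arithmetic behind this can be sketched in userspace C with a signed
64-bit nanosecond count, which is the idea ktime_t is built on.  The
helpers to_ns() and elapsed_ms() are hypothetical names for this
illustration, not kernel functions:

```c
#include <stdint.h>

#define NSEC_PER_SEC   INT64_C(1000000000)
#define NSEC_PER_MSEC  INT64_C(1000000)

/* Fold seconds and nanoseconds into one signed 64-bit nanosecond count. */
int64_t to_ns(int64_t sec, int64_t nsec)
{
    return sec * NSEC_PER_SEC + nsec;
}

/* Elapsed milliseconds between two nanosecond timestamps. */
int64_t elapsed_ms(int64_t later_ns, int64_t earlier_ns)
{
    return (later_ns - earlier_ns) / NSEC_PER_MSEC;
}
```

A start time of 2147483647 seconds, the largest value a signed 32-bit
tv_sec can hold, and an end time two seconds later are both
representable here, so the subtraction gives the expected delta instead
of wrapping.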

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Suggested by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
ad69482122 ocfs2: fix race between crashed dio and rm
There is a race between a crashed dio and rm which leaves the inode with
OCFS2_VALID_FL cleared and causes the filesystem to be set read-only.

  N1                              N2
  ------------------------------------------------------------------------
  dd with direct flag
                                  rm file
  crashed with a dio entry left
  in orphan dir
                                  clear OCFS2_VALID_FL in
                                  ocfs2_remove_inode
                                  recover N1 and read the corrupted inode,
                                  and set filesystem read-only

So we skip the inode deletion this time and wait for the dio entry to be
recovered first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Yiwen Jiang
f57a22ddec ocfs2: avoid access invalid address when read o2dlm debug messages
The following case can lead to a lockres being freed while it is still in use.

cat /sys/kernel/debug/o2dlm/locking_state	dlm_thread
lockres_seq_start
    -> lock dlm->track_lock
    -> get resA
                                                resA->refs decrease to 0,
                                                call dlm_lockres_release,
                                                and wait for "cat" unlock.
Although resA->refs is already set to 0,
increase resA->refs, and then unlock
                                                lock dlm->track_lock
                                                    -> list_del_init()
                                                    -> unlock
                                                    -> free resA

In such a race, an invalid address may be accessed.  So we should
delete res->tracking from the list before resA->refs drops to 0.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Tariq Saeed
743b5f1434 ocfs2: take inode lock in ocfs2_iop_set/get_acl()
This bug in mainline code was pointed out by Mark Fasheh.  When
ocfs2_iop_set_acl() and ocfs2_iop_get_acl() are entered from the VFS layer,
the inode lock is not held.  This seems to be a regression from older
kernels.  This patch fixes that.

Orabug: 20189959
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Tariq Saeed
3d46a44a0c ocfs2: fix BUG_ON() in ocfs2_ci_checkpointed()
PID: 614    TASK: ffff882a739da580  CPU: 3   COMMAND: "ocfs2dc"
  #0 [ffff882ecc3759b0] machine_kexec at ffffffff8103b35d
  #1 [ffff882ecc375a20] crash_kexec at ffffffff810b95b5
  #2 [ffff882ecc375af0] oops_end at ffffffff815091d8
  #3 [ffff882ecc375b20] die at ffffffff8101868b
  #4 [ffff882ecc375b50] do_trap at ffffffff81508bb0
  #5 [ffff882ecc375ba0] do_invalid_op at ffffffff810165e5
  #6 [ffff882ecc375c40] invalid_op at ffffffff815116fb
     [exception RIP: ocfs2_ci_checkpointed+208]
     RIP: ffffffffa0a7e940  RSP: ffff882ecc375cf0  RFLAGS: 00010002
     RAX: 0000000000000001  RBX: 000000000000654b  RCX: ffff8812dc83f1f8
     RDX: 00000000000017d9  RSI: ffff8812dc83f1f8  RDI: ffffffffa0b2c318
     RBP: ffff882ecc375d20   R8: ffff882ef6ecfa60   R9: ffff88301f272200
     R10: 0000000000000000  R11: 0000000000000000  R12: ffffffffffffffff
     R13: ffff8812dc83f4f0  R14: 0000000000000000  R15: ffff8812dc83f1f8
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #7 [ffff882ecc375d28] ocfs2_check_meta_downconvert at ffffffffa0a7edbd [ocfs2]
  #8 [ffff882ecc375d38] ocfs2_unblock_lock at ffffffffa0a84af8 [ocfs2]
  #9 [ffff882ecc375dc8] ocfs2_process_blocked_lock at ffffffffa0a85285 [ocfs2]
#10 [ffff882ecc375e18] ocfs2_downconvert_thread_do_work at ffffffffa0a85445 [ocfs2]
#11 [ffff882ecc375e68] ocfs2_downconvert_thread at ffffffffa0a854de [ocfs2]
#12 [ffff882ecc375ee8] kthread at ffffffff81090da7
#13 [ffff882ecc375f48] kernel_thread_helper at ffffffff81511884
The assert is tripped because the tran is not checkpointed and the lock level is PR.

Some time ago, a chmod command had been executed. As a result, the following call
chain left the inode cluster lock in PR state, later on causing the assert.
system_call_fastpath
  -> my_chmod
   -> sys_chmod
    -> sys_fchmodat
     -> notify_change
      -> ocfs2_setattr
       -> posix_acl_chmod
        -> ocfs2_iop_set_acl
         -> ocfs2_set_acl
          -> ocfs2_acl_set_mode
Here is how.
1119 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr)
1120 {
1247         ocfs2_inode_unlock(inode, 1); <<< WRONG thing to do.
..
1258         if (!status && attr->ia_valid & ATTR_MODE) {
1259                 status =  posix_acl_chmod(inode, inode->i_mode);

519 posix_acl_chmod(struct inode *inode, umode_t mode)
520 {
..
539         ret = inode->i_op->set_acl(inode, acl, ACL_TYPE_ACCESS);

287 int ocfs2_iop_set_acl(struct inode *inode, struct posix_acl *acl, ...
288 {
289         return ocfs2_set_acl(NULL, inode, NULL, type, acl, NULL, NULL);

224 int ocfs2_set_acl(handle_t *handle,
225                          struct inode *inode, ...
231 {
..
252                                 ret = ocfs2_acl_set_mode(inode, di_bh,
253                                                          handle, mode);

168 static int ocfs2_acl_set_mode(struct inode *inode, struct buffer_head ...
170 {
183         if (handle == NULL) {
                    >>> BUG: inode lock not held in ex at this point <<<
184                 handle = ocfs2_start_trans(OCFS2_SB(inode->i_sb),
185                                            OCFS2_INODE_UPDATE_CREDITS);

At ocfs2_setattr.#1247 we unlock and at #1259 call posix_acl_chmod. When we reach
ocfs2_acl_set_mode.#181 and do trans, the inode cluster lock is not held in EX
mode (it should be). How could this have happened?

We are the lock master, were holding the lock in EX and have released it in
ocfs2_setattr.#1247.  Note that there are no holders of this lock at
this point.  Another node needs the lock in PR, and we downconvert from
EX to PR.  So the inode lock is PR when we do the trans in
ocfs2_acl_set_mode.#184.  The trans stays in core (not flushed to disc).
Now another node wants the lock in EX, the downconvert thread gets kicked
(the one that tripped the assert above), and finds an unflushed trans but the
lock is not EX (it is PR).  If the lock was at EX, it would have flushed
the trans (ocfs2_ci_checkpointed -> ocfs2_start_checkpoint) before
downconverting (to NULL) for the request.

ocfs2_setattr must not drop the inode lock EX in this code path.  If it
does, and takes it again before the trans, say in ocfs2_set_acl, another
cluster node can get in between and execute another setattr, overwriting
the one in progress on this node, resulting in a mode, acl and size
combination that is a mix of the two.

Orabug: 20189959
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Norton.Zhu
72f6fe1fe5 ocfs2: optimize error handling in dlm_request_join
Currently error handling in dlm_request_join is a little obscure, so
optimize it to promote readability.

If packet.code is invalid, reset it to JOIN_DISALLOW to keep it
meaningful.  It only influences the log printing.

Signed-off-by: Norton.Zhu <norton.zhu@huawei.com>
Cc: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Yiwen Jiang
928dda1f94 ocfs2: fix a tiny case that inode can not removed
When running dirop_fileop_racer we found a case in which an inode
cannot be removed.

Two nodes, say Node A and Node B, mount the same ocfs2 volume.  Create
two dirs /race/1/ and /race/2/ in the filesystem.

  Node A                            Node B
  rm -r /race/2/
                                    mv /race/1/ /race/2/
  call ocfs2_unlink(), get
  the EX mode of /race/2/
                                    wait for A to unlock /race/2/
  decrease i_nlink of /race/2/ to 0,
  and add inode of /race/2/ into
  orphan dir, unlock /race/2/
                                    got EX mode of /race/2/. because
                                    /race/1/ is dir, so inc i_nlink
                                    of /race/2/ and update into disk,
                                    unlock /race/2/
  because i_nlink of /race/2/
  is not zero, this inode will
  always remain in orphan dir

This patch fixes this case by testing whether i_nlink of the new dir is zero.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Xue jiufei <xuejiufei@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
WeiWei Wang
6ab855a99b ocfs2: add ip_alloc_sem in direct IO to protect allocation changes
In ocfs2, ip_alloc_sem is used to protect allocation changes on the
node.  In direct IO, we take ip_alloc_sem to protect data consistency
in the race between direct IO and ocfs2_truncate_file (buffered IO
already uses ip_alloc_sem).  Although the inode->i_mutex lock is used to
avoid concurrency in the above situation, I think ip_alloc_sem is still
needed because protecting allocation changes is significant.

Other filesystems like ext4 also use an rw_semaphore to protect data
consistency in the get_block-vs-truncate race, by other means.  So
ip_alloc_sem in ocfs2 direct IO is needed.

Signed-off-by: Weiwei Wang <wangww631@huawei.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Goldwyn Rodrigues
34237681e0 ocfs2: clear the rest of the buffers on error
In case a validation fails, clear the rest of the buffers and return the
error to the calling function.

This also facilitates bubbling up the error originating from ocfs2_error
to calling functions.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Goldwyn Rodrigues
17a5b9ab32 ocfs2: acknowledge return value of ocfs2_error()
Caveat: This may return -EROFS for a read case, which seems wrong.  This
is happening even without this patch series though.  Should we convert
EROFS to EIO?

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Goldwyn Rodrigues
7d0fb9148a ocfs2: add errors=continue
OCFS2 is often used in high-availability systems.  However, ocfs2
converts the filesystem to read-only at the drop of a hat.  This may
not be necessary, since turning the filesystem read-only would affect
other running processes as well, decreasing availability.

This is an attempt to add errors=continue, which would return EIO to
the calling process and terminate further processing so that the
filesystem is not corrupted further.  However, the filesystem is not
converted to read-only.

As a future plan, I intend to create a small utility or extend
fsck.ocfs2 to fix small errors such as in the inode.  The input to the
utility such as the inode can come from the kernel logs so we don't have
to schedule a downtime for fixing small-enough errors.

The patch changes the ocfs2_error to return an error.  The error
returned depends on the mount option set.  If none is set, the default
is to turn the filesystem read-only.
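
The decision can be sketched in userspace C as below.  The enum, struct
and function names are hypothetical and chosen for illustration only;
the -EROFS return for the default case is an assumption based on the
read-only behaviour described in this series:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical representation of the errors= mount option. */
enum error_policy { ERRORS_REMOUNT_RO, ERRORS_CONTINUE };

struct fs_state {
    enum error_policy policy;
    bool read_only;
};

/*
 * Called when an on-disk inconsistency is detected.  Returns the error
 * the calling process should see.
 */
int handle_fs_error(struct fs_state *fs)
{
    if (fs->policy == ERRORS_CONTINUE)
        return -EIO;            /* stop this operation, stay read-write */

    fs->read_only = true;       /* default: filesystem goes read-only */
    return -EROFS;
}
```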

Perhaps errors=continue is not the best option name.  Historically it is
used for making an attempt to progress in the current process itself.
Should we call it errors=eio? or errors=killproc? Suggestions/Comments
welcome.

Sources are available at:
  https://github.com/goldwynr/linux/tree/error-cont

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Xue jiufei
513e2dae94 ocfs2: flush inode data to disk and free inode when i_count becomes zero
Disk inode deletion may be heavily delayed when one node unlinks a file
after the same dentry has been freed on another node (say N1) because of
memory shrinking while the inode is left in memory.  This inode can only
be freed when N1 does the orphan scan work.

However, N1 may skip the orphan scan several times because other nodes
may do the work earlier.  In our tests, it may take 1 hour on a 4-node
cluster, which hurts the user experience.  So we think the inode should
be freed after the data is flushed to disk when i_count becomes zero, to
avoid such circumstances.

Signed-off-by: Joyce.xue <xuejiufei@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Sanidhya Kashyap
0f5e7b41f9 ocfs2: trusted xattr missing CAP_SYS_ADMIN check
The trusted extended attributes are only visible to processes which
have the CAP_SYS_ADMIN capability, but the check is missing in the ocfs2
xattr_handler trusted list.  The check is important because this will be
used for implementing mechanisms in userspace to which other
ordinary processes should not have access.

Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Taesoo kim <taesoo@gatech.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
jiangyiwen
807a790711 ocfs2: set filesytem read-only when ocfs2_delete_entry failed.
In ocfs2_rename, if ocfs2_delete_entry(old) fails, the inode is left with
two entries (old and new).  Thus, the filesystem will be inconsistent.

The case is described below:

ocfs2_rename
    -> ocfs2_start_trans
    -> ocfs2_add_entry(new)
    -> ocfs2_delete_entry(old)
        -> __ocfs2_journal_access *failed* because of -ENOMEM
    -> ocfs2_commit_trans

So the filesystem should be set read-only at that point.

Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
f83c7b5e9f ocfs2/dlm: use list_for_each_entry instead of list_for_each
Use list_for_each_entry instead of list_for_each to simplify code.
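
For readers unfamiliar with the two iterators, here is a self-contained
userspace sketch of the difference.  The macros are simplified
re-creations written for this example (using the GCC/Clang __typeof__
extension), and struct node, sum_old() and sum_new() are hypothetical;
the real definitions live in include/linux/list.h:

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Iterate over the raw list_head links. */
#define list_for_each(pos, head) \
    for (pos = (head)->next; pos != (head); pos = pos->next)

/* Iterate directly over the structures that embed the links. */
#define list_for_each_entry(pos, head, member)                          \
    for (pos = container_of((head)->next, __typeof__(*pos), member);    \
         &pos->member != (head);                                        \
         pos = container_of(pos->member.next, __typeof__(*pos), member))

struct node { int id; struct list_head link; };

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Old style: needs an extra iterator and an explicit conversion. */
int sum_old(struct list_head *head)
{
    struct list_head *iter;
    struct node *n;
    int sum = 0;

    list_for_each(iter, head) {
        n = container_of(iter, struct node, link);
        sum += n->id;
    }
    return sum;
}

/* New style: the macro performs the conversion on every step. */
int sum_new(struct list_head *head)
{
    struct node *n;
    int sum = 0;

    list_for_each_entry(n, head, link)
        sum += n->id;
    return sum;
}
```

Both functions visit the same elements; the second simply drops the
temporary list_head pointer and the manual conversion.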

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
0e3d9eafb8 ocfs2: remove unneeded code in dlm_register_domain_handlers
The last goto statement is unneeded, so remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
cdd09f49cb ocfs2: fix BUG when o2hb_register_callback fails
In dlm_register_domain_handlers, if o2hb_register_callback fails, it
will call dlm_unregister_domain_handlers to unregister.  This will
trigger the BUG_ON in o2hb_unregister_callback because hc_magic is 0.
So we should call o2hb_setup_callback to initialize hc first.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
914a9b7429 ocfs2: remove unneeded code in ocfs2_dlm_init
status is already initialized and it will only be 0 or negative in the
code flow.  So remove the unneeded assignment after the label 'local'.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
3cb2ec43f6 ocfs2: adjust code to match locking/unlocking order
The unlocking order in ocfs2_unlink and ocfs2_rename does not match the
corresponding locking order.  Although it won't cause issues, adjust the
code so that it looks more reasonable.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
bf59e6623a ocfs2: clean up unused local variables in ocfs2_file_write_iter
Since commit 86b9c6f3f8 ("ocfs2: remove filesize checks for sync I/O
journal commit") removed the filesize checks for sync I/O journal commit,
the variables old_size and old_clusters are no longer used.  So clean
them up.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Christophe JAILLET
372a447c4b ocfs2: do not log twice error messages
'o2hb_map_slot_data' and 'o2hb_populate_slot_data' are called from only
one place, in 'o2hb_region_dev_write'.  Return value is checked and
'mlog_errno' is called to log a message if it is not 0.

So there is no need to call 'mlog_errno' directly within these functions.
Doing so would result in logging the message twice.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
acf8fdbe6a ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access
When the storage network is unstable, it may trigger the BUG in
__ocfs2_journal_access because the buffer is not uptodate.  We can retry
the write in this case or return an error instead of BUG.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Zhangguanghui <zhang.guanghui@h3c.com>
Tested-by: Zhangguanghui <zhang.guanghui@h3c.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
faaebf18f8 ocfs2: fix several issues of append dio
1) Take rw EX lock in case of append dio.
2) Explicitly treat the error code -EIOCBQUEUED as normal.
3) Set di_bh to NULL after brelse if it may be used again later.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Joseph Qi
512f62acbd ocfs2: fix race between dio and recover orphan
During direct io the inode will be added to orphan first and then
deleted from orphan.  There is a race window in which the orphan entry
can be deleted twice and thus trigger the BUG when validating
OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.

ocfs2_direct_IO_write
    ...
    ocfs2_add_inode_to_orphan
    >>>>>>>> race window.
             1) another node may rm the file and then go down; this node
             then takes care of orphan recovery and clears the flag
             OCFS2_DIO_ORPHANED_FL.
             2) since the rw lock is unlocked, it may race with another
             orphan recovery and append dio.
    ocfs2_del_inode_from_orphan

So take the inode mutex lock when recovering orphans and do the rw unlock
at the end of aio write in case of append dio.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Cc: Weiwei Wang <wangww631@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Alexander Kuleshov
81cf09edc7 sh: use PFN_DOWN macro
Replace ((x) >> PAGE_SHIFT) with the predefined PFN_DOWN macro.
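
As a minimal sketch of the equivalence this patch relies on, the following
user-space C defines the macro locally.  The PAGE_SHIFT value of 12 (4 KiB
pages) is an assumption for illustration; in the kernel it comes from the
architecture headers.  The helper function names are hypothetical.

```c
#include <assert.h>

/* Assumption: 4 KiB pages.  The kernel takes PAGE_SHIFT from the arch. */
#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Open-coded form that the patch replaces. */
static unsigned long pfn_open_coded(unsigned long addr)
{
	return addr >> PAGE_SHIFT;
}

/* Same computation expressed through the macro. */
static unsigned long pfn_with_macro(unsigned long addr)
{
	return PFN_DOWN(addr);
}

/* Both forms must agree for any address. */
static void check_equivalent(unsigned long addr)
{
	assert(pfn_open_coded(addr) == pfn_with_macro(addr));
}
```

The macro rounds down: every address inside a page maps to that page's
frame number.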

Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
SF Markus Elfring
917520e100 ntfs: delete unnecessary checks before calling iput()
iput() tests whether its argument is NULL and then returns immediately.
Thus the test around the call is not needed.
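
The cleanup pattern can be sketched with a toy model.  The struct and all
function names below are hypothetical; only the NULL-is-a-no-op behaviour
is taken from the description of iput() above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature inode; only a reference count is modelled. */
struct toy_inode {
	int refcount;
};

/* Like iput() as described above: a NULL argument is a no-op. */
static void toy_iput(struct toy_inode *inode)
{
	if (!inode)
		return;
	assert(inode->refcount > 0);
	inode->refcount--;
}

/* Before: the caller repeats the NULL test that the callee already does. */
static void release_checked(struct toy_inode *inode)
{
	if (inode)
		toy_iput(inode);
}

/* After: call unconditionally. */
static void release_unchecked(struct toy_inode *inode)
{
	toy_iput(inode);
}
```

Both release variants behave identically for NULL and non-NULL arguments,
which is why the outer test can go.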

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Zhao Lei
35108d7138 scripts/spelling.txt: add some typo-words
I wrote a small script to show the word pairs from all Linux spelling-typo
commits, and got the following result from sort | uniq -c:

    181 occured -> occurred
     78 transfered -> transferred
     67 recieved -> received
     65 dependant -> dependent
     58 wether -> whether
     56 accomodate -> accommodate
     54 occured -> occurred
     51 recieve -> receive
     47 cant -> can't
     40 sucessfully -> successfully
     ...

Some of them are not in spelling.txt, so this patch adds the most common
word pairs to spelling.txt.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Robert Jarzmik
e260fe01fa scripts: decode_stacktrace: fix ARM architecture decoding
Fix the stack decoder for the ARM architecture.
An ARM stack trace is formatted as follows:

[   81.547704] [<c023eb04>] (bucket_find_contain) from [<c023ec88>] (check_sync+0x40/0x4f8)
[   81.559668] [<c023ec88>] (check_sync) from [<c023f8c4>] (debug_dma_sync_sg_for_cpu+0x128/0x194)
[   81.571583] [<c023f8c4>] (debug_dma_sync_sg_for_cpu) from [<c0327dec>] (__videobuf_s

The current script doesn't expect the symbols to be enclosed in
parentheses, and triggers the following errors:

  awk: cmd. line:1: error: Unmatched ( or \(: / (check_sync$/
  [   81.547704] (bucket_find_contain) from (check_sync+0x40/0x4f8)

Fix it by stripping the leading and trailing parentheses from each symbol
name.
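
The script itself is a shell script; purely to illustrate the string
operation, here is a small C sketch.  strip_parens() is a hypothetical
helper and is not taken from decode_stacktrace.

```c
#include <assert.h>
#include <string.h>

/*
 * Strip one leading '(' and one trailing ')' from sym, in place.
 * Returns sym for convenience.  Hypothetical helper for illustration.
 */
static char *strip_parens(char *sym)
{
	size_t len;

	assert(sym != NULL);
	len = strlen(sym);

	/* Drop the trailing ')' by shortening the string. */
	if (len > 0 && sym[len - 1] == ')')
		sym[--len] = '\0';

	/* Drop the leading '(' by shifting the rest, NUL included. */
	if (len > 0 && sym[0] == '(')
		memmove(sym, sym + 1, len);

	return sym;
}
```

Applied to "(check_sync+0x40/0x4f8)" this leaves "check_sync+0x40/0x4f8",
which no longer contains an unmatched parenthesis for awk to trip over.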

As a side note, this probably comes from the function
dump_backtrace_entry(), which is implemented differently for each
architecture.  That makes a single decoding script a bit of a challenge.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jean Delvare
fa70900e09 scripts/Lindent: handle missing indent gracefully
If indent is not found, bail out immediately instead of spitting random
shell script error messages.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Bart Van Assche
d40e1e6532 kerneldoc: Convert error messages to GNU error message format
Editors like emacs and vi recognize a number of error message formats.
The format used by the kerneldoc tool is not recognized by emacs.

Change the kerneldoc error message format to the GNU style such that the
emacs prev-error and next-error commands can be used to navigate through
kerneldoc error messages.  For more information about the GNU error
message format, see also
  https://www.gnu.org/prep/standards/html_node/Errors.html.

This patch has been generated via the following sed command:

  sed -i.orig 's/Error(\${file}:\$.):/\${file}:\$.: error:/g;s/Warning(\${file}:\$.):/\${file}:\$.: warning:/g;s/Warning(\${file}):/\${file}:1: warning:/g;s/Info(\${file}:\$.):/\${file}:\$.: info:/g' scripts/kernel-doc
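
For readers unfamiliar with the target format, the sed replacements above
produce messages shaped like "file:line: severity: text".  A minimal C
sketch of a formatter for that shape follows; format_gnu_diag() and its
parameters are hypothetical and are not part of scripts/kernel-doc, which
is a Perl script.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "<file>:<line>: <severity>: <msg>" into buf.
 * Returns the snprintf() result, i.e. the length the full message needs.
 */
static int format_gnu_diag(char *buf, size_t size, const char *file,
			   int line, const char *severity, const char *msg)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%s:%d: %s: %s",
			file, line, severity, msg);
}
```

The leading "file:line:" prefix is the part that editors parse in order to
jump to the reported location.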

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Sudip Mukherjee
c22b6ae69e scripts/spelling.txt: spelling of uninitialized
I just misspelled uninitialized and wrote it as unintialized.
Fortunately I noticed it in my final review.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Maninder Singh
779a6ce877 scripts/spelling.txt: add misspelled words for check
misspelled words for check:-
 chcek
 chck
 cehck

I made these spelling mistakes myself in changelogs for patches, so I
suggest adding them to spelling.txt so that checkpatch.pl warns about
them earlier.  References:-

./arch/powerpc/kernel/exceptions-64e.S:456: . . . make sure you chcek
https://lkml.org/lkml/2015/6/25/289
./arch/x86/mm/pageattr.c:1368: * No need to cehck in that case

[akpm@linux-foundation.org: add whcih->which, whcih I always get wrong]
Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
4712e722f9 fsnotify: get rid of fsnotify_destroy_mark_locked()
fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex.  To avoid future problems with this
function, split it into two.

fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex.  This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
925d1132a0 fsnotify: remove mark->free_list
The free list is used when all marks on a given inode / mount should be
destroyed when the inode / mount is going away.  However, with some care
we can free all of the marks without using a special list.

Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
1e39fc0183 fsnotify: document mark locking
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Jan Kara
3c53e51421 fsnotify: fix check in inotify fdinfo printing
A check in inotify_fdinfo() checking whether mark is valid was always
true due to a bug.  Luckily we can never get to invalidated marks since
we hold mark_mutex and invalidated marks get removed from the group list
when they are invalidated under that mutex.

Anyway fix the check to make code more future proof.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Dave Hansen
7c49b86164 fs/notify: optimize inotify/fsnotify code for unwatched files
I have a _tiny_ microbenchmark that sits in a loop and writes single
bytes to a file.  Writing one byte to a tmpfs file is around 2x slower
than reading one byte from a file, which is a _bit_ more than I expected.
This is a dumb benchmark, but I think it's hard to deny that write() is
a hot path and we should avoid unnecessary overhead there.

I did a 'perf record' of 30-second samples of read and write.  The top
item in a diffprofile is srcu_read_lock() from fsnotify().  There are
active inotify fd's from systemd, but nothing is actually listening to
the file or its part of the filesystem.

I *think* we can avoid taking the srcu_read_lock() for the common case
where there are no actual marks on the file.  This means that there will
both be nothing to notify for *and* implies that there is no need for
clearing the ignore mask.
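
The idea can be sketched with a toy model.  All toy_* names are
hypothetical, the real fsnotify structures differ, and a counter stands in
for srcu_read_lock() so the effect of the early return is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature object; the real fsnotify structures differ. */
struct toy_obj {
	int nr_marks;		/* marks attached to this inode or mount */
};

static int toy_lock_count;	/* counts simulated srcu_read_lock() calls */

static void toy_read_lock(void)
{
	toy_lock_count++;
}

static void toy_read_unlock(void)
{
}

/* Returns the number of marks visited; mnt may be NULL. */
static int toy_notify(const struct toy_obj *inode, const struct toy_obj *mnt)
{
	int visited;

	assert(inode != NULL);

	/* Fast path: nothing is watching, so skip the lock entirely. */
	if (inode->nr_marks == 0 && (mnt == NULL || mnt->nr_marks == 0))
		return 0;

	toy_read_lock();
	visited = inode->nr_marks + (mnt ? mnt->nr_marks : 0);
	toy_read_unlock();
	return visited;
}
```

In this model an unwatched file never touches the lock, which corresponds
to the common case the patch targets.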

This patch gave a 13.1% speedup in writes/second on my test, which is an
improvement from the 10.8% that I saw with the last version.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Yuriy Kolerov
031e29b587 drivers/video/concole: add negative dependency for VGA_CONSOLE on ARC
Architectures which support the VGA console must define the screen_info
structure from "uapi/linux/screen_info.h".  Otherwise an undefined
symbol error occurs.  Usually it's defined in "setup.c" for each
architecture.

If an architecture does not support the VGA console (ARC's case) there
are 2 ways: define a dummy instance of screen_info, or add a negative
dependency for VGA_CONSOLE to prevent selecting this option.

I've implemented the second way.  However the best solution is to add
HAVE_VGA_CONSOLE option for targets which support VGA console.  Then
turn off VGA_CONSOLE by default and add dependency to HAVE_VGA_CONSOLE.
But right now it's better to just add a negative dependency for ARC and
then consider how to collaborate about this issue with maintainers of
other architectures.

Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jaya Kumar <jayalk@intworks.biz>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andy Lutomirski
746bf6d642 capabilities: add a securebit to disable PR_CAP_AMBIENT_RAISE
Per Andrew Morgan's request, add a securebit to allow admins to disable
PR_CAP_AMBIENT_RAISE.  This securebit will prevent processes from adding
capabilities to their ambient set.

For simplicity, this disables PR_CAP_AMBIENT_RAISE entirely rather than
just disabling setting previously cleared bits.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Andy Lutomirski
32ae976ed3 selftests/capabilities: Add tests for capability evolution
This test focuses on ambient capabilities.  It requires either root or
the ability to create user namespaces.  Some of the test cases will be
skipped for nonroot users.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Christoph Lameter <cl@linux.com> # Original author
Cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00