Commit Graph

1739 Commits

Author SHA1 Message Date
Andreas Gruenbacher
cda9dd4207 gfs2: Large-filesystem fix for 32-bit systems
Commit ff34245d switched from iget5_locked to iget_locked among other
things, but iget_locked doesn't work for filesystems larger than 2^32
blocks on 32-bit systems.  Switch back to iget5_locked.  Filesystems
larger than 2^32 blocks are unrealistic to work well on 32-bit systems,
so this is mostly a code cleanliness fix.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:08 -05:00
Andreas Gruenbacher
ec5ec66ba4 gfs2: Get rid of gfs2_ilookup
Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
(and not for cached inodes anymore), there no longer is a need to
optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
and gfs2_ilookup can be removed.

In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
inconsistency goes away as well.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:08 -05:00
Andreas Gruenbacher
3ce37b2cb4 gfs2: Fix gfs2_lookup_by_inum lock inversion
The current gfs2_lookup_by_inum takes the glock of a presumed inode
identified by block number, verifies that the block is indeed an inode,
and then instantiates and reads the new inode via gfs2_inode_lookup.

However, instantiating a new inode may block on freeing a previous
instance of that inode (__wait_on_freeing_inode), and freeing an inode
requires to take the glock already held, leading to lock inversion and
deadlock.

Fix this by first instantiating the new inode, then verifying that the
block is an inode (if required), and then reading in the new inode, all
in gfs2_inode_lookup.

If the block we are looking for is not an inode, we discard the new
inode via iget_failed, which marks inodes as bad and unhashes them.
Other tasks waiting on that inode will get back a bad inode back from
ilookup or iget_locked; in that case, retry the lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-27 09:47:07 -05:00
Andreas Gruenbacher
1e875f5a95 gfs2: Initialize iopen glock holder for new inodes
In gfs2_init_inode_once, initialize inode->i_iopen_gh.gh_gl to NULL:
otherwise, when gfs2_inode_lookup fails, the iopen glock holder can
remain unset and iget_failed can end up accessing random memory.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-17 08:35:03 -05:00
Bob Peterson
36e4ad0316 GFS2: don't set rgrp gl_object until it's inserted into rgrp tree
Before this patch, function read_rindex_entry would set a rgrp
glock's gl_object pointer to itself before inserting the rgrp into
the rgrp rbtree. The problem is: if another process was also reading
the rgrp in, and had already inserted its newly created rgrp, then
the second call to read_rindex_entry would overwrite that value,
then return a bad return code to the caller. Later, other functions
would reference the now-freed rgrp memory by way of gl_object.
In some cases, that could result in gfs2_rgrp_brelse being called
twice for the same rgrp: once for the failed attempt and once for
the "real" rgrp release. Eventually the kernel would panic.
There are also a number of other things that could go wrong when
a kernel module is accessing freed storage. For example, this could
result in rgrp corruption because the fake rgrp would point to a
fake bitmap in memory too, causing gfs2_inplace_reserve to search
some random memory for free blocks, and find some, since we were
never setting rgd->rd_bits to NULL before freeing it.

This patch fixes the problem by not setting gl_object until we
have successfully inserted the rgrp into the rbtree. Also, it sets
rd_bits to NULL as it frees them, which will ensure any accidental
access to the wrong rgrp will result in a kernel panic rather than
file system corruption, which is preferred.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-06-10 07:01:58 -05:00
Linus Torvalds
be1332c099 We've got nine patches this time:
- Abhi Das has two patches that fix a GFS2 splice issue (and an adjustment).
 - Ben Marzinski has a patch which allows the proper unmount of a GFS2
   file system after hitting a withdraw error.
 - I have a patch to fix a problem where GFS2 would dereference an error
   value, plus three cosmetic / refactoring patches.
 - Daniel DeFreez has a patch to fix two glock reference count problems,
   where GFS2 was not properly "uninitializing" its glock holder on error
   paths.
 - Denys Vlasenko has a patch to change a function to not be inlined,
   thus reducing the memory footprint of the GFS2 module.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXP2KLAAoJENeLYdPf93o7nScH/jlDCv+JzwBwAiFEVelgB657
 lFY+Q3bPvmcTaHiOqXzco7U1EeyvAuZ+x9duaFBBEGC2/NydIbkAdCP1XjDOTAGZ
 aVVAP8qov4adFiXTfBdzMYjJ7DArOy/JRnlC8WK5OiD/3yxsXJZNlAHTgjjL/ZMq
 PkhjxX2Lc9ZHGF9YIFTyeoBlqf1qO6WbhkxvSNefgq3c702QPM+T9XND5T1duGDF
 ThEcXS2DRRoE+6wr3h+ruvS5kcfxXRLLausAgqNpj2FYQOFCQ6XXgGK8pyXA5V/t
 3Vh8mHW0VmSjAfrRWslCRjd+APXoOf+7QHqOHUAA1dvQ6CGfC9qTETH9Wp32JCI=
 =8g+a
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got nine patches this time:

   - Abhi Das has two patches that fix a GFS2 splice issue (and an
     adjustment).

   - Ben Marzinski has a patch which allows the proper unmount of a GFS2
     file system after hitting a withdraw error.

   - I have a patch to fix a problem where GFS2 would dereference an
     error value, plus three cosmetic / refactoring patches.

   - Daniel DeFreez has a patch to fix two glock reference count
     problems, where GFS2 was not properly "uninitializing" its glock
     holder on error paths.

   - Denys Vlasenko has a patch to change a function to not be inlined,
     thus reducing the memory footprint of the GFS2 module"

* tag 'gfs2-4.7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Refactor gfs2_remove_from_journal
  GFS2: Remove allocation parms from gfs2_rbm_find
  gfs2: use inode_lock/unlock instead of accessing i_mutex directly
  GFS2: Add calls to gfs2_holder_uninit in two error handlers
  GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
  GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
  gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
  GFS2: Get rid of dead code in inode_go_demote_ok
  GFS2: ignore unlock failures after withdraw
2016-05-20 15:11:26 -07:00
Linus Torvalds
ba5a2655c2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull remaining vfs xattr work from Al Viro:
 "The rest of work.xattr (non-cifs conversions)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  btrfs: Switch to generic xattr handlers
  ubifs: Switch to generic xattr handlers
  jfs: Switch to generic xattr handlers
  jfs: Clean up xattr name mapping
  gfs2: Switch to generic xattr handlers
  ceph: kill __ceph_removexattr()
  ceph: Switch to generic xattr handlers
  ceph: Get rid of d_find_alias in ceph_set_acl
2016-05-18 10:08:45 -07:00
Linus Torvalds
a7fd20d1c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support SPI based w5100 devices, from Akinobu Mita.

   2) Partial Segmentation Offload, from Alexander Duyck.

   3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

   4) Allow cls_flower stats offload, from Amir Vadai.

   5) Implement bpf blinding, from Daniel Borkmann.

   6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
      actually using FASYNC these atomics are superfluous.  From Eric
      Dumazet.

   7) Run TCP more preemptibly, also from Eric Dumazet.

   8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
      driver, from Gal Pressman.

   9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

  10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

  11) Support tunneling offloads in qed, from Manish Chopra.

  12) aRFS offloading in mlx5e, from Maor Gottlieb.

  13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
      Leitner.

  14) Add MSG_EOR support to TCP, this allows controlling packet
      coalescing on application record boundaries for more accurate
      socket timestamp sampling.  From Martin KaFai Lau.

  15) Fix alignment of 64-bit netlink attributes across the board, from
      Nicolas Dichtel.

  16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

  17) Several conversions of drivers to ethtool ksettings, from Philippe
      Reynes.

  18) Checksum neutral ILA in ipv6, from Tom Herbert.

  19) Factorize all of the various marvell dsa drivers into one, from
      Vivien Didelot

  20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
  Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
  Revert "phy dp83867: Make rgmii parameters optional"
  r8169: default to 64-bit DMA on recent PCIe chips
  phy dp83867: Make rgmii parameters optional
  phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
  bpf: arm64: remove callee-save registers use for tmp registers
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  switchdev: pass pointer to fib_info instead of copy
  net_sched: close another race condition in tcf_mirred_release()
  tipc: fix nametable publication field in nl compat
  drivers: net: Don't print unpopulated net_device name
  qed: add support for dcbx.
  ravb: Add missing free_irq() calls to ravb_close()
  qed: Remove a stray tab
  net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fec-mpc52xx: use phydev from struct net_device
  bpf, doc: fix typo on bpf_asm descriptions
  stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
  net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fs-enet: use phydev from struct net_device
  ...
2016-05-17 16:26:30 -07:00
Linus Torvalds
c2e7b20705 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs cleanups from Al Viro:
 "More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: use RWF_SYNC
  fs: add RWF_DSYNC aand RWF_SYNC
  ceph: use generic_write_sync
  fs: simplify the generic_write_sync prototype
  fs: add IOCB_SYNC and IOCB_DSYNC
  direct-io: remove the offset argument to dio_complete
  direct-io: eliminate the offset argument to ->direct_IO
  xfs: eliminate the pos variable in xfs_file_dio_aio_write
  filemap: remove the pos argument to generic_file_direct_write
  filemap: remove pos variables in generic_file_read_iter
2016-05-17 15:05:23 -07:00
Al Viro
1a39ba99b5 gfs2: Switch to generic xattr handlers
Switch to the generic xattr handlers and take the necessary glocks at
the layer below. The following are the new xattr "entry points"; they
are called with the glock held already in the following cases:

  gfs2_xattr_get: From SELinux, during lookups.
  gfs2_xattr_set: The glock is never held.
  gfs2_get_acl: From gfs2_create_inode -> posix_acl_create and
                gfs2_setattr -> posix_acl_chmod.
  gfs2_set_acl: From gfs2_setattr -> posix_acl_chmod.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12 22:28:05 -04:00
Al Viro
1d1bb236bc gfs2: switch to ->iterate_shared()
protected by glock and already used without locking the directory
by gfs2_get_name()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-12 17:00:20 -04:00
Bob Peterson
68cd4ce2ca GFS2: Refactor gfs2_remove_from_journal
This patch makes two simple changes to function gfs2_remove_from_journal.
First, it removes the parameter that specifies the transaction.
Since it's always passed in as current->journal_info, we might as well
set that in the function rather than passing it in. Second, it changes
the meta parameter to use an enum to make the code more clear.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-05-06 11:27:27 -05:00
Al Viro
9902af79c0 parallel lookups: actual switch to rwsem
ta-da!

The main issue is the lack of down_write_killable(), so the places
like readdir.c switched to plain inode_lock(); once killable
variants of rwsem primitives appear, that'll be dealt with.

lockdep side also might need more work

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-02 19:49:28 -04:00
Al Viro
84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
Bob Peterson
8381e60227 GFS2: Remove allocation parms from gfs2_rbm_find
Struct gfs2_alloc_parms ap is never referenced in function
gfs2_rbm_find, so this patch removes it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-05-02 09:49:22 -05:00
Abhi Das
80f4781d2c gfs2: use inode_lock/unlock instead of accessing i_mutex directly
i_mutex has been replaced by i_rwsem and directly accessing the
non-existent i_mutex breaks the kernel build.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-05-02 07:07:01 -05:00
Christoph Hellwig
dde0c2e798 fs: add IOCB_SYNC and IOCB_DSYNC
This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.

XXX: Will need a few additional audits for O_DSYNC

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Christoph Hellwig
c8b8e32d70 direct-io: eliminate the offset argument to ->direct_IO
Including blkdev_direct_IO and dax_do_io.  It has to be ki_pos to actually
work, so eliminate the superflous argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-05-01 19:58:39 -04:00
Daniel DeFreez
9c7fe83530 GFS2: Add calls to gfs2_holder_uninit in two error handlers
This patch fixes two locations that do not call gfs2_holder_uninit
if gfs2_glock_nq returns an error.

Signed-off-by: Daniel DeFreez <dcdefreez@ucdavis.edu>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-19 19:59:11 -04:00
Bob Peterson
e97321fa09 GFS2: Don't dereference inode in gfs2_inode_lookup until it's valid
Function gfs2_inode_lookup was dereferencing the inode, and after,
it checks for the value being NULL. We need to check that first.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-14 09:52:50 -04:00
Denys Vlasenko
a527b38e14 GFS2: fs/gfs2/glock.c: Deinline do_error, save 1856 bytes
This function compiles to 522 bytes of machine code.

Error paths are not very time critical.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-12 12:39:12 -04:00
Al Viro
ce23e64013 ->getxattr(): pass dentry and inode as separate arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-11 00:48:00 -04:00
Al Viro
b296821a7c xattr_handler: pass dentry and inode as separate arguments of ->get()
... and do not assume they are already attached to each other

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 20:48:24 -04:00
Al Viro
fc64005c93 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb
... and neither can ever be NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-04-10 17:11:51 -04:00
David S. Miller
ae95d71261 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-04-09 17:41:41 -04:00
Abhi Das
611526756a gfs2: Use gfs2 wrapper to sync inode before calling generic_file_splice_read()
gfs2_file_splice_read() f_op grabs and releases the cluster-wide
inode glock to sync the inode size to the latest.

Without this, generic_file_splice_read() uses an older i_size value
and can return EOF for valid offsets in the inode.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-04-05 12:06:15 -04:00
Bob Peterson
73cc86252b GFS2: Get rid of dead code in inode_go_demote_ok
Function inode_go_demote_ok had some code that was only executed
if gl_holders was not empty. However, if gl_holders was not empty,
the only caller, demote_ok(), returns before inode_go_demote_ok
would ever be called. Therefore, it's dead code, so I removed it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-04-05 11:59:18 -04:00
Bob Copeland
8f6fd83c6c rhashtable: accept GFP flags in rhashtable_walk_init
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Bob Copeland <me@bobcopeland.com>
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-04-05 10:56:32 +02:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Benjamin Marzinski
3e11e53041 GFS2: ignore unlock failures after withdraw
After gfs2 has withdrawn the filesystem, it may still have many locks not
in the unlocked state.  If it is using lock_dlm, it will failed trying
the unlocks since it has already unmounted the lock manager. Instead, it
should set the SDF_SKIP_DLM_UNLOCK flag on withdraw, to signal that
it can skip the lock_manager on unlocks, and failback to lock_nolock
style unlocking.

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-03-24 08:28:48 -04:00
Linus Torvalds
1ca80a0a3e GFS2: merge window
We only have six patches ready for this merge window.
 
 - Arnd Bergmann contributed a patch that fixes an uninitialized variable
   warning.
 - The second patch avoids a kernel panic due to referencing an iopen
   glock that may not be held, in an error path.
 - The third patch fixes a rounding error that caused xfs_tests direct IO
   write "fsx" tests to fail on GFS2.
 - The fourth patch tidies up the code path when glocks are being reused
   to recreate a dinode that was recently deleted.
 - The fifth reverts an ages-old patch that should no longer be needed, and
   which interfered with the transition of dinodes from unlinked to free.
 - And lastly, a patch to eliminate a function parameter that's not needed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW6qcKAAoJENeLYdPf93o70SwH/0czrFBIWaPbRgyuGLNvye4G
 qvfIk2Yky3UKfgUA+ZW+cAWkxeKubc9scMwce0elKxm0e03rNwIX0slIOx8hymj3
 19AgMnj3kcCJvRLdkBITNqnd6vTY2quadLN3j8I2cCNbHOV0GelEkP4jWcTHB+2F
 AG0ZJOsvvrcD1ClgdIvGdV52qipZApS/kgZjLpJBEyzxq8SpRe9vNqMDsDyoKWgi
 yjp0+aJ0IJAWA24fzdT5HE4fb5yGRWehg51l6Z2mbfXAvT+oeKcYrQiq1zhUutHw
 dT6SUHbjt+y0EulPClsf3r5zjRjVCCbpj0wkUY3kPB98lgnpAD2QnUP/JDA+CD0=
 =xRmX
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We only have six patches ready for this merge window:

   - Arnd Bergmann contributed a patch that fixes an uninitialized
     variable warning.

   - The second patch avoids a kernel panic due to referencing an iopen
     glock that may not be held, in an error path.

   - The third patch fixes a rounding error that caused xfs_tests direct
     IO write "fsx" tests to fail on GFS2.

   - The fourth patch tidies up the code path when glocks are being
     reused to recreate a dinode that was recently deleted.

   - The fifth reverts an ages-old patch that should no longer be
     needed, and which interfered with the transition of dinodes from
     unlinked to free.

   - And lastly, a patch to eliminate a function parameter that's not
     needed"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: Eliminate parameter non_block on gfs2_inode_lookup
  GFS2: Don't filter out I_FREEING inodes anymore
  GFS2: Prevent delete work from occurring on glocks used for create
  GFS2: Fix direct IO write rounding error
  gfs2: avoid uninitialized variable warning
  GFS2: Check if iopen is held when deleting inode
2016-03-17 16:51:32 -07:00
Bob Peterson
73b462d280 GFS2: Eliminate parameter non_block on gfs2_inode_lookup
Now that we're not filtering out I_FREEING inodes from our lookups
anymore, we can eliminate the non_block parameter from the lookup
function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:50 -04:00
Bob Peterson
ff34245d52 GFS2: Don't filter out I_FREEING inodes anymore
This patch basically reverts a very old patch from 2008,
7a9f53b3c1, with the title
"Alternate gfs2_iget to avoid looking up inodes being freed".
The original patch was designed to avoid a deadlock caused by lock
ordering with try_rgrp_unlink. The patch forced the function to not
find inodes that were being removed by VFS. The problem is, that
made it impossible for nodes to delete their own unlinked dinodes
after a certain point in time, because the inode needed was not found
by this filtering process. There is no longer a need for the patch,
since function try_rgrp_unlink no longer locks the inode: All it does
is queue the glock onto the delete work_queue, so there should be no
more deadlock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:45 -04:00
Bob Peterson
a4923865ea GFS2: Prevent delete work from occurring on glocks used for create
This patch tries to prevent delete work (queued via iopen callback)
from executing if the glock is currently being used to create
a new inode.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:37 -04:00
Bob Peterson
2df6f47150 GFS2: Fix direct IO write rounding error
The fsx test in xfstests was failing because it was using direct IO
writes which were using a bad calculation. It was using
loff_t lstart = offset & (PAGE_CACHE_SIZE - 1); when it should be
loff_t lstart = offset & ~(PAGE_CACHE_SIZE - 1);
Thus, the write at offset 0x67e00 was calculating lstart to be
0xe00, the address of our corruption. Instead, it should have been
0x67000. This patch fixes the calculation.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-03-15 10:46:28 -04:00
Arnd Bergmann
67893f12e5 gfs2: avoid uninitialized variable warning
We get a bogus warning about a potential uninitialized variable
use in gfs2, because the compiler does not figure out that we
never use the leaf number if get_leaf_nr() returns an error:

fs/gfs2/dir.c: In function 'get_first_leaf':
fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/gfs2/dir.c: In function 'dir_split_leaf':
fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized]

Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is
sufficient to let gcc understand that this is exactly the same
condition as in IS_ERR() so it can optimize the code path enough
to understand it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-03-15 10:46:11 -04:00
Al Viro
5955102c99 wrappers for ->i_mutex access
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-22 18:04:28 -05:00
Linus Torvalds
5807fcaa9b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:

 - EVM gains support for loading an x509 cert from the kernel
   (EVM_LOAD_X509), into the EVM trusted kernel keyring.

 - Smack implements 'file receive' process-based permission checking for
   sockets, rather than just depending on inode checks.

 - Misc enhancments for TPM & TPM2.

 - Cleanups and bugfixes for SELinux, Keys, and IMA.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (41 commits)
  selinux: Inode label revalidation performance fix
  KEYS: refcount bug fix
  ima: ima_write_policy() limit locking
  IMA: policy can be updated zero times
  selinux: rate-limit netlink message warnings in selinux_nlmsg_perm()
  selinux: export validatetrans decisions
  gfs2: Invalid security labels of inodes when they go invalid
  selinux: Revalidate invalid inode security labels
  security: Add hook to invalidate inode security labels
  selinux: Add accessor functions for inode->i_security
  security: Make inode argument of inode_getsecid non-const
  security: Make inode argument of inode_getsecurity non-const
  selinux: Remove unused variable in selinux_inode_init_security
  keys, trusted: seal with a TPM2 authorization policy
  keys, trusted: select hash algorithm for TPM2 chips
  keys, trusted: fix: *do not* allow duplicate key options
  tpm_ibmvtpm: properly handle interrupted packet receptions
  tpm_tis: Tighten IRQ auto-probing
  tpm_tis: Refactor the interrupt setup
  tpm_tis: Get rid of the duplicate IRQ probing code
  ...
2016-01-17 19:13:15 -08:00
Vladimir Davydov
5d097056c9 kmemcg: account certain kmem allocations to memcg
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-14 16:00:49 -08:00
Bob Peterson
7508abc4bd GFS2: Check if iopen is held when deleting inode
This patch fixes an error condition in which an inode is partially
created in gfs2_create_inode() but then some error is discovered,
which causes it to fail and call iput() before the iopen glock is
created or held. In that case, gfs2_delete_inode would try to
unlock an iopen glock that doesn't yet exist. Therefore, we test
its holder (which must exist) for the HIF_HOLDER bit before trying
to dq it.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2016-01-14 08:47:42 -05:00
Linus Torvalds
4d58967783 GFS2: merge window
Here is a list of patches we've accumulated for GFS2 for the current upstream
 merge window. Last window's set was short, but I warned that this one would
 be bigger, and so it is. We've got 19 patches:
 
 - A patch from Abhi Das to propagate the GFS2_DIF_SYSTEM bit so that newly
   added journals don't get flagged, deleted, and recreated by fsck.gfs2.
 - Two patches from Andreas Gruenbacher to improve GFS2 performance where
   extended attributes are involved.
 - A patch from Andy Price to fix a suspicious rcu dereference error.
 - Two patches from Ben Marzinski that rework how GFS2's NFS cookies are
   managed. This fixes readdir problems with nfs-over-gfs2.
 - A patch from Ben Marzinski that fixes a race in unmounting GFS2.
 - A set of four patches from me to move the resource group reservations
   inside the gfs2 inode to improve performance and fix a bug whereby
   get_write_access improperly prevented some operations like chown.
 - A patch from me to spinlock-protect the setting of system statfs file data.
   This was causing small discrepancies between df and du.
 - A patch from me to reintroduce a timeout while clearing glocks which was
   accidentally dropped some time ago.
 - A patch from me to wait for iopen glock dequeues in order to improve
   deleting of files that were unlinked from a different cluster node.
 - A patch from me to ensure metadata address spaces get truncated when an
   inode is evicted.
 - A patch from me to fix a bug in which a memory leak could occur in some
   error cases when inodes were trying to be created.
 - A patch to consistently use iopen glocks to transition from the unlinked
   state to the deleted state.
 - A patch to fix a glock reference count error when inode creation fails.
 - A patch from Junxiao Bi to fix an flock panic.
 - A patch from Markus Elfring that removes an unnecessary if.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlUOCAAoJENeLYdPf93o78/IH/0PfoQtQnYQpvbTPgaFICEUI
 rx+uLqPxpJgjJMsbIy+VA/bhZsbuENDbor99Cs6GiTLy3Q/4TKNY9NxDN+aO8o+Q
 qrp3ZANkyTGaneCzrXzfgmOxe1G8xRQ7pboRUEt2yPlGK1oLax+PR4uPb+IuJ5Wa
 QvSwnZj+9K2LkMQkKPyxoltjZiZLywvNB5dx0F35+dvriQxHXfJ1Naek6gs140MB
 SfG4/Zi8dfGOJQjSQTUENatFhyFB0V7CbDPFuBVzTGD8IS4ASgqIBMFjO0VBHkP4
 gt4zb0BJAn0aQcbxIaDtsuEffpn+5zdK/Onq5g4Fr9ZJv3fkVpVcOgYui/rkDD0=
 =28ev
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "Here is a list of patches we've accumulated for GFS2 for the current
  upstream merge window.  Last window's set was short, but I warned that
  this one would be bigger, and so it is.  We've got 19 patches:

   - A patch from Abhi Das to propagate the GFS2_DIF_SYSTEM bit so that
     newly added journals don't get flagged, deleted, and recreated by
     fsck.gfs2.

   - Two patches from Andreas Gruenbacher to improve GFS2 performance
     where extended attributes are involved.

   - A patch from Andy Price to fix a suspicious rcu dereference error.

   - Two patches from Ben Marzinski that rework how GFS2's NFS cookies
     are managed.  This fixes readdir problems with nfs-over-gfs2.

   - A patch from Ben Marzinski that fixes a race in unmounting GFS2.

   - A set of four patches from me to move the resource group
     reservations inside the gfs2 inode to improve performance and fix a
     bug whereby get_write_access improperly prevented some operations
     like chown.

   - A patch from me to spinlock-protect the setting of system statfs
     file data.  This was causing small discrepancies between df and du.

   - A patch from me to reintroduce a timeout while clearing glocks
     which was accidentally dropped some time ago.

   - A patch from me to wait for iopen glock dequeues in order to
     improve deleting of files that were unlinked from a different
     cluster node.

   - A patch from me to ensure metadata address spaces get truncated
     when an inode is evicted.

   - A patch from me to fix a bug in which a memory leak could occur in
     some error cases when inodes were trying to be created.

   - A patch to consistently use iopen glocks to transition from the
     unlinked state to the deleted state.

   - A patch to fix a glock reference count error when inode creation
     fails.

   - A patch from Junxiao Bi to fix an flock panic.

   - A patch from Markus Elfring that removes an unnecessary if"

* tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: fix flock panic issue
  GFS2: Don't do glock put on when inode creation fails
  GFS2: Always use iopen glock for gl_deletes
  GFS2: Release iopen glock in gfs2_create_inode error cases
  GFS2: Truncate address space mapping when deleting an inode
  GFS2: Wait for iopen glock dequeues
  gfs2: clear journal live bit in	gfs2_log_flush
  gfs2: change gfs2 readdir cookie
  gfs2: keep offset when splitting dir leaf blocks
  GFS2: Reintroduce a timeout in function gfs2_gl_hash_clear
  GFS2: Update master statfs buffer with sd_statfs_spin locked
  GFS2: Reduce size of incore inode
  GFS2: Make rgrp reservations part of the gfs2_inode structure
  GFS2: Extract quota data from reservations structure (revert 5407e24)
  gfs2: Extended attribute readahead optimization
  gfs2: Extended attribute readahead
  GFS2: Use rht_for_each_entry_rcu in glock_hash_walk
  GFS2: Delete an unnecessary check before the function call "iput"
  gfs2: Automatically set GFS2_DIF_SYSTEM flag on system files
2016-01-12 18:09:35 -08:00
Linus Torvalds
33caf82acf Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "All kinds of stuff.  That probably should've been 5 or 6 separate
  branches, but by the time I'd realized how large and mixed that bag
  had become it had been too close to -final to play with rebasing.

  Some fs/namei.c cleanups there, memdup_user_nul() introduction and
  switching open-coded instances, burying long-dead code, whack-a-mole
  of various kinds, several new helpers for ->llseek(), assorted
  cleanups and fixes from various people, etc.

  One piece probably deserves special mention - Neil's
  lookup_one_len_unlocked().  Similar to lookup_one_len(), but gets
  called without ->i_mutex and tries to avoid ever taking it.  That, of
  course, means that it's not useful for any directory modifications,
  but things like getting inode attributes in nfds readdirplus are fine
  with that.  I really should've asked for moratorium on lookup-related
  changes this cycle, but since I hadn't done that early enough...  I
  *am* asking for that for the coming cycle, though - I'm going to try
  and get conversion of i_mutex to rwsem with ->lookup() done under lock
  taken shared.

  There will be a patch closer to the end of the window, along the lines
  of the one Linus had posted last May - mechanical conversion of
  ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
  inode_is_locked()/inode_lock_nested().  To quote Linus back then:

    -----
    |    This is an automated patch using
    |
    |        sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    |        sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    |        sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[     ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    |        sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    |        sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    |    with a very few manual fixups
    -----

  I'm going to send that once the ->i_mutex-affecting stuff in -next
  gets mostly merged (or when Linus says he's about to stop taking
  merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
  nfsd: don't hold i_mutex over userspace upcalls
  fs:affs:Replace time_t with time64_t
  fs/9p: use fscache mutex rather than spinlock
  proc: add a reschedule point in proc_readfd_common()
  logfs: constify logfs_block_ops structures
  fcntl: allow to set O_DIRECT flag on pipe
  fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
  fs: xattr: Use kvfree()
  [s390] page_to_phys() always returns a multiple of PAGE_SIZE
  nbd: use ->compat_ioctl()
  fs: use block_device name vsprintf helper
  lib/vsprintf: add %*pg format specifier
  fs: use gendisk->disk_name where possible
  poll: plug an unused argument to do_poll
  amdkfd: don't open-code memdup_user()
  cdrom: don't open-code memdup_user()
  rsxx: don't open-code memdup_user()
  mtip32xx: don't open-code memdup_user()
  [um] mconsole: don't open-code memdup_user_nul()
  [um] hostaudio: don't open-code memdup_user()
  ...
2016-01-12 17:11:47 -08:00
Linus Torvalds
ddf1d6238d Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs xattr updates from Al Viro:
 "Andreas' xattr cleanup series.

  It's a followup to his xattr work that went in last cycle; -0.5KLoC"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  xattr handlers: Simplify list operation
  ocfs2: Replace list xattr handler operations
  nfs: Move call to security_inode_listsecurity into nfs_listxattr
  xfs: Change how listxattr generates synthetic attributes
  tmpfs: listxattr should include POSIX ACL xattrs
  tmpfs: Use xattr handler infrastructure
  btrfs: Use xattr handler infrastructure
  vfs: Distinguish between full xattr names and proper prefixes
  posix acls: Remove duplicate xattr name definitions
  gfs2: Remove gfs2_xattr_acl_chmod
  vfs: Remove vfs_xattr_cmp
2016-01-11 13:32:10 -08:00
Dmitry Monakhov
a1c6f05733 fs: use block_device name vsprintf helper
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-06 13:03:18 -05:00
Al Viro
fceef393a5 switch ->get_link() to delayed_call, kill ->put_link()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-12-30 13:01:03 -05:00
Andreas Gruenbacher
f39814f60a gfs2: Invalid security labels of inodes when they go invalid
When gfs2 releases the glock of an inode, it must invalidate all
information cached for that inode, including the page cache and acls.
Use the new security_inode_invalidate_secctx hook to also invalidate
security labels in that case.  These items will be reread from disk
when needed after reacquiring the glock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
[PM: fixed spelling errors and description line lengths]
Signed-off-by: Paul Moore <pmoore@redhat.com>
2015-12-24 11:09:40 -05:00
Junxiao Bi
a93a998382 gfs2: fix flock panic issue
Commit 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
moved flock/posix lock identify code to locks_lock_inode_wait(), but
missed to set fl_flags to FL_FLOCK which will cause kernel panic in
locks_lock_inode_wait().

Fixes: 4f6563677a ("Move locks API users to locks_lock_inode_wait()")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-22 08:06:08 -06:00
Bob Peterson
6cc4b6e801 GFS2: Don't do glock put on when inode creation fails
Currently the error path of function gfs2_inode_lookup calls function
gfs2_glock_put corresponding to an earlier call to gfs2_glock_get for
the inode glock. That's wrong because the error path also calls
iget_failed() which eventually calls iput, which eventually calls
gfs2_evict_inode, which does another gfs2_glock_put. This double-put
can cause the glock reference count to get off.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 11:04:46 -06:00
Bob Peterson
5ea31bc0a6 GFS2: Always use iopen glock for gl_deletes
Before this patch, when function try_rgrp_unlink queued a glock for
delete_work to reclaim the space, it used the inode glock to do so.
That's different from the iopen callback which uses the iopen glock
for the same purpose. We should be consistent and always use the
iopen glock. This may also save us reference counting problems with
the inode glock, since clear_glock does an extra glock_put() for the
inode glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 11:02:52 -06:00
Bob Peterson
783013c0f5 GFS2: Release iopen glock in gfs2_create_inode error cases
Some error cases in gfs2_create_inode were not unlocking the iopen
glock, getting the reference count off. This adds the proper unlock.
The error logic in function gfs2_create_inode was also convoluted,
so this patch simplifies it. It also takes care of a bug in
which gfs2_qa_delete() was not called in an error case.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-18 10:57:21 -06:00