Commit Graph

258 Commits

Author SHA1 Message Date
Andreas Gruenbacher
8d3e72a180 iomap: don't mark the inode dirty in iomap_write_end
Marking the inode dirty for each page copied into the page cache can be
very inefficient for file systems that use the VFS dirty inode tracking,
and is completely pointless for those that don't use the VFS dirty inode
tracking.  So instead, only set an iomap flag when changing the in-core
inode size, and open code the rest of __generic_write_end.

Partially based on code from Christoph Hellwig.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-27 17:28:40 -07:00
Bob Peterson
f29e62eed2 gfs2: replace more printk with calls to fs_info and friends
This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:30:27 +02:00
Linus Torvalds
4066524401 Fix rounding error in gfs2_iomap_page_prepare
-----BEGIN PGP SIGNATURE-----
 
 iQIbBAABAgAGBQJdA9AdAAoJENW/n+sDE2U67xgP+PnBqsdJpFVfodQyu+AEHdxJ
 et2nZmcsQGRXj5iha7lWI6Mu0x/7q9FOCS1I0aZDwcHn1A0R5R/xulKcnl6ySjrp
 bd6GnehP1KicufykAPr5lGHmiIIp/SdIDe7z6XUkV1Eksl6ls8WTWi+JEsP7uDjw
 TGQ6GTOziRUKHshuEkvvYfA3OhbebNzP1FjgPr6keg7PFeLAaiUWKxlTYj9Vov8F
 4YnOsdfFEFpiJy8w5N/Ggfmp1fbY+uyzoS70Xe+y7XTdkX57fPnzlCxLsji7PgAp
 AN38cpUHib46Ng8YyGfdB2wzJ5BAauYLy7zYGLiNdHXFvwkm4bP2CyeALyhG/Jg7
 HOBJ1owekaBV/o96NgyuJTfv65zdV0su7s24YBVJlY+aUOkZWJb2N4quyv/WyiTY
 Ds1xR4Uql6WOBt6fb4Ady+4lD0j01k5YMk56fcmHlykldVYXXeJXksS/0MxKBR5T
 qGWPHpldVXK6TZhRo8SkS/H2h6CdqevwxECa2YMrnWFVaioTo5jBtO6l5Nn7z0Ni
 qaELyzFdhDlnq8ywvmo1zBPT2NjF4m0XQkEiy5P1RvkxSq500zvOEVVCx8vXICY7
 TwUpCSRP5Tu9P9LurvgTf/Pltsf0pApKyYRIC4hu8/6jNxum1Rq0JZW2JepNDSJH
 qnQPowGLnhAJDWIfIPU=
 =Z7P5
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:
 "Fix rounding error in gfs2_iomap_page_prepare"

* tag 'gfs2-v5.2.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix rounding error in gfs2_iomap_page_prepare
2019-06-14 17:27:12 -10:00
Andreas Gruenbacher
2741b6723b gfs2: Fix rounding error in gfs2_iomap_page_prepare
The pos and len arguments to the iomap page_prepare callback are not
block aligned, so we need to take that into account when computing the
number of blocks.

Fixes: d0a22a4b03 ("gfs2: Fix iomap write page reclaim deadlock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-14 18:49:07 +02:00
Thomas Gleixner
7336d0e654 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 398
Based on 1 normalized pattern(s):

  this copyrighted material is made available to anyone wishing to use
  modify copy or redistribute it subject to the terms and conditions
  of the gnu general public license version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 44 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531081038.653000175@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:12 +02:00
Andreas Gruenbacher
d0a22a4b03 gfs2: Fix iomap write page reclaim deadlock
Since commit 64bc06bb32 ("gfs2: iomap buffered write support"), gfs2 is doing
buffered writes by starting a transaction in iomap_begin, writing a range of
pages, and ending that transaction in iomap_end.  This approach suffers from
two problems:

  (1) Any allocations necessary for the write are done in iomap_begin, so when
  the data aren't journaled, there is no need for keeping the transaction open
  until iomap_end.

  (2) Transactions keep the gfs2 log flush lock held.  When
  iomap_file_buffered_write calls balance_dirty_pages, this can end up calling
  gfs2_write_inode, which will try to flush the log.  This requires taking the
  log flush lock which is already held, resulting in a deadlock.

Fix both of these issues by not keeping transactions open from iomap_begin to
iomap_end.  Instead, start a small transaction in page_prepare and end it in
page_done when necessary.

Reported-by: Edwin Török <edvin.torok@citrix.com>
Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07 23:39:15 +02:00
Andreas Gruenbacher
fbb27873f2 gfs2: Rename gfs2_trans_{add_unrevoke => remove_revoke}
Rename gfs2_trans_add_unrevoke to gfs2_trans_remove_revoke: there is no
such thing as an "unrevoke" object; all this function does is remove
existing revoke objects plus some bookkeeping.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Bob Peterson
7c70b89695 gfs2: clean_journal improperly set sd_log_flush_head
This patch fixes regressions in 588bff95c9.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95c9 ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:04 +02:00
Linus Torvalds
b4b52b881c Wimplicit-fallthrough patches for 5.2-rc1
Hi Linus,
 
 This is my very first pull-request.  I've been working full-time as
 a kernel developer for more than two years now. During this time I've
 been fixing bugs reported by Coverity all over the tree and, as part
 of my work, I'm also contributing to the KSPP. My work in the kernel
 community has been supervised by Greg KH and Kees Cook.
 
 OK. So, after the quick introduction above, please, pull the following
 patches that mark switch cases where we are expecting to fall through.
 These patches are part of the ongoing efforts to enable -Wimplicit-fallthrough.
 They have been ignored for a long time (most of them more than 3 months,
 even after pinging multiple times), which is the reason why I've created
 this tree. Most of them have been baking in linux-next for a whole development
 cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails
 going out for newly introduced code that triggers -Wimplicit-fallthrough
 to avoid gaining more of these cases while we work to remove the ones
 that are already present.
 
 I'm happy to let you know that we are getting close to completing this
 work.  Currently, there are only 32 of 2311 of these cases left to be
 addressed in linux-next.  I'm auditing every case; I take a look into
 the code and analyze it in order to determine if I'm dealing with an
 actual bug or a false positive, as explained here:
 
 https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
 
 While working on this, I've found and fixed the following missing
 break/return bugs, some of them introduced more than 5 years ago:
 
 84242b82d8
 7850b51b6c
 5e420fe635
 09186e5034
 b5be853181
 7264235ee7
 cc5034a5d2
 479826cc86
 5340f23df8
 df997abeeb
 2f10d82373
 307b00c5e6
 5d25ff7a54
 a7ed5b3e7d
 c24bfa8f21
 ad0eaee619
 9ba8376ce1
 dc586a60a1
 a8e9b186f1
 4e57562b48
 60747828ea
 c5b974bee9
 cc44ba9116
 2c930e3d0a
 
 Once this work is finish, we'll be able to universally enable
 "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
 entering the kernel again.
 
 Thanks
 
 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAlzQR2IACgkQRwW0y0cG
 2zEJbQ//X930OcBtT/9DRW4XL1Jeq0Mjssz/GLX2Vpup5CwwcTROG65no80Zezf/
 yQRWnUjGX0OBv/fmUK32/nTxI/7k7NkmIXJHe0HiEF069GEENB7FT6tfDzIPjU8M
 qQkB8NsSUWJs3IH6BVynb/9MGE1VpGBDbYk7CBZRtRJT1RMM+3kQPucgiZMgUBPo
 Yd9zKwn4i/8tcOCli++EUdQ29ukMoY2R3qpK4LftdX9sXLKZBWNwQbiCwSkjnvJK
 I6FDiA7RaWH2wWGlL7BpN5RrvAXp3z8QN/JZnivIGt4ijtAyxFUL/9KOEgQpBQN2
 6TBRhfTQFM73NCyzLgGLNzvd8awem1rKGSBBUvevaPbgesgM+Of65wmmTQRhFNCt
 A7+e286X1GiK3aNcjUKrByKWm7x590EWmDzmpmICxNPdt5DHQ6EkmvBdNjnxCMrO
 aGA24l78tBN09qN45LR7wtHYuuyR0Jt9bCmeQZmz7+x3ICDHi/+Gw7XPN/eM9+T6
 lZbbINiYUyZVxOqwzkYDCsdv9+kUvu3e4rPs20NERWRpV8FEvBIyMjXAg6NAMTue
 K+ikkyMBxCvyw+NMimHJwtD7ho4FkLPcoeXb2ZGJTRHixiZAEtF1RaQ7dA05Q/SL
 gbSc0DgLZeHlLBT+BSWC2Z8SDnoIhQFXW49OmuACwCUC68NHKps=
 =k30z
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
 "Mark switch cases where we are expecting to fall through.

  This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

  Most of them have been baking in linux-next for a whole development
  cycle. And with Stephen Rothwell's help, we've had linux-next
  nag-emails going out for newly introduced code that triggers
  -Wimplicit-fallthrough to avoid gaining more of these cases while we
  work to remove the ones that are already present.

  We are getting close to completing this work. Currently, there are
  only 32 of 2311 of these cases left to be addressed in linux-next. I'm
  auditing every case; I take a look into the code and analyze it in
  order to determine if I'm dealing with an actual bug or a false
  positive, as explained here:

      https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

  While working on this, I've found and fixed the several missing
  break/return bugs, some of them introduced more than 5 years ago.

  Once this work is finished, we'll be able to universally enable
  "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
  entering the kernel again"

* tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
  memstick: mark expected switch fall-throughs
  drm/nouveau/nvkm: mark expected switch fall-throughs
  NFC: st21nfca: Fix fall-through warnings
  NFC: pn533: mark expected switch fall-throughs
  block: Mark expected switch fall-throughs
  ASN.1: mark expected switch fall-through
  lib/cmdline.c: mark expected switch fall-throughs
  lib: zstd: Mark expected switch fall-throughs
  scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
  scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
  scsi: ppa: mark expected switch fall-through
  scsi: osst: mark expected switch fall-throughs
  scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
  scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
  scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
  scsi: imm: mark expected switch fall-throughs
  scsi: csiostor: csio_wr: mark expected switch fall-through
  ...
2019-05-07 12:48:10 -07:00
Andreas Gruenbacher
df0db3ecdb iomap: Add a page_prepare callback
Move the page_done callback into a separate iomap_page_ops structure and
add a page_prepare calback to be called before the next page is written
to.  In gfs2, we'll want to start a transaction in page_prepare and end
it in page_done.  Other filesystems that implement data journaling will
require the same kind of mechanism.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01 07:47:37 -07:00
Gustavo A. R. Silva
0a4c92657f fs: mark expected switch fall-throughs
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

This patch fixes the following warnings:

fs/affs/affs.h:124:38: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1692:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/configfs/dir.c:1694:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ceph/file.c:249:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:233:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/hash.c:246:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1237:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext2/inode.c:1244:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1182:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1188:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1432:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ext4/indirect.c:1440:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:618:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/f2fs/node.c:620:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/btrfs/ref-verify.c:522:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:711:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/gfs2/bmap.c:722:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/jffs2/fs.c:339:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/nfsd/nfs4proc.c:429:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:62:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/ufs/util.h:43:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/fcntl.c:770:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/seq_file.c:319:10: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:148:11: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/libfs.c:150:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/signalfd.c:178:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
fs/locks.c:1473:16: warning: this statement may fall through [-Wimplicit-fallthrough=]

Warning level 3 was used: -Wimplicit-fallthrough=3

This patch is part of the ongoing efforts to enabling
-Wimplicit-fallthrough.

Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-04-08 18:21:02 -05:00
Bob Peterson
bc0205612b gfs2: take jdata unstuff into account in do_grow
Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-18 10:49:02 -06:00
Abhi Das
98583b3e87 gfs2: add more timing info to journal recovery process
Tells you how many milliseconds map_journal_extents and find_jhead
take.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-12-11 17:50:36 +01:00
Linus Torvalds
e6a2562fe2 Merge tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull bfs2 fixes from Andreas Gruenbacher:
 "Fix two bugs leading to leaked buffer head references:

   - gfs2: Put bitmap buffers in put_super
   - gfs2: Fix iomap buffer head reference counting bug

  And one bug leading to significant slow-downs when deleting large
  files:

   - gfs2: Fix metadata read-ahead during truncate (2)"

* tag 'gfs2-4.20.fixes3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix iomap buffer head reference counting bug
  gfs2: Fix metadata read-ahead during truncate (2)
  gfs2: Put bitmap buffers in put_super
2018-11-16 11:38:14 -06:00
Andreas Gruenbacher
c26b5aa8ef gfs2: Fix iomap buffer head reference counting bug
GFS2 passes the inode buffer head (dibh) from gfs2_iomap_begin to
gfs2_iomap_end in iomap->private.  It sets that private pointer in
gfs2_iomap_get.  Users of gfs2_iomap_get other than gfs2_iomap_begin
would have to release iomap->private, but this isn't done correctly,
leading to a leak of buffer head references.

To fix this, move the code for setting iomap->private from
gfs2_iomap_get to gfs2_iomap_begin.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-11-16 11:35:09 -06:00
Andreas Gruenbacher
e7445ceddf gfs2: Fix metadata read-ahead during truncate (2)
The previous attempt to fix for metadata read-ahead during truncate was
incorrect: for files with a height > 2 (1006989312 bytes with a block
size of 4096 bytes), read-ahead requests were not being issued for some
of the indirect blocks discovered while walking the metadata tree,
leading to significant slow-downs when deleting large files.  Fix that.

In addition, only issue read-ahead requests in the first pass through
the meta-data tree, while deallocating data blocks.

Fixes: c3ce5aa9b0 ("gfs2: Fix metadata read-ahead during truncate")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-11-09 10:55:33 +00:00
Linus Torvalds
bfd93a87ea We've got 18 patches for this merge window, none of which are very major.
1. Andreas Gruenbacher contributed several patches to clean up the gfs2
    block allocator to prepare for future performance enhancements.
 2. Andy Price contributed a patch to fix a use-after-free problem.
 3. I contributed some patches that fix gfs2's broken rgrplvb mount option.
 4. I contributed some cleanup patches and error message improvements.
 5. Steve Whitehouse and Abhi Das sent a patch to enable getlabel support.
 6. Tim Smith contributed a patch to flush the glock delete workqueue at exit.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbz3QlAAoJENeLYdPf93o7+VUH/0XUfbyNQsU6fyLP8NrZq05z
 qNsVN3Hm+tPc0/V0C75lSp9ej7B8ogMl0RPysniziRTEWIDK6oGB/JUGvIH0O/z1
 vZ/sofZEDXthV3YjiI8RXLcaJLsOavSXnGwHbNKohM2PdRObVkZbaUL+xWlL9X3q
 yHgP5AHCIrpVzz5l4sLO6N0Npnl0aNRTBxPIyDTaBBmitXkvtqkCbkw185jzkDDs
 fMvZ6I+3UbUxp99InFTHeUXvTr1EbvfPrhZzmppuV1N4LLSa1eRaWmsKTEDPdnsy
 uhsh9ittv8EJXN2dpmZIOdGmDEK07kFoZrsbM5F78sOH/LbUyJ5YfBN02lDaaAI=
 =b7zk
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.20.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 updates from Bob Peterson:
 "We've got 18 patches for this merge window, none of which are very
  major:

   - clean up the gfs2 block allocator to prepare for future performance
     enhancements (Andreas Gruenbacher)

   - fix a use-after-free problem (Andy Price)

   - patches that fix gfs2's broken rgrplvb mount option (me)

   - cleanup patches and error message improvements (me)

   - enable getlabel support (Steve Whitehouse and Abhi Das)

   - flush the glock delete workqueue at exit (Tim Smith)"

* tag 'gfs2-4.20.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Fix minor typo: couln't versus couldn't.
  gfs2: write revokes should traverse sd_ail1_list in reverse
  gfs2: Pass resource group to rgblk_free
  gfs2: Remove unnecessary gfs2_rlist_alloc parameter
  gfs2: Fix marking bitmaps non-full
  gfs2: Fix some minor typos
  gfs2: Rename bitmap.bi_{len => bytes}
  gfs2: Remove unused RGRP_RSRV_MINBYTES definition
  gfs2: Move rs_{sizehint, rgd_gh} fields into the inode
  gfs2: Clean up out-of-bounds check in gfs2_rbm_from_block
  gfs2: Always check the result of gfs2_rbm_from_block
  gfs2: getlabel support
  GFS2: Flush the GFS2 delete workqueue before stopping the kernel threads
  gfs2: Don't leave s_fs_info pointing to freed memory in init_sbd
  gfs2: Use fs_* functions instead of pr_* function where we can
  gfs2: slow the deluge of io error messages
  gfs2: Don't set GFS2_RDF_UPTODATE when the lvb is updated
  gfs2: improve debug information when lvb mismatches are found
2018-10-24 17:30:39 +01:00
Andreas Gruenbacher
fee5150c48 gfs2: Fix iomap buffered write support for journaled files (2)
It turns out that the fix in commit 6636c3cc56 is bad; the assertion
that the iomap code no longer creates buffer heads is incorrect for
filesystems that set the IOMAP_F_BUFFER_HEAD flag.

Instead, what's happening is that gfs2_iomap_begin_write treats all
files that have the jdata flag set as journaled files, which is
incorrect as long as those files are inline ("stuffed").  We're handling
stuffed files directly via the page cache, which is why we ended up with
pages without buffer heads in gfs2_page_add_databufs.

Fix this by handling stuffed journaled files correctly in
gfs2_iomap_begin_write.

This reverts commit 6636c3cc5690c11631e6366cf9a28fb99c8b25bb.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-12 17:14:42 +02:00
Andreas Gruenbacher
0ddeded4ae gfs2: Pass resource group to rgblk_free
Function rgblk_free can only deal with one resource group at a time, so
pass that resource group is as a parameter.  Several of the callers
already have the resource group at hand, so we only need additional
lookup code in a few places.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Reviewed-by: Steven Whitehouse <swhiteho@redhat.com>
2018-10-12 07:33:07 -05:00
Andreas Gruenbacher
dc480feb45 gfs2: Fix iomap buffered write support for journaled files
Commit 64bc06bb32 broke buffered writes to journaled files (chattr
+j): we'll try to journal the buffer heads of the page being written to
in gfs2_iomap_journaled_page_done.  However, the iomap code no longer
creates buffer heads, so we'll BUG() in gfs2_page_add_databufs.  Fix
that by creating buffer heads ourself when needed.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-10-09 18:20:13 +02:00
Andreas Gruenbacher
776125785a gfs2: Special-case rindex for gfs2_grow
To speed up the common case of appending to a file,
gfs2_write_alloc_required presumes that writing beyond the end of a file
will always require additional blocks to be allocated.  This assumption
is incorrect for preallocates files, but there are no negative
consequences as long as *some* space is still left on the filesystem.

One special file that always has some space preallocated beyond the end
of the file is the rindex: when growing a filesystem, gfs2_grow adds one
or more new resource groups and appends records describing those
resource groups to the rindex; the preallocated space ensures that this
is always possible.

However, when a filesystem is completely full, gfs2_write_alloc_required
will indicate that an additional allocation is required, and appending
the next record to the rindex will fail even though space for that
record has already been preallocated.  To fix that, skip the incorrect
optimization in gfs2_write_alloc_required, but for the rindex only.
Other writes to preallocated space beyond the end of the file are still
allowed to fail on completely full filesystems.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-25 22:56:14 +02:00
Andreas Gruenbacher
0ed91eca11 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
Merge xfs branch 'iomap-4.19-merge' into linux-gfs2/for-next.  This
brings in readpage and direct I/O support for inline data.

The IOMAP_F_BUFFER_HEAD flag introduced in commit "iomap: add initial
support for writes without buffer heads" needs to be set for gfs2 as
well, so do that in the merge.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-25 00:08:20 +02:00
Andreas Gruenbacher
a3479c7fc0 Merge branch 'iomap-write' into linux-gfs2/for-next
Pull in the gfs2 iomap-write changes: Tweak the existing code to
properly support iomap write and eliminate an unnecessary special case
in gfs2_block_map.  Implement iomap write support for buffered and
direct I/O.  Simplify some of the existing code and eliminate code that
is no longer used:

  gfs2: Remove gfs2_write_{begin,end}
  gfs2: iomap direct I/O support
  gfs2: gfs2_extent_length cleanup
  gfs2: iomap buffered write support
  gfs2: Further iomap cleanups

This is based on the following changes on the xfs 'iomap-4.19-merge'
branch:

  iomap: add private pointer to struct iomap
  iomap: add a page_done callback
  iomap: generic inline data handling
  iomap: complete partial direct I/O writes synchronously
  iomap: mark newly allocated buffer heads as new
  fs: factor out a __generic_write_end helper

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2018-07-24 20:02:40 +02:00
Andreas Gruenbacher
967bcc91b0 gfs2: iomap direct I/O support
The page unmapping previously done in gfs2_direct_IO is now done
generically in iomap_dio_rw.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:32 +01:00
Andreas Gruenbacher
bcfe94139a gfs2: gfs2_extent_length cleanup
Now that gfs2_extent_length is no longer used for determining the size
of a hole and always with an upper size limit, the function can be
simplified.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:24 +01:00
Andreas Gruenbacher
64bc06bb32 gfs2: iomap buffered write support
With the traditional page-based writes, blocks are allocated separately
for each page written to.  With iomap writes, we can allocate a lot more
blocks at once, with a fraction of the allocation overhead for each
page.

Split calculating the number of blocks that can be allocated at a given
position (gfs2_alloc_size) off from gfs2_iomap_alloc: that size
determines the number of blocks to allocate and reserve in the journal.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:27:17 +01:00
Andreas Gruenbacher
d505a96a3b gfs2: Further iomap cleanups
In gfs2_iomap_alloc, set the type of newly allocated extents to
IOMAP_MAPPED so that iomap_to_bh will set the bh states correctly:
otherwise, the bhs would not be marked as mapped, confusing
__mpage_writepage.  This means that we need to check for the IOMAP_F_NEW
flag in fallocate_chunk now.

Further clean up gfs2_iomap_get and implement gfs2_stuffed_iomap here
directly.  For reads beyond the end of the file, return holes instead of
failing with -ENOENT so that we can get rid of that special case in
gfs2_block_map.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2018-07-02 16:26:01 +01:00
Andreas Gruenbacher
00251a16d7 gfs2: Minor clarification to __gfs2_punch_hole
Rename end_off to end_len to make the code less confusing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-21 07:40:00 -05:00
Linus Torvalds
6567af78ac Changes for 4.18:
- Strengthen inode number and structure validation when allocating inodes.
 - Reduce pointless buffer allocations during cache miss
 - Use FUA for pure data O_DSYNC directio writes
 - Various iomap refactorings
 - Strengthen quota metadata verification to avoid unfixable broken quota
 - Make AGFL block freeing a deferred operation to avoid blowing out
   transaction reservations when running complex operations
 - Get rid of the log item descriptors to reduce log overhead
 - Fix various reflink bugs where inodes were double-joined to
   transactions
 - Don't issue discards when trimming unwritten extents
 - Refactor incore dquot initialization and retrieval interfaces
 - Fix some locking problmes in the quota scrub code
 - Strengthen btree structure checks in scrub code
 - Rewrite swapfile activation to use iomap and support unwritten extents
 - Make scrub exit to userspace sooner when corruptions or
   cross-referencing problems are found
 - Make scrub invoke the data fork scrubber directly on metadata inodes
 - Don't do background reclamation of post-eof and cow blocks when the fs
   is suspended
 - Fix secondary superblock buffer lifespan hinting
 - Refactor growfs to use table-dispatched functions instead of long
   stringy functions
 - Move growfs code to libxfs
 - Implement online fs label getting and setting
 - Introduce online filesystem repair (in a very limited capacity)
 - Fix unit conversion problems in the realtime freemap iteration
   functions
 - Various refactorings and cleanups in preparation to remove buffer
   heads in a future release
 - Reimplement the old bmap call with iomap
 - Remove direct buffer head accesses from seek hole/data
 - Various bug fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlsR9dEACgkQ+H93GTRK
 tOv0dw//cBwRgY4jhC6b9oMk2DNRWUiTt1F2yoqr28661GPo124iXAMLIwJe1DiV
 W/qpN3HUz7P46xKOVY+MXaj0JIDFxJ8c5tHAQMH/TkDc49S+mkcGyaoPJ39hnc6u
 yikG+Hq4m0YWhHaeUhKTe8pnhXBaziz5A2NtKtwh6lPOIW+Wds51T77DJnViqADq
 tZzmAq8fS9/ELpxe0Th/2D7iTWCr2c3FLsW2KgbbNvQ4e34zVE1ix1eBtEzQE+Mm
 GUjdQhYVS1oCzqZfCxJkzR4R/1TAFyS0FXOW7PHo8FAX/kas9aQbRlnHSAQ/08EE
 8Z2p3GsFip7dgmd6O6nAmFAStW6GRvgyycJ7Y+Y0IsJj6aDp9OxhRExyF+uocJR9
 b9ChOH6PMEtRB/RRlBg66pbS61abvNGutzl61ZQZGBHEvL3VqDcd68IomdD5bNSB
 pXo6mOJIcKuXsghZszsHAV9uuMe4zQAMbLy7QH6V8LyWeSAG9hTXOT9EA4MWktEJ
 SCQFf7RRPgU5pEAgOS8LgKrawqnBaqFcFvkvWsQhyiltTFz29cwxH7tjSXYMAOFE
 W+RMp8kbkPnGOaJJeKxT+/RGRB534URk0jIEKtRb679xkEF3HE58exXEVrnojJq6
 0m712+EYuZSYhFBwrvEnQjNHr0x2r/A/iBJZ6HhyV0aO1RWm4n4=
 =11pr
 -----END PGP SIGNATURE-----

Merge tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "New features this cycle include the ability to relabel mounted
  filesystems, support for fallocated swapfiles, and using FUA for pure
  data O_DSYNC directio writes. With this cycle we begin to integrate
  online filesystem repair and refactor the growfs code in preparation
  for eventual subvolume support, though the road ahead for both
  features is quite long.

  There are also numerous refactorings of the iomap code to remove
  unnecessary log overhead, to disentangle some of the quota code, and
  to prepare for buffer head removal in a future upstream kernel.

  Metadata validation continues to improve, both in the hot path
  veifiers and the online filesystem check code. I anticipate sending a
  second pull request in a few days with more metadata validation
  improvements.

  This series has been run through a full xfstests run over the weekend
  and through a quick xfstests run against this morning's master, with
  no major failures reported.

  Summary:

   - Strengthen inode number and structure validation when allocating
     inodes.

   - Reduce pointless buffer allocations during cache miss

   - Use FUA for pure data O_DSYNC directio writes

   - Various iomap refactorings

   - Strengthen quota metadata verification to avoid unfixable broken
     quota

   - Make AGFL block freeing a deferred operation to avoid blowing out
     transaction reservations when running complex operations

   - Get rid of the log item descriptors to reduce log overhead

   - Fix various reflink bugs where inodes were double-joined to
     transactions

   - Don't issue discards when trimming unwritten extents

   - Refactor incore dquot initialization and retrieval interfaces

   - Fix some locking problmes in the quota scrub code

   - Strengthen btree structure checks in scrub code

   - Rewrite swapfile activation to use iomap and support unwritten
     extents

   - Make scrub exit to userspace sooner when corruptions or
     cross-referencing problems are found

   - Make scrub invoke the data fork scrubber directly on metadata
     inodes

   - Don't do background reclamation of post-eof and cow blocks when the
     fs is suspended

   - Fix secondary superblock buffer lifespan hinting

   - Refactor growfs to use table-dispatched functions instead of long
     stringy functions

   - Move growfs code to libxfs

   - Implement online fs label getting and setting

   - Introduce online filesystem repair (in a very limited capacity)

   - Fix unit conversion problems in the realtime freemap iteration
     functions

   - Various refactorings and cleanups in preparation to remove buffer
     heads in a future release

   - Reimplement the old bmap call with iomap

   - Remove direct buffer head accesses from seek hole/data

   - Various bug fixes"

* tag 'xfs-4.18-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits)
  fs: use ->is_partially_uptodate in page_cache_seek_hole_data
  fs: remove the buffer_unwritten check in page_seek_hole_data
  fs: move page_cache_seek_hole_data to iomap.c
  xfs: use iomap_bmap
  iomap: add an iomap-based bmap implementation
  iomap: add a iomap_sector helper
  iomap: use __bio_add_page in iomap_dio_zero
  iomap: move IOMAP_F_BOUNDARY to gfs2
  iomap: fix the comment describing IOMAP_NOWAIT
  iomap: inline data should be an iomap type, not a flag
  mm: split ->readpages calls to avoid non-contiguous pages lists
  mm: return an unsigned int from __do_page_cache_readahead
  mm: give the 'ret' variable a better name __do_page_cache_readahead
  block: add a lower-level bio_add_page interface
  xfs: fix error handling in xfs_refcount_insert()
  xfs: fix xfs_rtalloc_rec units
  xfs: strengthen rtalloc query range checks
  xfs: xfs_rtbuf_get should check the bmapi_read results
  xfs: xfs_rtword_t should be unsigned, not signed
  dax: change bdev_dax_supported() to support boolean returns
  ...
2018-06-05 13:24:20 -07:00
Andreas Gruenbacher
628e366df1 gfs2: Iomap cleanups and improvements
Clean up gfs2_iomap_alloc and gfs2_iomap_get.  Document how
gfs2_iomap_alloc works: it now needs to be called separately after
gfs2_iomap_get where necessary; this will be used later by iomap write.
Move gfs2_iomap_ops into bmap.c.

Introduce a new gfs2_iomap_get_alloc helper and use it in
fallocate_chunk: gfs2_iomap_begin will become unsuitable for fallocate
with proper iomap write support.

In gfs2_block_map and fallocate_chunk, zero-initialize struct iomap.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04 07:56:51 -05:00
Andreas Gruenbacher
845802b112 gfs2: Remove ordered write mode handling from gfs2_trans_add_data
In journaled data mode, we need to add each buffer head to the current
transaction.  In ordered write mode, we only need to add the inode to
the ordered inode list.  So far, both cases are handled in
gfs2_trans_add_data.  This makes the code look misleading and is
inefficient for small block sizes as well.  Handle both cases separately
instead.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04 07:50:16 -05:00
Andreas Gruenbacher
7841b9f084 gfs2: hole_size improvement
Reimplement function hole_size based on a generic function for walking
the metadata tree and rename hole_size to gfs2_hole_size.  While
previously, multiple invocations of hole_size were sometimes needed to
walk across the entire hole, the new implementation always returns the
entire hole at once (provided that the caller is interested in the total
size).

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04 07:39:23 -05:00
Andreas Gruenbacher
07e23d68f6 gfs2: Update find_metapath comment
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-06-04 07:32:44 -05:00
Christoph Hellwig
7ee66c03e4 iomap: move IOMAP_F_BOUNDARY to gfs2
Just define a range of fs specific flags and use that in gfs2 instead of
exposing this internal flag globally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Christoph Hellwig
19319b5321 iomap: inline data should be an iomap type, not a flag
Inline data is fundamentally different from our normal mapped case in that
it doesn't even have a block address.  So instead of having a flag for it
it should be an entirely separate iomap range type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-06-01 18:37:32 -07:00
Andreas Gruenbacher
9a38662ba4 gfs2: Remove sdp->sd_jheightsize
GFS2 keeps two arrarys in the superblock that define the maximum size of
an inode depending on the inode's height: sdp->sd_heightsize defines the
heights in units of sb->s_blocksize; sdp->sd_jheightsize defines them in
units of sb->s_blocksize - sizeof(struct gfs2_meta_header).  These
arrays are used to determine when additional layers of indirect blocks
are needed.  The second array is used for directories which have an
additional gfs2_meta_header at the beginning of each block.

Distinguishing between these two cases makes no sense: the height
required for representing N blocks will come out the same no matter if
the calculation is done in gross (sb->s_blocksize) or net
(sb->s_blocksize - sizeof(struct gfs2_meta_header)) units.

Stuffed directories don't have an additional gfs2_meta_header, but the
stuffed case is handled separately for both files and directories,
anyway.

Remove the unncessary sdp->sd_jheightsize array.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-16 09:25:21 -07:00
Bob Peterson
3e7aafc39c GFS2: Minor improvements to comments and documentation
This patch simply fixes some comments and the gfs2-glocks.txt file:
Places where i_rwsem was called i_mutex, and adding i_rw_mutex.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12 10:07:51 -07:00
Andreas Gruenbacher
fffb64127a gfs2: Zero out fallocated blocks in fallocate_chunk
Instead of zeroing out fallocated blocks in gfs2_iomap_alloc, zero them
out in fallocate_chunk, much higher up the call stack.  This gets rid of
gfs2's abuse of the IOMAP_ZERO flag as well as the gfs2 specific zeronew
buffer flag.  I can't think of a reason why zeroing out the blocks in
gfs2_iomap_alloc would have any benefits: there is no additional locking
at that level that would add protection to the newly allocated blocks.

While at it, change fallocate over from gs2_block_map to gfs2_iomap_begin.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
2018-03-29 06:50:32 -07:00
Andreas Gruenbacher
bb491ce67a gfs2: Check for the end of metadata in punch_hole
When punching a hole or truncating an inode down to a given size, also
check if the truncate point / start of the hole is within the range we
have metadata for.  Otherwise, we can end up freeing blocks that
shouldn't be freed, corrupting the inode, or crashing the machine when
trying to punch a hole into the void.

When growing an inode via truncate, we set the new size but we don't
allocate additional levels of indirect blocks and grow the inode height.
When shrinking that inode again, the new size may still point beyond the
end of the inode's metadata.

Fixes xfstest generic/476.

Debugged-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-23 11:43:02 -07:00
Andreas Gruenbacher
d39d18e0ef gfs2: Improve gfs2_block_map comment
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-08 09:26:20 -07:00
Andreas Gruenbacher
3b5da96e45 gfs2: Fixes to "Implement iomap for block_map" (2)
It turns out that commit 3229c18c0d6b2 'Fixes to "Implement iomap for
block_map"' introduced another bug in gfs2_iomap_begin that can cause
gfs2_block_map to set bh->b_size of an actual buffer to 0.  This can
lead to arbitrary incorrect behavior including crashes or disk
corruption.  Revert the incorrect part of that commit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-03-07 11:40:38 -07:00
Andreas Gruenbacher
49edd5bf42 gfs2: Fixes to "Implement iomap for block_map"
It turns out that commit 3974320ca6 "Implement iomap for block_map"
introduced a few bugs that trigger occasional failures with xfstest
generic/476:

In gfs2_iomap_begin, we jump to do_alloc when we determine that we are
beyond the end of the allocated metadata (height > ip->i_height).
There, we can end up calling hole_size with a metapath that doesn't
match the current metadata tree, which doesn't make sense.  After
untangling the code at do_alloc, fix this by checking if the block we
are looking for is within the range of allocated metadata.

In addition, add a BUG() in case gfs2_iomap_begin is accidentally called
for reading stuffed files: this is handled separately.  Make sure we
don't truncate iomap->length for reads beyond the end of the file; in
that case, the entire range counts as a hole.

Finally, revert to taking a bitmap write lock when doing allocations.
It's unclear why that change didn't lead to any failures during testing.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-13 13:38:10 -07:00
Andreas Gruenbacher
235628c5c7 gfs2: Add gfs2_max_stuffed_size
Add a small inline function for computing the maximum size of a stuffed
inode instead of open coding that in several places throughout the code.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 14:18:53 -07:00
Andreas Gruenbacher
4e56a6411f gfs2: Implement fallocate(FALLOC_FL_PUNCH_HOLE)
Implement the top-level bits of punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:58 +01:00
Andreas Gruenbacher
10d2cf94c2 gfs2: Turn trunc_dealloc into punch_hole
Add an upper bound to the range of blocks to deallocate blocks to
function trunc_dealloc so that this function can be used for truncating
a file as well as for punching a hole into a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:57 +01:00
Andreas Gruenbacher
5cf26b1e88 gfs2: Generalize truncate code
Pull the code for computing the range of metapointers to iterate out of
gfs2_metapath_ra (for readahead), sweep_bh_for_rgrps (for deallocating
metapointers within a block), and trunc_dealloc (for walking the
metadata tree).

In sweep_bh_for_rgrps, move the code for looking up the resource group
descriptor of the current resource group out of the inner loop.  The
metatype check moves to trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-18 21:15:37 +01:00
Andreas Gruenbacher
bdba0d5ec1 Turn gfs2_block_truncate_page into gfs2_block_zero_range
Turn gfs2_block_truncate_page into a function that zeroes a range within
a block rather than only the end of a block.  This will be used for
cleaning the end of the first partial block and the start of the last
partial block when punching a hole in a file.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:53 -07:00
Andreas Gruenbacher
cb7f0903ef gfs2: Improve non-recursive delete algorithm
In rare cases, the current non-recursive delete algorithm doesn't
deallocate empty intermediary indirect blocks.  This should have very
little practical effect, but deallocating all blocks correctly should
still be preferable as it is cleaner and easier to validate.

The fix consists of using the first block to deallocate to compute the
start marker of the truncate point instead of the last block that needs
to be kept.  With that change, computing which indirect blocks are still
needed becomes relatively easy.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:52 -07:00
Andreas Gruenbacher
c3ce5aa9b0 gfs2: Fix metadata read-ahead during truncate
The metadata read-ahead algorithm broke when switching from recursive to
non-recursive delete: the current algorithm reads ahead blocks at height
N - 1 while deallocating the blocks at hight N.  However, deallocating
the blocks at height N requires a complete walk of the metadata tree,
not only down to height N - 1.  Consequently, all blocks below height
N - 1 will be accessed without read-ahead.

Fix this by issuing read-aheads as early as possible, after each
metapath lookup.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:50 -07:00
Andreas Gruenbacher
e8b43fe0c1 gfs2: Clean up {lookup,fillup}_metapath
Split out the entire lookup loop from lookup_metapath and
fillup_metapath.  Make both functions return the actual height in
mp->mp_aheight, and return 0 on success.  Handle lookup errors properly
in trunc_dealloc.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:48 -07:00
Andreas Gruenbacher
e7fdf00406 gfs2: Remove minor gfs2_journaled_truncate inefficiencies
First, this function truncates the file in chunks.  When the original
file size isn't block aligned, each chunk that is truncated will remain
be misaligned.  This is inefficient.

Second, this function doesn't recognize where holes are, so it loops
through them.  For each chunk of a hole, it creates a new transaction.
At least avoid creating another transactions whe the current one is
still empty.  (An better fix would be to skip large holes, of course.)

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:47 -07:00
Andreas Gruenbacher
8b5860a35c gfs2: truncate: Remove unnecessary oldsize parameters
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:45 -07:00
Andreas Gruenbacher
80990f404d gfs2: Clean up trunc_start error path
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:35:42 -07:00
Steven Whitehouse
90bcab998d gfs2: Add gfs2_blk2rgrpd comment and fix incorrect use
Document when to use gfs2_blk2rgrpd for "inexact" resource group
matching.  Based on that, fix an incorrect use of gfs2_blk2rgrpd in
sweep_bh_for_rgrps.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-01-17 06:34:24 -07:00
Bob Peterson
3974320ca6 GFS2: Implement iomap for block_map
This patch implements iomap for block mapping, and switches the
block_map function to use it under the covers.

The additional IOMAP_F_BOUNDARY iomap flag indicates when iomap has
reached a "metadata boundary" and fetching the next mapping is likely to
incur an additional I/O.  This flag is used for setting the bh buffer
boundary flag.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:33 +01:00
Bob Peterson
5f8bd4440d GFS2: Make height info part of metapath
This patch eliminates height parameters from function gfs2_bmap_alloc.
Function find_metapath determines the metapath's "find height", also
known as the desired height. Function lookup_metapath determines the
metapath's "actual height", previously known as starting height or
sheight. Function gfs2_bmap_alloc now gets both height values from
the metapath. This simplification was done as a step toward switching
the block_map functions to using iomap. The bh_map responsibilities
are also removed from function gfs2_bmap_alloc for the same reason.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-10-31 14:26:23 +01:00
Andreas Gruenbacher
20cdc1931e gfs2: Clarify gfs2_block_map
Add a comment about the logical block size for directories.  Rename
"bsize" in gfs2_block_map to "factor".  Fix a typo in the description of
metaptr1.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-09-25 12:33:18 -05:00
Bob Peterson
c4a9d1892f GFS2: Fix non-recursive truncate bug
Before this patch if you truncated a file to a smaller size it
wasn't freeing all the blocks properly. There are two reasons.

First, the metapath comparison was not comparing previous heights.
I added a function, mp_eq_to_hgt, which checks the metapath at
all heights prior to the target height.

Second, in function find_nonnull_ptr, it needed to zero out all
pointers for heights following the target height. Translated into
decimal integer terms, this way a number like 299, when incremented,
becomes 300, not 399. The 2 gets incremented to 3, and the following
digits need to be reset.

These two things allow the truncate state machine to properly find
the blocks it needs to delete.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-08-30 13:29:22 -05:00
Coly Li
e477b24b50 gfs2: add flag REQ_PRIO for metadata I/O
When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
the bio. But flag REQ_META is just a hint for block trace, not for block
layer code to handle a bio as metadata request.

For some of metadata I/Os of gfs2, A REQ_PRIO flag on the metadata bio
would be very informative to block layer code. For example, if bcache is
used as a I/O cache for gfs2, it will be possible for bcache code to get
the hint and cache the pre-fetched metadata blocks on cache device. This
behavior may be helpful to improve metadata I/O performance if the
following requests hit the cache.

Here are the locations in gfs2 code where a REQ_PRIO flag should be added,
- All places where REQ_READAHEAD is used, gfs2 code uses this flag for
  metadata read ahead.
- In gfs2_meta_rq() where the first metadata block is read in.
- In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
  up to date.
These metadata blocks are probably to be accessed again in future, adding
a REQ_PRIO flag may have bcache to keep such metadata in fast cache
device. For system without a cache layer, REQ_PRIO can still provide hint
to block layer to handle metadata requests more properly.

Signed-off-by: Coly Li <colyli@suse.de>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-21 07:48:22 -05:00
Linus Torvalds
c96e6dabfb We've got eight GFS2 patches for this merge window:
1. Andreas Gruenbacher has four patches related to cleaning up the GFS2
    inode evict process. This is about half of his patches designed to
    fix a long-standing GFS2 hang related to the inode shrinker.
    (Shrinker calls gfs2 evict, evict calls DLM, DLM requires memory
    and blocks on the shrinker.) These 4 patches have been well tested.
    His second set of patches are still being tested, so I plan to hold
    them until the next merge window, after we have more weeks of testing.
    The first patch eliminates the flush_delayed_work, which can block.
 2. Andreas's second patch protects setting of gl_object for rgrps with
    a spin_lock to prevent proven races.
 3. His third patch introduces a centralized mechanism for queueing glock
    work with better reference counting, to prevent more races.
 4. His fourth patch retains a reference to inode glocks when an error
    occurs while creating an inode. This keeps the subsequent evict from
    needing to reacquire the glock, which might call into DLM and block
    in low memory conditions.
 5. Arvind Yadav has a patch to add const to attribute_group structures.
 6. I have a patch to detect directory entry inconsistencies and withdraw
    the file system if any are found. Better that than silent corruption.
 7. I have a patch to remove a vestigial variable from glock structures,
    saving some slab space.
 8. I have another patch to remove a vestigial variable from the GFS2
    in-core superblock structure.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZXOIfAAoJENeLYdPf93o7RVcH/jLEK3hmZOd94pDTYg3Damuo
 KI3xjyutDgQT83uwg8p5UBPwRYCDnyiOLwOWGBJJvjPEI1S4syrXq/FzOmxmX6cV
 nE28ARL/OXCoFEXBMUVHvHL3nK+zEUr8rO6Xz51B1ifVq7GV8iVK+ZgxzRhx0PWP
 f+0SVHiQtU0HKyxR5y9p43oygtHZaGbjy4WL0YbmFZM59y5q9A8rBHFACn2JyPBm
 /zXN6gF/Orao+BDXLT6OM3vNXZcOQ7FUPWwctguHsAO/bLzWiISyfJxLWJsHvSdW
 tzFTN1DByjXvqAhs4HTSuh9JfBDAyxcXkmczXJyATBkCTEJv42Iev+ILmre+wwQ=
 =YTwn
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Bob Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andreas Gruenbacher has four patches related to cleaning up the
     GFS2 inode evict process. This is about half of his patches
     designed to fix a long-standing GFS2 hang related to the inode
     shrinker: Shrinker calls gfs2 evict, evict calls DLM, DLM requires
     memory and blocks on the shrinker.

     These four patches have been well tested. His second set of patches
     are still being tested, so I plan to hold them until the next merge
     window, after we have more weeks of testing. The first patch
     eliminates the flush_delayed_work, which can block.

   - Andreas's second patch protects setting of gl_object for rgrps with
     a spin_lock to prevent proven races.

   - His third patch introduces a centralized mechanism for queueing
     glock work with better reference counting, to prevent more races.

    -His fourth patch retains a reference to inode glocks when an error
     occurs while creating an inode. This keeps the subsequent evict
     from needing to reacquire the glock, which might call into DLM and
     block in low memory conditions.

   - Arvind Yadav has a patch to add const to attribute_group
     structures.

   - I have a patch to detect directory entry inconsistencies and
     withdraw the file system if any are found. Better that than silent
     corruption.

   - I have a patch to remove a vestigial variable from glock
     structures, saving some slab space.

   - I have another patch to remove a vestigial variable from the GFS2
     in-core superblock structure"

* tag 'gfs2-4.13.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  GFS2: constify attribute_group structures.
  gfs2: gfs2_create_inode: Keep glock across iput
  gfs2: Clean up glock work enqueuing
  gfs2: Protect gl->gl_object by spin lock
  gfs2: Get rid of flush_delayed_work in gfs2_evict_inode
  GFS2: Eliminate vestigial sd_log_flush_wrapped
  GFS2: Remove gl_list from glock structure
  GFS2: Withdraw when directory entry inconsistencies are detected
2017-07-05 16:57:08 -07:00
Andreas Gruenbacher
6f6597baae gfs2: Protect gl->gl_object by spin lock
Put all remaining accesses to gl->gl_object under the
gl->gl_lockref.lock spinlock to prevent races.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-07-05 07:20:52 -05:00
Stephen Rothwell
b32c8c7648 gfs2: replace CURRENT_TIME with current_time
Link: http://lkml.kernel.org/r/20170420161852.0492bc3f@canb.auug.org.au
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:15 -07:00
Bob Peterson
d552a2b9b3 GFS2: Non-recursive delete
Implement truncate/delete as a non-recursive algorithm. The older
algorithm was implemented with recursion to strip off each layer
at a time (going by height, starting with the maximum height.
This version tries to do the same thing but without recursion,
and without needing to allocate new structures or lists in memory.

For example, say you want to truncate a very large file to 1 byte,
and its end-of-file metapath is: 0.505.463.428. The starting
metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
needs to preserve that byte, and all metadata pointing to it.
So it would start at 0.0.0.0, look up all its metadata buffers,
then free all data blocks pointed to at the highest level.
After that buffer is "swept", it moves on to 0.0.0.1, then
0.0.0.2, etc., reading in buffers and sweeping them clean.
When it gets to the end of the 0.0.0 metadata buffer (for 4K
blocks the last valid one is 0.0.0.508), it backs up to the
previous height and starts working on 0.0.1.0, then 0.0.1.1,
and so forth. After it reaches the end and sweeps 0.0.1.508,
it continues with 0.0.2.0, and so on. When that height is
exhausted, and it reaches 0.0.508.508 it backs up another level,
to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
marching backwards and forwards through the metadata until it's
all swept clean. Once it has all the data blocks freed, it
lowers the strip height, and begins the process all over again,
but with one less height. This time it sweeps 0.0.0 through
0.505.463. When that's clean, it lowers the strip height again
and works to free 0.505. Eventually it strips the lowest height, 0.
For a delete or truncate to 0, all metadata for all heights of
0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
be preserved.

This isn't much different from normal integer incrementing,
where an integer gets incremented from 0000 (0.0.0.0) to 3021
(3.0.2.1). So 0000 gets increments to 0001, 0002, up to 0009,
then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
just that each "digit" goes from 0 to 508 (for a total of 509
pointers) rather than from 0 to 9.

Note that the dinode will only have 483 pointers due to the
dinode structure itself.

Also note: this is just an example. These numbers (509 and 483)
are based on a standard 4K block size. Smaller block sizes will
yield smaller numbers of indirect pointers accordingly.

The truncation process is accomplished with the help of two
major functions and a few helper functions.

Functions do_strip and recursive_scan are obsolete, so removed.

New function sweep_bh_for_rgrps cleans a buffer_head pointed to
by the given metapath and height. By cleaning, I mean it frees
all blocks starting at the offset passed in metapath. It starts
at the first block in the buffer pointed to by the metapath and
identifies its resource group (rgrp). From there it frees all
subsequent block pointers that lie within that rgrp. If it's
already inside a transaction, it stays within it as long as it
can. In other words, it doesn't close a transaction until it knows
it's freed what it can from the resource group. In this way,
multiple buffers may be cleaned in a single transaction, as long
as those blocks in the buffer all lie within the same rgrp.

If it's not in a transaction, it starts one. If the buffer_head
has references to blocks within multiple rgrps, it frees all the
blocks inside the first rgrp it finds, then closes the
transaction. Then it repeats the cycle: identifies the next
unfreed block, uses it to find its rgrp, then starts a new
transaction for that set. It repeats this process repeatedly
until the buffer_head contains no more references to any blocks
past the given metapath.

Function trunc_dealloc has been reworked into a finite state
automaton. It has basically 3 active states:
DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:

The DEALLOC_MP_FULL state implies the metapath has a full set
of buffers out to the "shrink height", and therefore, it can
call function sweep_bh_for_rgrps to free the blocks within the
highest height of the metapath. If it's just swept the lowest
level (or an error has occurred) the state machine is ended.
Otherwise it proceeds to the DEALLOC_MP_LOWER state.

The DEALLOC_MP_LOWER state implies we are finished with a given
buffer_head, which may now be released, and therefore we are
then missing some buffer information from the metapath. So we
need to find more buffers to read in. In most cases, this is
just a matter of releasing the buffer_head and moving to the
next pointer from the previous height, so it may be read in and
swept as well. If it can't find another non-null pointer to
process, it checks whether it's reached the end of a height
and needs to lower the strip height, or whether it still needs
move forward through the previous height's metadata. In this
state, all zero-pointers are skipped. From this state, it can
only loop around (once more backing up another height) or,
once a valid metapath is found (one that has non-zero
pointers), proceed to state DEALLOC_FILL_MP.

The DEALLOC_FILL_MP state implies that we have a metapath
but not all its buffers are read in. So we must proceed to read
in buffer_heads until the metapath has a valid buffer for every
height. If the previous state backed us up 3 heights, we may
need to read in a buffer, increment the height, then repeat the
process until buffers have been read in for all required heights.
If it's successful reading a buffer, and it's at the highest
height we need, it proceeds back to the DEALLOC_MP_FULL state.
If it's unable to fill in a buffer, (encounters a hole, etc.)
it tries to find another non-zero block pointer. If they're all
zero, it lowers the height and returns to the DEALLOC_MP_LOWER
state. If it finds a good non-null pointer, it loops around and
reads it in, while keeping the metapath in lock-step with the
pointers it examines.

The state machine runs until the truncation request is
satisfied. Then any transactions are ended, the quota and
statfs data are updated, and the function is complete.

Helper function metaptr1 was introduced to be an easy way to
determine the start of a buffer_head's indirect pointers.

Helper function lookup_mp_height was introduced to find a
metapath index and read in the buffer that corresponds to it.
In this way, function lookup_metapath becomes a simple loop to
call it for every height.

Helper function fillup_metapath is similar to lookup_metapath
except it can do partial lookups. If the state machine
backed up multiple levels (like 2999 wrapping to 3000) it
needs to find out the next starting point and start issuing
metadata reads at that point.

Helper function hptrs is a shortcut to determine how many
pointers should be expected in a buffer. Height 0 is the dinode
which has fewer pointers than the others.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-04-19 08:25:43 -04:00
Linus Torvalds
9763dd6f81 We've got eight GFS2 patches for this merge window:
1. Andy Price submitted a patch to make gfs2_write_full_page a
    static function.
 2. Dan Carpenter submitted a patch to fix a ERR_PTR thinko.
 
 I've also got a few patches, three of which fix bugs related to
 deleting very large files, which cause GFS2 to run out of
 journal space:
 
 3. The first one prevents GFS2 delete operation from requesting too
    much journal space.
 4. The second one fixes a problem whereby GFS2 can hang because it
    wasn't taking journal space demand into its calculations.
 5. The third one wakes up IO waiters when a flush is done to restart
    processes stuck waiting for journal space to become available.
 
 The other three patches are a performance improvement related to
 spin_lock contention between multiple writers:
 
 6. The "tr_touched" variable was switched to a flag to be more atomic
    and eliminate the possibility of some races.
 7. Function meta_lo_add was moved inline with its only caller to make
    the code more readable and efficient.
 8. Contention on the gfs2_log_lock spinlock was greatly reduced by
    avoiding the lock altogether in cases where we don't really need
    it: buffers that already appear in the appropriate metadata list
    for the journal. Many thanks to Steve Whitehouse for the ideas and
    principles behind these patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYrEEEAAoJENeLYdPf93o7bjoIAIqPG/EAzi+idgMWDPQa9Eit
 53dPy16snkrbWwtaK6spSWlH6bGYuHeanXORYon9bvtVjKYaa4NQclGihN2IE6uB
 O8zT+MGwP45LDhNplVJpumaOALZ9ZDqQSe+3tHeNK3FhNirLyiIjSqrHt/7Yi1qi
 fPLlT4Jx0TBo5rhvEGa7Yg01WhWVtnmVSMqJXj/7ZtC50s1aPyDUikdNIDfDCN2X
 LxfKGDXuk6p63VQ6JKqYSBVATCR0/bbKfkuk/kBUTYLoHoapImxB8d0HgIdsh1Mv
 9PlbZnnNW8k5oapuhVxjl0T5G0JsQgCkPb/wlte+ryOCjBoc2L2fCUV5qc0QxWc=
 =xQyl
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull GFS2 updates from Robert Peterson:
 "We've got eight GFS2 patches for this merge window:

   - Andy Price submitted a patch to make gfs2_write_full_page a static
     function.

   - Dan Carpenter submitted a patch to fix a ERR_PTR thinko.

  Three patches fix bugs related to deleting very large files, which
  cause GFS2 to run out of journal space:

   - The first one prevents GFS2 delete operation from requesting too
     much journal space.

   - The second one fixes a problem whereby GFS2 can hang because it
     wasn't taking journal space demand into its calculations.

   - The third one wakes up IO waiters when a flush is done to restart
     processes stuck waiting for journal space to become available.

  The final three patches are a performance improvement related to
  spin_lock contention between multiple writers:

   - The "tr_touched" variable was switched to a flag to be more atomic
     and eliminate the possibility of some races.

   - Function meta_lo_add was moved inline with its only caller to make
     the code more readable and efficient.

   - Contention on the gfs2_log_lock spinlock was greatly reduced by
     avoiding the lock altogether in cases where we don't really need
     it: buffers that already appear in the appropriate metadata list
     for the journal. Many thanks to Steve Whitehouse for the ideas and
     principles behind these patches"

* tag 'gfs2-4.11.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Make gfs2_write_full_page static
  GFS2: Reduce contention on gfs2_log_lock
  GFS2: Inline function meta_lo_add
  GFS2: Switch tr_touched to flag in transaction
  GFS2: Wake up io waiters whenever a flush is done
  GFS2: Made logd daemon take into account log demand
  GFS2: Limit number of transaction blocks requested for truncates
  GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
2017-02-21 07:46:34 -08:00
Bob Peterson
2fcf5cc3be GFS2: Limit number of transaction blocks requested for truncates
This patch limits the number of transaction blocks requested during
file truncates. If we have very large multi-terabyte files, and want
to delete or truncate them, they might span so many resource groups
that we overflow the journal blocks, and cause an assert failure.
By limiting the number of blocks in the transaction, we prevent this
overflow and give other running processes time to do transactions.

The limiting factor I chose is sd_log_thresh2 which is currently
set to 4/5ths of the journal. This same ratio is used in function
gfs2_ail_flush_reqd to determine when a log flush is required.
If we make the maximum value less than this, we can get into a
infinite hang whereby the log stops moving because the number of
used blocks is less than the threshold and the iterative loop
needs more, but since we're under the threshold, the log daemon
never starts any IO on the log.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-01-05 14:47:36 -05:00
Linus Torvalds
101105b171 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 ">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: Replace current_fs_time() with current_time()
  fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  fs: Replace CURRENT_TIME with current_time() for inode timestamps
  fs: proc: Delete inode time initializations in proc_alloc_inode()
  vfs: Add current_time() api
  vfs: add note about i_op->rename changes to porting
  fs: rename "rename2" i_op to "rename"
  vfs: remove unused i_op->rename
  fs: make remaining filesystems use .rename2
  libfs: support RENAME_NOREPLACE in simple_rename()
  fs: support RENAME_NOREPLACE for local filesystems
  ncpfs: fix unused variable warning
2016-10-10 20:16:43 -07:00
Deepa Dinamani
078cd8279e fs: Replace CURRENT_TIME with current_time() for inode timestamps
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_time() instead.

CURRENT_TIME is also not y2038 safe.

This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks. Hence, it is necessary for all
file system timestamps to use current_time(). Also,
current_time() will be transitioned along with vfs to be
y2038 safe.

Note that whenever a single call to current_time() is used
to change timestamps in different inodes, it is because they
share the same time granularity.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-27 21:06:21 -04:00
Fabian Frederick
47a9a52794 GFS2: use BIT() macro
Replace 1 << value shift by more explicit BIT() macro

Also fixes two bare unsigned definitions:

WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+		unsigned hsize = BIT(ip->i_depth);

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2016-08-02 12:05:27 -05:00
Christoph Hellwig
70246286e9 block: get rid of bio_rw and READA
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-07-20 17:37:01 -06:00
Mike Christie
dfec8a14fc fs: have ll_rw_block users pass in op and flags separately
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
2a222ca992 fs: have submit_bh users pass in op and flags separately
This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Kirill A. Shutemov
09cbfeaf1a mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-04 10:41:08 -07:00
Bob Peterson
a097dc7e24 GFS2: Make rgrp reservations part of the gfs2_inode structure
Before this patch, multi-block reservation structures were allocated
from a special slab. This patch folds the structure into the gfs2_inode
structure. The disadvantage is that the gfs2_inode needs more memory,
even when a file is opened read-only. The advantages are: (a) we don't
need the special slab and the extra time it takes to allocate and
deallocate from it. (b) we no longer need to worry that the structure
exists for things like quota management. (c) This also allows us to
remove the calls to get_write_access and put_write_access since we
know the structure will exist.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-12-14 12:16:38 -06:00
Bob Peterson
b54e9a0b92 GFS2: Extract quota data from reservations structure (revert 5407e24)
This patch basically reverts the majority of patch 5407e24.
That patch eliminated the gfs2_qadata structure in favor of just
using the reservations structure. The problem with doing that is that
it increases the size of the reservations structure. That is not an
issue until it comes time to fold the reservations structure into the
inode in memory so we know it's always there. By separating out the
quota structure again, we aren't punishing the non-quota users by
making all the inodes bigger, requiring more slab space. This patch
creates a new slab area to allocate the quota stuff so it's managed
a little more sanely.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2015-11-24 08:38:44 -06:00
Abhi Das
b8fbf471ed gfs2: perform quota checks against allocation parameters
Use struct gfs2_alloc_parms as an argument to gfs2_quota_check()
and gfs2_quota_lock_check() to check for quota violations while
accounting for the new blocks requested by the current operation
in ap->target.

Previously, the number of new blocks requested during an operation
were not accounted for during quota_check and would allow these
operations to exceed quota. This was not very apparent since most
operations allocated only 1 block at a time and quotas would get
violated in the next operation. i.e. quota excess would only be by
1 block or so. With fallocate, (where we allocate a bunch of blocks
at once) the quota excess is non-trivial and is addressed by this
patch.

Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
2015-03-18 12:46:54 -05:00
Bob Peterson
b650738cd0 GFS2: Change maxlen variables to size_t
This patch changes some variables (especially maxlen in function
gfs2_block_map) from unsigned int to size_t. We need 64-bit arithmetic
for very large files (e.g. 1PB) where the variables otherwise get
shifted to all 0's.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-08-21 10:22:23 +01:00
Fabian Frederick
c62baf65bf GFS2: fs/gfs2/bmap.c: kernel-doc warning fixes
Fix 2 typos and move one definition which was between function
comments and function definition (yet another kernel-doc warning)

Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-05-16 09:34:21 +01:00
Steven Whitehouse
b50f227bdd GFS2: Clean up journal extent mapping
This patch fixes a long standing issue in mapping the journal
extents. Most journals will consist of only a single extent,
and although the cache took account of that by merging extents,
it did not actually map large extents, but instead was doing a
block by block mapping. Since the journal was only being mapped
on mount, this was not normally noticeable.

With the updated code, it is now possible to use the same extent
mapping system during journal recovery (which will be added in a
later patch). This will allow checking of the integrity of the
journal before any reply of the journal content is attempted. For
this reason the code is moving to bmap.c, since it will be used
more widely in due course.

An exercise left for the reader is to compare the new function
gfs2_map_journal_extents() with gfs2_write_alloc_required()

Additionally, should there be a failure, the error reporting is
also updated to show more detail about what went wrong.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2014-03-03 13:50:12 +00:00
Steven Whitehouse
7b9cff4671 GFS2: Add allocation parameters structure
This patch adds a structure to contain allocation parameters with
the intention of future expansion of this structure. The idea is
that we should be able to add more information about the allocation
in the future in order to allow the allocator to make a better job
of placing the requests on-disk.

There is no functional difference from applying this patch.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-10-02 11:13:25 +01:00
Steven Whitehouse
af5c269799 GFS2: Clean up reservation removal
The reservation for an inode should be cleared when it is truncated so
that we can start again at a different offset for future allocations.
We could try and do better than that, by resetting the search based on
where the truncation started from, but this is only a first step.

In addition, there are three callers of gfs2_rs_delete() but only one
of those should really be testing the value of i_writecount. While
we get away with that in the other cases currently, I think it would
be better if we made that test specific to the one case which
requires it.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-09-27 12:49:33 +01:00
Kirill A. Shutemov
7caef26767 truncate: drop 'oldsize' truncate_pagecache() parameter
truncate_pagecache() doesn't care about old size since commit
cedabed49b ("vfs: Fix vmtruncate() regression").  Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:02 -07:00
Bob Peterson
a01aedfe21 GFS2: Reserve journal space for quota change in do_grow
If a GFS2 file system is mounted with quotas and a file is grown
in such a way that its free blocks for the allocation are represented
in a secondary bitmap, GFS2 ran out of blocks in the transaction.
That resulted in "fatal: assertion "tr->tr_num_buf <= tr->tr_blocks".
This patch reserves extra blocks for the quota change so the
transaction has enough space.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-27 18:16:27 +01:00
Bob Peterson
2b3dcf3581 GFS2: Increase i_writecount during gfs2_setattr_size
This patch calls get_write_access in a few functions. This
merely increases inode->i_writecount for the duration of the function.
That will ensure that any file closes won't delete the inode's
multi-block reservation while the function is running.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-06-03 16:38:58 +01:00
Bob Peterson
20095218fb GFS2: Remove vestigial parameter ip from function rs_deltree
The functions that delete block reservations from the rgrp block
reservations rbtree no longer use the ip parameter. This patch
eliminates the parameter.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-04-08 08:41:04 +01:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Eric W. Biederman
f4108a607f gfs2: Split NO_QUOTA_CHANGE inot NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
Split NO_QUOTA_CHANGE into NO_UID_QUTOA_CHANGE and NO_GID_QUTOA_CHANGE
so the constants may be well typed.

Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13 06:15:01 -08:00
Bob Peterson
d2b47cfb26 GFS2: Get a block reservation before resizing a file
This patch allocates a block reservation structure before growing
or shrinking a file. Without this structure, the grow or shink code
can reference the bad pointer.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-02-01 20:37:33 +00:00
Steven Whitehouse
4513899092 GFS2: Use ->writepages for ordered writes
Instead of using a list of buffers to write ahead of the journal
flush, this now uses a list of inodes and calls ->writepages
via filemap_fdatawrite() in order to achieve the same thing. For
most use cases this results in a shorter ordered write list,
as well as much larger i/os being issued.

The ordered write list is sorted by inode number before writing
in order to retain the disk block ordering between inodes as
per the previous code.

The previous ordered write code used to conflict in its assumptions
about how to write out the disk blocks with mpage_writepages()
so that with this updated version we can also use mpage_writepages()
for GFS2's ordered write, writepages implementation. So we will
also send larger i/os from writeback too.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:29:17 +00:00
Steven Whitehouse
350a9b0a72 GFS2: Split gfs2_trans_add_bh() into two
There is little common content in gfs2_trans_add_bh() between the data
and meta classes by the time that the functions which it calls are
taken into account. The intent here is to split this into two
separate functions. Stage one is to introduce gfs2_trans_add_data()
and gfs2_trans_add_meta() and update the callers accordingly.

Later patches will then pull in the content of gfs2_trans_add_bh()
and its dependent functions in order to clean up the code in this
area.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2013-01-29 10:28:04 +00:00
Steven Whitehouse
fa731fc4e0 GFS2: Fix truncation of journaled data files
This patch fixes an issue relating to not having enough revokes
available when truncating journaled data files. In order to ensure
that we do no run out, the truncation is broken into separate pieces
if it is large enough.

Tested using fsx on a journaled data file.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-13 09:50:28 +00:00
Steven Whitehouse
9dbe9610b9 GFS2: Add Orlov allocator
Just like ext3, this works on the root directory and any directory
with the +T flag set. Also, just like ext3, any subdirectory created
in one of the just mentioned cases will be allocated to a random
resource group (GFS2 equivalent of a block group).

If you are creating a set of directories, each of which will contain a
job running on a different node, then by setting +T on the parent
directory before creating the subdirectories, each will land up in a
different resource group, and thus resource group contention between
nodes will be kept to a minimum.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-07 13:33:17 +00:00
Steven Whitehouse
4a993fb150 GFS2: Add structure to contain rgrp, bitmap, offset tuple
This patch introduces a new structure, gfs2_rbm, which is a
tuple of a resource group, a bitmap within the resource group
and an offset within that bitmap. This is designed to make
manipulating these sets of variables easier. There is also a
new helper function which converts this representation back
to a disk block address.

In addition, the rbtree nodes which are used for the reservations
were not being correctly initialised, which is now fixed. Also,
the tracing was not passing through the inode where it should
have been. That is mostly fixed aside from one corner case. This
needs to be revisited since there can also be a NULL rgrp in
some cases which results in the device being incorrect in the
trace.

This is intended to be the first step towards cleaning up some
of the allocation code, and some further bug fixes.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-09-24 10:46:56 +01:00
Bob Peterson
8e2e004735 GFS2: Reduce file fragmentation
This patch reduces GFS2 file fragmentation by pre-reserving blocks. The
resulting improved on disk layout greatly speeds up operations in cases
which would have resulted in interlaced allocation of blocks previously.
A typical example of this is 10 parallel dd processes, each writing to a
file in a common dirctory.

The implementation uses an rbtree of reservations attached to each
resource group (and each inode).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-07-19 14:51:08 +01:00
Bob Peterson
5407e24229 GFS2: Fold quota data into the reservations struct
This patch moves the ancillary quota data structures into the
block reservations structure. This saves GFS2 some time and
effort in allocating and deallocating the qadata structure.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-06-06 11:20:22 +01:00
Bob Peterson
f2f9c81244 GFS2: Eliminate unused "new" parameter to gfs2_meta_indirect_buffer
It turns out that the "new" parameter to function gfs2_meta_indirect_buffer
was always being passed in as zero. Therefore, this patch eliminates it
and simplifies the function.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-05-11 10:19:23 +01:00
Bob Peterson
2f7ee358e5 GFS2: Use variable rather than qa to determine if unstuff necessary
In the future, the qadata structure will be eliminated and merged
back in with the block reservation structure, after we extend the
lifespan of that. This patch is a step forward in eliminating the
qadata structure. It adds a variable to the do_grow function to
determine when unstuffing is necessary, and has been done.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-24 16:44:33 +01:00
Bob Peterson
5e2f7d617b GFS2: Make sure rindex is uptodate before starting transactions
This patch removes the call from gfs2_blk2rgrd to function
gfs2_rindex_update and replaces it with individual calls.
The former way turned out to be too problematic.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-04-05 10:20:10 +01:00
Bob Peterson
220cca2a4f GFS2: Change truncate page allocation to be GFP_NOFS
This patch changes the page allocation in gfs2_block_truncate_page
and two others to GFP_NOFS to avoid deadlock in low-memory conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-03-20 11:05:00 +00:00
Bob Peterson
564e12b115 GFS2: decouple quota allocations from block allocations
This patch separates the code pertaining to allocations into two
parts: quota-related information and block reservations.
This patch also moves all the block reservation structure allocations to
function gfs2_inplace_reserve to simplify the code, and moves
the frees to function gfs2_inplace_release.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22 10:25:21 +00:00
Bob Peterson
6e87ed0fc9 GFS2: move toward a generic multi-block allocator
This patch is a revision of the one I previously posted.
I tried to integrate all the suggestions Steve gave.
The purpose of the patch is to change function gfs2_alloc_block
(allocate either a dinode block or an extent of data blocks)
to a more generic gfs2_alloc_blocks function that can
allocate both a dinode _and_ an extent of data blocks in the
same call. This will ultimately help us create a multi-block
reservation scheme to reduce file fragmentation.

This patch moves more toward a generic multi-block allocator that
takes a pointer to the number of data blocks to allocate, plus whether
or not to allocate a dinode. In theory, it could be called to allocate
(1) a single dinode block, (2) a group of one or more data blocks, or
(3) a dinode plus several data blocks.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21 10:04:09 +00:00