linux

Author	SHA1	Message	Date
Dave Chinner	bb80c6d79a	xfs: verify AGFL blocks as they are read from disk Add an AGFL block verify callback function and pass it into the buffer read functions. While this commit adds verification code to the AGFL, it cannot be used reliably until the CRC format change comes along as mkfs does not initialise the full AGFL. Hence it can be full of garbage at the first mount and will fail verification right now. CRC enabled filesystems won't have this problem, so leave the code that has already been written ifdef'd out until the proper time. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:14 -06:00
Dave Chinner	3702ce6ed7	xfs: verify AGI blocks as they are read from disk Add an AGI block verify callback function and pass it into the buffer read functions. Remove the now redundant verification code that is currently in use. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:12 -06:00
Dave Chinner	5d5f527d13	xfs: verify AGF blocks as they are read from disk Add an AGF block verify callback function and pass it into the buffer read functions. This replaces the existing verification that is done after the read completes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:10 -06:00
Dave Chinner	98021821a5	xfs: verify superblocks as they are read from disk Add a superblock verify callback function and pass it into the buffer read functions. Remove the now redundant verification code that is currently in use. Adding verification shows that secondary superblocks never have their "sb_inprogress" flag cleared by mkfs.xfs, so when validating the secondary superblocks during a grow operation we have to avoid checking this field. Even if we fix mkfs, we will still have to ignore this field for verification purposes unless a version of mkfs that does not have this bug was used. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:07 -06:00
Dave Chinner	eab4e63368	xfs: uncached buffer reads need to return an error With verification being done as an IO completion callback, different errors can be returned from a read. Uncached reads only return a buffer or NULL on failure, which means the verification error cannot be returned to the caller. Split the error handling for these reads into two - a failure to get a buffer will still return NULL, but a read error will return a referenced buffer with b_error set rather than NULL. The caller is responsible for checking the error state of the buffer returned. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:05 -06:00
Dave Chinner	c3f8fc73ac	xfs: make buffer read verication an IO completion function Add a verifier function callback capability to the buffer read interfaces. This will be used by the callers to supply a function that verifies the contents of the buffer when it is read from disk. This patch does not provide callback functions, but simply modifies the interfaces to allow them to be called. The reason for adding this to the read interfaces is that it is very difficult to tell fom the outside is a buffer was just read from disk or whether we just pulled it out of cache. Supplying a callbck allows the buffer cache to use it's internal knowledge of the buffer to execute it only when the buffer is read from disk. It is intended that the verifier functions will mark the buffer with an EFSCORRUPTED error when verification fails. This allows the reading context to distinguish a verification error from an IO error, and potentially take further actions on the buffer (e.g. attempt repair) based on the error reported. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phil White <pwhite@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-15 21:34:02 -06:00
Yan Hong	7dd2517c39	fs/debugsfs: remove unnecessary inode->i_private initialization inode->i_private is promised to be NULL on allocation, no need to set it explicitly. Signed-off-by: Yan Hong <clouds.yan@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2012-11-15 17:46:42 -08:00
Linus Torvalds	ce95a36bb9	Merge tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs Pull UBIFS fixes from Artem Bityutskiy: "Two patches which fix a problem reported by several people in the past, but only fixed now because no one gave enough material for debugging. Anyway, these fix the problem that sometimes after a power cut the file-system is not mountable with the following symptom: grab_empty_leb: could not find an empty LEB The fixes make the file-system mountable again." * tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs: UBIFS: fix mounting problems after power cuts UBIFS: introduce categorized lprops counter	2012-11-15 11:28:43 -08:00
Stanislav Kinsbursky	0912128149	nfsd: make laundromat network namespace aware This patch moves laundromat_work to nfsd per-net context, thus allowing to run multiple laundries. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:51 -05:00
Stanislav Kinsbursky	12760c6685	nfsd: pass nfsd_net instead of net to grace enders Passing net context looks as overkill. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky	3320fef19b	nfsd: use service net instead of hard-coded init_net This patch replaces init_net by SVC_NET(), where possible and also passes proper context to nested functions where required. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:50 -05:00
Stanislav Kinsbursky	73758fed71	nfsd: make close_lru list per net This list holds nfs4 clients (open) stateowner queue for last close replay, which are network namespace aware. So let's make this list per network namespace too. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:49 -05:00
Stanislav Kinsbursky	5ed58bb243	nfsd: make client_lru list per net This list holds nfs4 clients queue for lease renewal, which are network namespace aware. So let's make this list per network namespace too. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:48 -05:00
Stanislav Kinsbursky	1872de0e81	nfsd: make sessionid_hashtbl allocated per net This hash holds established sessions state and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky	20e9e2bc98	nfsd: make lockowner_ino_hashtbl allocated per net This hash holds file lock owners and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:47 -05:00
Stanislav Kinsbursky	9b53113740	nfsd: make ownerstr_hashtbl allocated per net This hash holds open owner state and closely associated with nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace too. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:46 -05:00
Stanislav Kinsbursky	a99454aa4f	nfsd: make unconf_name_tree per net This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky	0a7ec37727	nfsd: make unconf_id_hashtbl allocated per net This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:45 -05:00
Stanislav Kinsbursky	382a62e76c	nfsd: make conf_name_tree per net This tree holds nfs4_clients info, which are network namespace aware. So let's make it per network namespace. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:44 -05:00
Stanislav Kinsbursky	8daae4dc0d	nfsd: make conf_id_hashtbl allocated per net This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Note: this hash can be allocated in per-net operations. But it looks better to allocate it on nfsd state start and thus don't waste resources if server is not running. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky	52e19c09a1	nfsd: make reclaim_str_hashtbl allocated per net This hash holds nfs4_clients info, which are network namespace aware. So let's make it allocated per network namespace. Note: this hash is used only by legacy tracker. So let's allocate hash in tracker init. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:43 -05:00
Stanislav Kinsbursky	c212cecfa2	nfsd: make nfs4_client network namespace dependent And use it's net where possible. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:42 -05:00
Stanislav Kinsbursky	7f2210fa6b	nfsd: use service net instead of hard-coded net where possible Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-15 07:40:41 -05:00
David Teigland	4e2f8849de	GFS2: remove redundant lvb pointer The lksb struct already contains a pointer to the lvb, so another directly from the glock struct is not needed. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-15 10:17:22 +00:00
David Teigland	dba2d70c5d	GFS2: only use lvb on glocks that need it Save the effort of allocating, reading and writing the lvb for most glocks that do not use it. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-15 10:16:59 +00:00
Eric W. Biederman	499dcf2024	userns: Support fuse interacting with multiple user namespaces Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-14 22:05:33 -08:00
Eric W. Biederman	45634cd8cb	userns: Support autofs4 interacing with multiple user namespaces Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue. When creating directories and symlinks default the uid and gid of the mount requester to the global root uid and gid. autofs4_wait will update these fields when a mount is requested. When generating autofsv5 packets report the uid and gid of the mount requestor in user namespace of the process that opened the pipe, reporting unmapped uids and gids as overflowuid and overflowgid. In autofs_dev_ioctl_requester return the uid and gid of the last mount requester converted into the calling processes user namespace. When the uid or gid don't map return overflowuid and overflowgid as appropriate, allowing failure to find a mount requester to be distinguished from failure to map a mount requester. The uid and gid mount options specifying the user and group of the root autofs inode are converted into kuid and kgid as they are parsed defaulting to the current uid and current gid of the process that mounts autofs. Mounting of autofs for the present remains confined to processes in the initial user namespace. Cc: Ian Kent <raven@themaw.net> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-11-14 22:05:32 -08:00
Eric Sandeen	66bea92c69	ext4: init pagevec in ext4_da_block_invalidatepages ext4_da_block_invalidatepages is missing a pagevec_init(), which means that pvec->cold contains random garbage. This affects whether the page goes to the front or back of the LRU when ->cold makes it to free_hot_cold_page() Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2012-11-14 22:22:05 -05:00
Colin Ian King	70a6f46d7b	pstore: Fix NULL pointer dereference in console writes Passing a NULL id causes a NULL pointer deference in writers such as erst_writer and efi_pstore_write because they expect to update this id. Pass a dummy id instead. This avoids a cascade of oopses caused when the initial pstore_console_write passes a null which in turn causes writes to the console causing further oopses in subsequent pstore_console_write calls. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>	2012-11-14 18:30:21 -08:00
Dave Chinner	fb59581404	xfs: remove xfs_flushinval_pages It's just a simple wrapper around VFS functionality, and is actually bugging in that it doesn't remove mappings before invalidating the page cache. Remove it and replace it with the correct VFS functionality. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-14 15:15:08 -06:00
Dave Chinner	4bc1ea6b8d	xfs: remove xfs_flush_pages It is a complex wrapper around VFS functions, but there are VFS functions that provide exactly the same functionality. Call the VFS functions directly and remove the unnecessary indirection and complexity. We don't need to care about clearing the XFS_ITRUNCATED flag, as that is done during .writepages. Hence is cleared by the VFS writeback path if there is anything to write back during the flush. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-14 15:12:45 -06:00
Dave Chinner	95eacf0f71	xfs: remove xfs_wait_on_pages() It's just a simple wrapper around a VFS function that is only called by another function in xfs_fs_subr.c. Remove it and call the VFS function directly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-14 15:12:20 -06:00
Andrew Dahl	d6638ae244	xfs: reverse the check on XFS_IOC_ZERO_RANGE Reversing the check on XFS_IOC_ZERO_RANGE. Range should be zeroed if the start is less than or equal to the end. Signed-off-by: Andrew Dahl <adahl@sgi.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-14 15:11:52 -06:00
Dave Chinner	f5b8911b67	xfs: remove xfs_tosspages It's a buggy, unnecessary wrapper that is duplicating truncate_pagecache_range(). When replacing the call in xfs_change_file_space(), also ensure that the length being allocated/freed is always positive before making any changes. These checks are done in the lower extent manipulation functions, too, but we need to do them before any page cache operations. Reported-by: Andrew Dahl <adahl@sgi.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-By: Andrew Dahl <adahl@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-14 15:11:19 -06:00
Greg Kroah-Hartman	54d5f88f25	Merge v3.7-rc5 into tty-next This pulls in the 3.7-rc5 fixes into tty-next to make it easier to test.	2012-11-14 12:30:12 -08:00
Fengguang Wu	2b4cf668a7	nfsd4: get_backchannel_cred should be static Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-14 11:23:00 -05:00
Fengguang Wu	135ae8270d	nfsd4: init_session should be declared static Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2012-11-14 11:23:00 -05:00
David Teigland	fb6791d100	GFS2: skip dlm_unlock calls in unmount When unmounting, gfs2 does a full dlm_unlock operation on every cached lock. This can create a very large amount of work and can take a long time to complete. However, the vast majority of these dlm unlock operations are unnecessary because after all the unlocks are done, gfs2 leaves the dlm lockspace, which automatically clears the locks of the leaving node, without unlocking each one individually. So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to remove the locks implicitly. The one exception is when the lock's lvb is being used. In this case, dlm_unlock is called because it may update the lvb of the resource. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-14 09:37:04 +00:00
Dave Chinner	de497688da	xfs: make growfs initialise the AGFL header For verification purposes, AGFLs need to be initialised to a known set of values. For upcoming CRC changes, they are also headers that need to be initialised. Currently, growfs does neither for the AGFLs - it ignores them completely. Add initialisation of the AGFL to be full of invalid block numbers (NULLAGBLOCK) to put the infrastructure in place needed for CRC support. Includes a comment clarification from Jeff Liu. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 16:40:59 -06:00
Dave Chinner	fd23683c3b	xfs: growfs: use uncached buffers for new headers When writing the new AG headers to disk, we can't attach write verifiers because they have a dependency on the struct xfs-perag being attached to the buffer to be fully initialised and growfs can't fully initialise them until later in the process. The simplest way to avoid this problem is to use uncached buffers for writing the new headers. These buffers don't have the xfs-perag attached to them, so it's simple to detect in the write verifier and be able to skip the checks that need the xfs-perag. This enables us to attach the appropriate buffer ops to the buffer and hence calculate CRCs on the way to disk. IT also means that the buffer is torn down immediately, and so the first access to the AG headers will re-read the header from disk and perform full verification of the buffer. This way we also can catch corruptions due to problems that went undetected in growfs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 16:40:43 -06:00
Dave Chinner	b64f3a390d	xfs: use btree block initialisation functions in growfs Factor xfs_btree_init_block() to be independent of the btree cursor, and use the function to initialise btree blocks in the growfs code. This makes adding support for different format btree blocks simple. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by Rich Johnston <rjohnston@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 16:40:27 -06:00
Dave Chinner	ee73259b40	xfs: add more attribute tree trace points. Added when debugging recent attribute tree problems to more finely trace code execution through the maze of twisty passages that makes up the attr code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 14:47:00 -06:00
Dave Chinner	37eb17e604	xfs: drop buffer io reference when a bad bio is built Error handling in xfs_buf_ioapply_map() does not handle IO reference counts correctly. We increment the b_io_remaining count before building the bio, but then fail to decrement it in the failure case. This leads to the buffer never running IO completion and releasing the reference that the IO holds, so at unmount we can leak the buffer. This leak is captured by this assert failure during unmount: XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273 This is not a new bug - the b_io_remaining accounting has had this problem for a long, long time - it's just very hard to get a zero length bio being built by this code... Further, the buffer IO error can be overwritten on a multi-segment buffer by subsequent bio completions for partial sections of the buffer. Hence we should only set the buffer error status if the buffer is not already carrying an error status. This ensures that a partial IO error on a multi-segment buffer will not be lost. This part of the problem is a regression, however. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 14:45:57 -06:00
Dave Chinner	7bf7f35219	xfs: fix broken error handling in xfs_vm_writepage When we shut down the filesystem, it might first be detected in writeback when we are allocating a inode size transaction. This happens after we have moved all the pages into the writeback state and unlocked them. Unfortunately, if we fail to set up the transaction we then abort writeback and try to invalidate the current page. This then triggers are BUG() in block_invalidatepage() because we are trying to invalidate an unlocked page. Fixing this is a bit of a chicken and egg problem - we can't allocate the transaction until we've clustered all the pages into the IO and we know the size of it (i.e. whether the last block of the IO is beyond the current EOF or not). However, we don't want to hold pages locked for long periods of time, especially while we lock other pages to cluster them into the write. To fix this, we need to make a clear delineation in writeback where errors can only be handled by IO completion processing. That is, once we have marked a page for writeback and unlocked it, we have to report errors via IO completion because we've already started the IO. We may not have submitted any IO, but we've changed the page state to indicate that it is under IO so we must now use the IO completion path to report errors. To do this, add an error field to xfs_submit_ioend() to pass it the error that occurred during the building on the ioend chain. When this is non-zero, mark each ioend with the error and call xfs_finish_ioend() directly rather than building bios. This will immediately push the ioends through completion processing with the error that has occurred. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 14:45:45 -06:00
Dave Chinner	07428d7f0c	xfs: fix attr tree double split corruption In certain circumstances, a double split of an attribute tree is needed to insert or replace an attribute. In rare situations, this can go wrong, leaving the attribute tree corrupted. In this case, the attr being replaced is the last attr in a leaf node, and the replacement is larger so doesn't fit in the same leaf node. When we have the initial condition of a node format attribute btree with two leaves at index 1 and 2. Call them L1 and L2. The leaf L1 is completely full, there is not a single byte of free space in it. L2 is mostly empty. The attribute being replaced - call it X - is the last attribute in L1. The way an attribute replace is executed is that the replacement attribute - call it Y - is first inserted into the tree, but has an INCOMPLETE flag set on it so that list traversals ignore it. Once this transaction is committed, a second transaction it run to atomically mark Y as COMPLETE and X as INCOMPLETE, so that a traversal will now find Y and skip X. Once that transaction is committed, attribute X is then removed. So, the initial condition is: +--------+ +--------+ \| L1 \| \| L2 \| \| fwd: 2 \|---->\| fwd: 0 \| \| bwd: 0 \|<----\| bwd: 1 \| \| fsp: 0 \| \| fsp: N \| \|--------\| \|--------\| \| attr A \| \| attr 1 \| \|--------\| \|--------\| \| attr B \| \| attr 2 \| \|--------\| \|--------\| .......... .......... \|--------\| \|--------\| \| attr X \| \| attr n \| +--------+ +--------+ So now we go to replace X, and see that L1:fsp = 0 - it is full so we can't insert Y in the same leaf. So we record the the location of attribute X so we can track it for later use, then we split L1 into L1 and L3 and reblance across the two leafs. We end with: +--------+ +--------+ +--------+ \| L1 \| \| L3 \| \| L2 \| \| fwd: 3 \|---->\| fwd: 2 \|---->\| fwd: 0 \| \| bwd: 0 \|<----\| bwd: 1 \|<----\| bwd: 3 \| \| fsp: M \| \| fsp: J \| \| fsp: N \| \|--------\| \|--------\| \|--------\| \| attr A \| \| attr X \| \| attr 1 \| \|--------\| +--------+ \|--------\| \| attr B \| \| attr 2 \| \|--------\| \|--------\| .......... .......... \|--------\| \|--------\| \| attr W \| \| attr n \| +--------+ +--------+ And we track that the original attribute is now at L3:0. We then try to insert Y into L1 again, and find that there isn't enough room because the new attribute is larger than the old one. Hence we have to split again to make room for Y. We end up with this: +--------+ +--------+ +--------+ +--------+ \| L1 \| \| L4 \| \| L3 \| \| L2 \| \| fwd: 4 \|---->\| fwd: 3 \|---->\| fwd: 2 \|---->\| fwd: 0 \| \| bwd: 0 \|<----\| bwd: 1 \|<----\| bwd: 4 \|<----\| bwd: 3 \| \| fsp: M \| \| fsp: J \| \| fsp: J \| \| fsp: N \| \|--------\| \|--------\| \|--------\| \|--------\| \| attr A \| \| attr Y \| \| attr X \| \| attr 1 \| \|--------\| + INCOMP + +--------+ \|--------\| \| attr B \| +--------+ \| attr 2 \| \|--------\| \|--------\| .......... .......... \|--------\| \|--------\| \| attr W \| \| attr n \| +--------+ +--------+ And now we have the new (incomplete) attribute @ L4:0, and the original attribute at L3:0. At this point, the first transaction is committed, and we move to the flipping of the flags. This is where we are supposed to end up with this: +--------+ +--------+ +--------+ +--------+ \| L1 \| \| L4 \| \| L3 \| \| L2 \| \| fwd: 4 \|---->\| fwd: 3 \|---->\| fwd: 2 \|---->\| fwd: 0 \| \| bwd: 0 \|<----\| bwd: 1 \|<----\| bwd: 4 \|<----\| bwd: 3 \| \| fsp: M \| \| fsp: J \| \| fsp: J \| \| fsp: N \| \|--------\| \|--------\| \|--------\| \|--------\| \| attr A \| \| attr Y \| \| attr X \| \| attr 1 \| \|--------\| +--------+ + INCOMP + \|--------\| \| attr B \| +--------+ \| attr 2 \| \|--------\| \|--------\| .......... .......... \|--------\| \|--------\| \| attr W \| \| attr n \| +--------+ +--------+ But that doesn't happen properly - the attribute tracking indexes are not pointing to the right locations. What we end up with is both the old attribute to be removed pointing at L4:0 and the new attribute at L4:1. On a debug kernel, this assert fails like so: XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725 because the new attribute location does not exist. On a production kernel, this goes unnoticed and the code proceeds ahead merrily and removes L4 because it thinks that is the block that is no longer needed. This leaves the hash index node pointing to entries L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free space, and so everything is busted. This corruption is caused by the removal of the old attribute triggering a join - it joins everything correctly but then frees the wrong block. xfs_repair will report something like: bad sibling back pointer for block 4 in attribute fork for inode 131 problem with attribute contents in inode 131 would clear attr fork bad nblocks 8 for inode 131, would reset to 3 bad anextents 4 for inode 131, would reset to 0 The problem lies in the assignment of the old/new blocks for tracking purposes when the double leaf split occurs. The first split tries to place the new attribute inside the current leaf (i.e. "inleaf == true") and moves the old attribute (X) to the new block. This sets up the old block/index to L1:X, and newly allocated block to L3:0. It then moves attr X to the new block and tries to insert attr Y at the old index. That fails, so it splits again. With the second split, the rebalance ends up placing the new attr in the second new block - L4:0 - and this is where the code goes wrong. What is does is it sets both the new and old block index to the second new block. Hence it inserts attr Y at the right place (L4:0) but overwrites the current location of the attr to replace that is held in the new block index (currently L3:0). It over writes it with L4:1 - the index we later assert fail on. Hopefully this table will show this in a foramt that is a bit easier to understand: Split old attr index new attr index vanilla patched vanilla patched before 1st L1:26 L1:26 N/A N/A after 1st L3:0 L3:0 L1:26 L1:26 after 2nd L4:0 L3:0 L4:1 L4:0 ^^^^ ^^^^ wrong wrong The fix is surprisingly simple, for all this analysis - just stop the rebalance on the out-of leaf case from overwriting the new attr index - it's already correct for the double split case. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>	2012-11-13 14:45:29 -06:00
Steven Whitehouse	aa8920c968	GFS2: Fix one RG corner case For filesystems with only a single resource group, we need to be careful that the allocation loop will not land up with a NULL resource group. This fixes a bug in a previous patch where the gfs2_rgrpd_get_next() function was being used instead of gfs2_rgrpd_get_first() Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 14:50:35 +00:00
Bob Peterson	4327a9bf71	GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode Since we now have a dirty_inode that takes care of manipulating the inode buffer and writing from the inode to the buffer, we can eliminate some unnecessary buffer manipulations in gfs2_unlink_inode that are now redundant. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 09:55:26 +00:00
Bob Peterson	343cd8f0d7	GFS2: Use dirty_inode in gfs2_dir_add This patch changes the gfs2_dir_add function so that it uses the dirty_inode function (via mark_inode_dirty) rather than manually updating the dinode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 09:54:54 +00:00
Steven Whitehouse	fa731fc4e0	GFS2: Fix truncation of journaled data files This patch fixes an issue relating to not having enough revokes available when truncating journaled data files. In order to ensure that we do no run out, the truncation is broken into separate pieces if it is large enough. Tested using fsx on a journaled data file. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2012-11-13 09:50:28 +00:00
Darrick J. Wong	c6af8803cd	ext4: don't verify checksums of dx non-leaf nodes during fallback scan During a directory entry lookup of a hashed directory, if the hash-based lookup functions fail and we fall back to a linear scan, don't try to verify the dirent checksum on the internal nodes of the hash tree because they don't store a checksum in a hidden dirent like the leaf nodes do. Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2012-11-12 23:51:02 -05:00

... 206 207 208 209 210 ...

39703 Commits