linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-28 23:21:31 +00:00

Author	SHA1	Message	Date
Miklos Szeredi	56656e960b	ovl: rename is_merge to is_lowest The 'is_merge' is an historical naming from when only a single lower layer could exist. With the introduction of multiple lower layers the meaning of this flag was changed to mean only the "lowest layer" (while all lower layers were being merged). So now 'is_merge' is inaccurate and hence renaming to 'is_lowest' Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:46 +01:00
Sohom Bhattacharjee	f134f24465	ovl: fixed coding style warning This patch fixes a newline warning found by the checkpatch.pl tool Signed-off-by: Sohom-Bhattacharjee <soham.bhattacharjee15@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Vivek Goyal	45aebeaf4f	ovl: Ensure upper filesystem supports d_type In some instances xfs has been created with ftype=0 and there if a file on lower fs is removed, overlay leaves a whiteout in upper fs but that whiteout does not get filtered out and is visible to overlayfs users. And reason it does not get filtered out because upper filesystem does not report file type of whiteout as DT_CHR during iterate_dir(). So it seems to be a requirement that upper filesystem support d_type for overlayfs to work properly. Do this check during mount and fail if d_type is not supported. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
David Howells	fb5bb2c3b7	ovl: Warn on copy up if a process has a R/O fd open to the lower file Print a warning when overlayfs copies up a file if the process that triggered the copy up has a R/O fd open to the lower file being copied up. This can help catch applications that do things like the following: fd1 = open("foo", O_RDONLY); fd2 = open("foo", O_RDWR); where they expect fd1 and fd2 to refer to the same file - which will no longer be the case post-copy up. With this patch, the following commands: bash 5</mnt/a/foo128 6<>/mnt/a/foo128 assuming /mnt/a/foo128 to be an un-copied up file on an overlay will produce the following warning in the kernel log: overlayfs: Copying up foo129, but open R/O on fd 5 which will cease to be coherent [pid=3818 bash] This is enabled by setting: /sys/module/overlay/parameters/check_copy_up to 1. The warnings are ratelimited and are also limited to one warning per file - assuming the copy up completes in each case. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Konstantin Khlebnikov	07f2af7bfd	ovl: honor flag MS_SILENT at mount This patch hides error about missing lowerdir if MS_SILENT is set. We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel automatically loads module and returns -EINVAL because lowerdir is missing. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Miklos Szeredi	11f3710417	ovl: verify upper dentry before unlink and rename Unlink and rename in overlayfs checked the upper dentry for staleness by verifying upper->d_parent against upperdir. However the dentry can go stale also by being unhashed, for example. Expand the verification to actually look up the name again (under parent lock) and check if it matches the upper dentry. This matches what the VFS does before passing the dentry to filesytem's unlink/rename methods, which excludes any inconsistency caused by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:44 +01:00
Chris Mason	389f239c53	btrfs: make sure we stay inside the bvec during __btrfs_lookup_bio_sums Commit `c40a3d38af` (Btrfs: Compute and look up csums based on sectorsized blocks) changes around how we walk the bios while looking up crcs. There's an inner loop that is jumping to the next bvec based on sectors and before it derefs the next bvec, it needs to make sure we're still in the bio. In this case, the outer loop would have decided to stop moving forward too, and the bvec deref is never actually used for anything. But CONFIG_DEBUG_PAGEALLOC catches it because we're outside our bio. Signed-off-by: Chris Mason <clm@fb.com> Reviewed-by: David Sterba <dsterba@suse.com>	2016-03-21 07:25:44 -07:00
Linus Torvalds	643ad15d47	Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 protection key support from Ingo Molnar: "This tree adds support for a new memory protection hardware feature that is available in upcoming Intel CPUs: 'protection keys' (pkeys). There's a background article at LWN.net: https://lwn.net/Articles/643797/ The gist is that protection keys allow the encoding of user-controllable permission masks in the pte. So instead of having a fixed protection mask in the pte (which needs a system call to change and works on a per page basis), the user can map a (handful of) protection mask variants and can change the masks runtime relatively cheaply, without having to change every single page in the affected virtual memory range. This allows the dynamic switching of the protection bits of large amounts of virtual memory, via user-space instructions. It also allows more precise control of MMU permission bits: for example the executable bit is separate from the read bit (see more about that below). This tree adds the MM infrastructure and low level x86 glue needed for that, plus it adds a high level API to make use of protection keys - if a user-space application calls: mmap(..., PROT_EXEC); or mprotect(ptr, sz, PROT_EXEC); (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice this special case, and will set a special protection key on this memory range. It also sets the appropriate bits in the Protection Keys User Rights (PKRU) register so that the memory becomes unreadable and unwritable. So using protection keys the kernel is able to implement 'true' PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies PROT_READ as well. Unreadable executable mappings have security advantages: they cannot be read via information leaks to figure out ASLR details, nor can they be scanned for ROP gadgets - and they cannot be used by exploits for data purposes either. We know about no user-space code that relies on pure PROT_EXEC mappings today, but binary loaders could start making use of this new feature to map binaries and libraries in a more secure fashion. There is other pending pkeys work that offers more high level system call APIs to manage protection keys - but those are not part of this pull request. Right now there's a Kconfig that controls this feature (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled (like most x86 CPU feature enablement code that has no runtime overhead), but it's not user-configurable at the moment. If there's any serious problem with this then we can make it configurable and/or flip the default" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/mm/pkeys: Fix mismerge of protection keys CPUID bits mm/pkeys: Fix siginfo ABI breakage caused by new u64 field x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA mm/core, x86/mm/pkeys: Add execute-only protection keys support x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags x86/mm/pkeys: Allow kernel to modify user pkey rights register x86/fpu: Allow setting of XSAVE state x86/mm: Factor out LDT init from context init mm/core, x86/mm/pkeys: Add arch_validate_pkey() mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits() x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU x86/mm/pkeys: Add Kconfig prompt to existing config option x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps x86/mm/pkeys: Dump PKRU with other kernel registers mm/core, x86/mm/pkeys: Differentiate instruction fetches x86/mm/pkeys: Optimize fault handling in access_error() mm/core: Do not enforce PKEY permissions on remote mm access um, pkeys: Add UML arch_*_access_permitted() methods mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys x86/mm/gup: Simplify get_user_pages() PTE bit handling ...	2016-03-20 19:08:56 -07:00
Andreas Gruenbacher	c27cb97218	ubifs: Remove unused header UBIFS does not support POSIX ACLs, so there is no need for including any POSIX ACL hesders. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2016-03-20 21:37:46 +01:00
Joe Perches	3e7f2c5104	ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn The existing logging macros are fairly large and converting the macros to functions make the object code smaller. Use %pV and __builtin_return_address(0) as appropriate. $ size fs/ubifs/built-in.o* text data bss dec hex filename 575831 309688 161312 1046831 ff92f fs/ubifs/built-in.o.allyesconfig.new 622457 312872 161120 1096449 10bb01 fs/ubifs/built-in.o.allyesconfig.old 223785 640 644 225069 36f2d fs/ubifs/built-in.o.defconfig.new 251873 640 644 253157 3dce5 fs/ubifs/built-in.o.defconfig.old Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2016-03-20 21:36:05 +01:00
Tejun Heo	aaf2559332	writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode When cgroup writeback is in use, there can be multiple wb's (bdi_writeback's) per bdi and an inode may switch among them dynamically. In a couple places, the wrong wb was used leading to performing operations on the wrong list under the wrong lock corrupting the io lists. * writeback_single_inode() was taking @wb parameter and used it to remove the inode from io lists if it becomes clean after writeback. The callers of this function were always passing in the root wb regardless of the actual wb that the inode was associated with, which could also change while writeback is in progress. Fix it by dropping the @wb parameter and using inode_to_wb_and_lock_list() to determine and lock the associated wb. * After writeback_sb_inodes() writes out an inode, it re-locks @wb and inode to remove it from or move it to the right io list. It assumes that the inode is still associated with @wb; however, the inode may have switched to another wb while writeback was in progress. Fix it by using inode_to_wb_and_lock_list() to determine and lock the associated wb after writeback is complete. As the function requires the original @wb->list_lock locked for the next iteration, in the unlikely case where the inode has changed association, switch the locks. Kudos to Tahsin for pinpointing these subtle breakages. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `d10c809552` ("writeback: implement foreign cgroup inode bdi_writeback switching") Link: http://lkml.kernel.org/g/CAAeU0aMYeM_39Y2+PaRvyB1nqAPYZSNngJ1eBRmrxn7gKAt2Mg@mail.gmail.com Reported-and-diagnosed-by: Tahsin Erdogan <tahsin@google.com> Tested-by: Tahsin Erdogan <tahsin@google.com> Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-20 09:44:20 -06:00
Tejun Heo	614a4e3773	writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list() locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with the target inode, unlocks inode, locks the wb's list_lock and verifies that the inode is still associated with the wb. To prevent the wb going away between dropping inode lock and acquiring list_lock, the wb is pinned while inode lock is held. The wb reference is put right after acquiring list_lock citing that the wb won't be dereferenced anymore. This isn't true. If the inode is still associated with the wb, the inode has reference and it's safe to return the wb; however, if inode has been switched, the wb still needs to be unlocked which is a dereference and can lead to use-after-free if it it races with wb destruction. Fix it by putting the reference after releasing list_lock. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: `87e1d789bf` ("writeback: implement [locked_]inode_to_wb_and_lock_list()") Cc: stable@vger.kernel.org # v4.2+ Tested-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-20 09:44:18 -06:00
Linus Torvalds	3c2de27d79	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: - Preparations of parallel lookups (the remaining main obstacle is the need to move security_d_instantiate(); once that becomes safe, the rest will be a matter of rather short series local to fs/.c - preadv2/pwritev2 series from Christoph - assorted fixes 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (32 commits) splice: handle zero nr_pages in splice_to_pipe() vfs: show_vfsstat: do not ignore errors from show_devname method dcache.c: new helper: __d_add() don't bother with __d_instantiate(dentry, NULL) untangle fsnotify_d_instantiate() a bit uninline d_add() replace d_add_unique() with saner primitive quota: use lookup_one_len_unlocked() cifs_get_root(): use lookup_one_len_unlocked() nfs_lookup: don't bother with d_instantiate(dentry, NULL) kill dentry_unhash() ceph_fill_trace(): don't bother with d_instantiate(dn, NULL) autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup() configfs: move d_rehash() into configfs_create() for regular files ceph: don't bother with d_rehash() in splice_dentry() namei: teach lookup_slow() to skip revalidate namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() lookup_one_len_unlocked(): use lookup_dcache() namei: simplify invalidation logics in lookup_dcache() namei: change calling conventions for lookup_{fast,slow} and follow_managed() ...	2016-03-19 18:52:29 -07:00
Linus Torvalds	814a2bf957	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - a couple of hotfixes - the rest of MM - a new timer slack control in procfs - a couple of procfs fixes - a few misc things - some printk tweaks - lib/ updates, notably to radix-tree. - add my and Nick Piggin's old userspace radix-tree test harness to tools/testing/radix-tree/. Matthew said it was a godsend during the radix-tree work he did. - a few code-size improvements, switching to __always_inline where gcc screwed up. - partially implement character sets in sscanf * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) sscanf: implement basic character sets lib/bug.c: use common WARN helper param: convert some "on"/"off" users to strtobool lib: add "on"/"off" support to kstrtobool lib: update single-char callers of strtobool() lib: move strtobool() to kstrtobool() include/linux/unaligned: force inlining of byteswap operations include/uapi/linux/byteorder, swab: force inlining of some byteswap operations include/asm-generic/atomic-long.h: force inlining of some atomic_long operations usb: common: convert to use match_string() helper ide: hpt366: convert to use match_string() helper ata: hpt366: convert to use match_string() helper power: ab8500: convert to use match_string() helper power: charger_manager: convert to use match_string() helper drm/edid: convert to use match_string() helper pinctrl: convert to use match_string() helper device property: convert to use match_string() helper lib/string: introduce match_string() helper radix-tree tests: add test for radix_tree_iter_next radix-tree tests: add regression3 test ...	2016-03-18 19:26:54 -07:00
Linus Torvalds	35d88d97be	Merge branch 'for-4.6/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "Here are the core block changes for this merge window. Not a lot of exciting stuff going on in this round, most of the changes have been on the driver side of things. That pull request is coming next. This pull request contains: - A set of fixes for chained bio handling from Christoph. - A tag bounds check for blk-mq from Hannes, ensuring that we don't do something stupid if a device reports an invalid tag value. - A set of fixes/updates for the CFQ IO scheduler from Jan Kara. - A set of blk-mq fixes from Keith, adding support for dynamic hardware queues, and fixing init of max_dev_sectors for stacking devices. - A fix for the dynamic hw context from Ming. - Enabling of cgroup writeback support on a block device, from Shaohua" * 'for-4.6/core' of git://git.kernel.dk/linux-block: blk-mq: add bounds check on tag-to-rq conversion block: bio_remaining_done() isn't unlikely block: cleanup bio_endio block: factor out chained bio completion block: don't unecessarily clobber bi_error for chained bios block-dev: enable writeback cgroup support blk-mq: Fix NULL pointer updating nr_requests blk-mq: mark request queue as mq asap block: Initialize max_dev_sectors to 0 blk-mq: dynamic h/w context count cfq-iosched: Allow parent cgroup to preempt its child cfq-iosched: Allow sync noidle workloads to preempt each other cfq-iosched: Reorder checks in cfq_should_preempt() cfq-iosched: Don't group_idle if cfqq has big thinktime	2016-03-18 16:43:11 -07:00
Al Viro	8b23a8ce10	Merge branches 'work.lookups', 'work.misc' and 'work.preadv2' into for-next	2016-03-18 16:07:38 -04:00
Rabin Vincent	d6785d9152	splice: handle zero nr_pages in splice_to_pipe() Running the following command: busybox cat /sys/kernel/debug/tracing/trace_pipe > /dev/null with any tracing enabled pretty very quickly leads to various NULL pointer dereferences and VM BUG_ON()s, such as these: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffff8119df6c>] generic_pipe_buf_release+0xc/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cbee>] entry_SYSCALL_64_fastpath+0x12/0x6d page dumped because: VM_BUG_ON_PAGE(atomic_read(&page->_count) == 0) kernel BUG at include/linux/mm.h:367! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC RIP: [<ffffffff8119df9c>] generic_pipe_buf_release+0x3c/0x40 Call Trace: [<ffffffff811c48a3>] splice_direct_to_actor+0x143/0x1e0 [<ffffffff811c42e0>] ? generic_pipe_buf_nosteal+0x10/0x10 [<ffffffff811c49cf>] do_splice_direct+0x8f/0xb0 [<ffffffff81196869>] do_sendfile+0x199/0x380 [<ffffffff81197600>] SyS_sendfile64+0x90/0xa0 [<ffffffff8192cd1e>] tracesys_phase2+0x84/0x89 (busybox's cat uses sendfile(2), unlike the coreutils version) This is because tracing_splice_read_pipe() can call splice_to_pipe() with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and we fill the page pointers and the other fields of the pipe_buffers with garbage. All other callers of splice_to_pipe() avoid calling it when nr_pages == 0, and we could make tracing_splice_read_pipe() do that too, but it seems reasonable to have splice_to_page() handle this condition gracefully. Cc: stable@vger.kernel.org Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-18 16:06:44 -04:00
Christoph Hellwig	10c4de10b2	nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:42:54 -04:00
Christoph Hellwig	f99d4fbdae	nfsd: add SCSI layout support This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:42:53 -04:00
Christoph Hellwig	368248eeb1	nfsd: move some blocklayout code Trivial reorganization, no change in behavior. Move some code around, pull some code out of block layoutcommit that will be useful for the scsi layout. [bfields@redhat.com: split off from "nfsd: add SCSI layout support"] Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:41:17 -04:00
Christoph Hellwig	81c3932901	nfsd: add a new config option for the block layout driver Split the config symbols into a generic pNFS one, which is invisible and gets selected by the layout drivers, and one for the block layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:40:57 -04:00
Christoph Hellwig	d9186c0397	nfs/blocklayout: add SCSI layout support This is a trivial extension to the block layout driver to support the new SCSI layouts draft. There are three changes: - device identifcation through the SCSI VPD page. This allows us to directly use the udev generated persistent device names instead of requiring an expensive lookup by crawling every block device node in /dev and reading a signature for it. - use of SCSI persistent reservations to protect device access and allow for robust fencing. On the client sides this just means registering and unregistering a server supplied key. - an optimized LAYOUTCOMMIT payload that doesn't send unessecary fields to the server. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-18 11:38:17 -04:00
Jaegeuk Kim	12bb0a8fd4	f2fs: submit node page write bios when really required If many threads calls fsync with data writes, we don't need to flush every bios having node page writes. The f2fs_wait_on_page_writeback will flush its bios when the page is really needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:47 -07:00
Arnd Bergmann	fff4c55d36	f2fs: add missing argument to f2fs_setxattr stub The f2fs_setxattr() prototype for CONFIG_F2FS_FS_XATTR=n has been wrong for a long time, since `8ae8f1627f` ("f2fs: support xattr security labels"), but there have never been any callers, so it did not matter. Now, the function gets called from f2fs_ioc_keyctl(), which causes a build failure: fs/f2fs/file.c: In function 'f2fs_ioc_keyctl': include/linux/stddef.h:7:14: error: passing argument 6 of 'f2fs_setxattr' makes integer from pointer without a cast [-Werror=int-conversion] #define NULL ((void )0) ^ fs/f2fs/file.c:1599:27: note: in expansion of macro 'NULL' value, F2FS_KEY_SIZE, NULL, type); ^ In file included from ../fs/f2fs/file.c:29:0: fs/f2fs/xattr.h:129:19: note: expected 'int' but argument is of type 'void ' static inline int f2fs_setxattr(struct inode inode, int index, ^ fs/f2fs/file.c:1597:9: error: too many arguments to function 'f2fs_setxattr' return f2fs_setxattr(inode, F2FS_XATTR_INDEX_KEY, ^ In file included from ../fs/f2fs/file.c:29:0: fs/f2fs/xattr.h:129:19: note: declared here static inline int f2fs_setxattr(struct inode inode, int index, Thsi changes the prototype of the empty stub function to match that of the actual implementation. This will not make the key management work when F2FS_FS_XATTR is disabled, but it gets it to build at least. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:47 -07:00
Chao Yu	d726732c7c	f2fs: fix to avoid unneeded unlock_new_inode During ->lookup, I_NEW state of inode was been cleared in f2fs_iget, so in error path, we don't need to clear it again. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:46 -07:00
Chao Yu	291bf80bec	f2fs: clean up opened code with f2fs_update_dentry Just clean up opened code with existing function, no logic change. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:45 -07:00
Jaegeuk Kim	17a0ee552c	f2fs: declare static functions Just to avoid sparse warnings. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:44 -07:00
Keith Mok	43b6573bac	f2fs: use cryptoapi crc32 functions The crc function is done bit by bit. Optimize this by use cryptoapi crc32 function which is backed by h/w acceleration. Signed-off-by: Keith Mok <ek9852@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:43 -07:00
Fan Li	999270de31	f2fs: modify the readahead method in ra_node_page() ra_node_page() is used to read ahead one node page. Comparing to regular read, it's faster because it doesn't wait for IO completion. But if it is called twice for reading the same block, and the IO request from the first call hasn't been completed before the second call, the second call will have to wait until the read is over. Here use the code in __do_page_cache_readahead() to solve this problem. It does nothing when someone else already puts the page in mapping. The status of page should be assured by whoever puts it there. This implement also prevents alteration of page reference count. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:43 -07:00
Jaegeuk Kim	8074bb5150	f2fs crypto: sync ext4_lookup and ext4_file_open This patch tries to catch up with lookup and open policies in ext4. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:42 -07:00
Jaegeuk Kim	0b81d07790	fs crypto: move per-file encryption from f2fs tree to fs/crypto This patch adds the renamed functions moved from the f2fs crypto files. 1. definitions for per-file encryption used by ext4 and f2fs. 2. crypto.c for encrypt/decrypt functions a. IO preparation: - fscrypt_get_ctx / fscrypt_release_ctx b. before IOs: - fscrypt_encrypt_page - fscrypt_decrypt_page - fscrypt_zeroout_range c. after IOs: - fscrypt_decrypt_bio_pages - fscrypt_pullback_bio_page - fscrypt_restore_control_page 3. policy.c supporting context management. a. For ioctls: - fscrypt_process_policy - fscrypt_get_policy b. For context permission - fscrypt_has_permitted_context - fscrypt_inherit_context 4. keyinfo.c to handle permissions - fscrypt_get_encryption_info - fscrypt_free_encryption_info 5. fname.c to support filename encryption a. general wrapper functions - fscrypt_fname_disk_to_usr - fscrypt_fname_usr_to_disk - fscrypt_setup_filename - fscrypt_free_filename b. specific filename handling functions - fscrypt_fname_alloc_buffer - fscrypt_fname_free_buffer 6. Makefile and Kconfig Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:33 -07:00
Linus Torvalds	5cd0911a9e	Allow ram backend to be configured with addresses above 4GB -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW6yKvAAoJEKurIx+X31iBgkkQAJ0GM0zDD3jQx3G95k/KiRZW H70+sZZDnfYGXv22Vv5/u5eGlRGMkmZ6xBX6cwDOsNaUIh3YspjKmrHvcO8mNAcE MpEyMc6aKPmPCQPvNQxSegYEHzLZdN/OIbPYWxihnQ613iSNoYy/Gdgk3bqxWHDU 2p5gvAq5lalTTBz5/nViC65op7qeziIKzzCvUrn1rkycZ7fkPhDaqeKqW6gAQO88 Do4h9rItDL6NDEQR9S2ihbEWRKGZSTYuN0SqFvIYRuly8/aYVy2DCpDbh3aN63zp lIfLhvLPLFTWzFmTbkltD5ZG5qYYRlWZy2vsXKrc1ya+qKiQHn/BTe/9/65pCzP0 IeV9h6JMP7cfAGD+vAKD9PeHaePqL9RYg5F9lbos68IvxkVFF8osaLbMCghomtMJ dtj3ORDh6KSFUOtCsudlc/S9xz3OLL9hmi4QTvF+qDEUEpHY9aY4SqqPcXm/8/Xy babWOBu23V3WCOlLEs9gRsKbZM3gSGT1jlgvSYBmDLJY43bpkcmN7E6WdX8jjPRK vXGfzYp6ZdCJNlBpytLDri34yGDcyKqc3J7c6Wj/D55v4PsQo3mWnPUNTeaeHf3q 4SD6JFvXHSttrquA0G2fkJQ0ksvaWD3kJBRwzzlfqYZ0SbtUc0hjFH6/OOGQD9fg cJ2KH9bgpafvyG2yqF7w =vPCf -----END PGP SIGNATURE----- Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore update from Tony Luck: "Allow ram backend to be configured with addresses above 4GB" * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore: Add support for 64 Bit address space	2016-03-17 16:57:55 -07:00
Linus Torvalds	1ca80a0a3e	GFS2: merge window We only have six patches ready for this merge window. - Arnd Bergmann contributed a patch that fixes an uninitialized variable warning. - The second patch avoids a kernel panic due to referencing an iopen glock that may not be held, in an error path. - The third patch fixes a rounding error that caused xfs_tests direct IO write "fsx" tests to fail on GFS2. - The fourth patch tidies up the code path when glocks are being reused to recreate a dinode that was recently deleted. - The fifth reverts an ages-old patch that should no longer be needed, and which interfered with the transition of dinodes from unlinked to free. - And lastly, a patch to eliminate a function parameter that's not needed. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW6qcKAAoJENeLYdPf93o70SwH/0czrFBIWaPbRgyuGLNvye4G qvfIk2Yky3UKfgUA+ZW+cAWkxeKubc9scMwce0elKxm0e03rNwIX0slIOx8hymj3 19AgMnj3kcCJvRLdkBITNqnd6vTY2quadLN3j8I2cCNbHOV0GelEkP4jWcTHB+2F AG0ZJOsvvrcD1ClgdIvGdV52qipZApS/kgZjLpJBEyzxq8SpRe9vNqMDsDyoKWgi yjp0+aJ0IJAWA24fzdT5HE4fb5yGRWehg51l6Z2mbfXAvT+oeKcYrQiq1zhUutHw dT6SUHbjt+y0EulPClsf3r5zjRjVCCbpj0wkUY3kPB98lgnpAD2QnUP/JDA+CD0= =xRmX -----END PGP SIGNATURE----- Merge tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We only have six patches ready for this merge window: - Arnd Bergmann contributed a patch that fixes an uninitialized variable warning. - The second patch avoids a kernel panic due to referencing an iopen glock that may not be held, in an error path. - The third patch fixes a rounding error that caused xfs_tests direct IO write "fsx" tests to fail on GFS2. - The fourth patch tidies up the code path when glocks are being reused to recreate a dinode that was recently deleted. - The fifth reverts an ages-old patch that should no longer be needed, and which interfered with the transition of dinodes from unlinked to free. - And lastly, a patch to eliminate a function parameter that's not needed" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Eliminate parameter non_block on gfs2_inode_lookup GFS2: Don't filter out I_FREEING inodes anymore GFS2: Prevent delete work from occurring on glocks used for create GFS2: Fix direct IO write rounding error gfs2: avoid uninitialized variable warning GFS2: Check if iopen is held when deleting inode	2016-03-17 16:51:32 -07:00
Linus Torvalds	d77bed0d4c	dlm for 4.6 Previous changes introduced the use of socket error reporting for dlm sockets. This set includes two fixes in how the socket error callbacks are used. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW4ISmAAoJEDgbc8f8gGmqnWgQANnS13rXt4NxnRtNqzrEUmjk NbvP4BxeWKinxhc9ObUnV0CzDllYa2A6Te2AageCQr/qhRKfnnaQJczJ39Xp+289 oSgJJOo4HGCPjshrq9BwGo8ufijUIc0MsT64TzeI3ww58b1eK2CLoC9uiLDyYwjM Hw0PRXU/MAxzOWJIIWgkh78FQmx8fswOSNyK49/p3/INMVNFxn75bd+shtxUOuCp 50gmI6DG4gGJDK3vtIjZStJhW4lcaM3tjGZ9+mcLQF2PZK5zIeHSr3nEfzJ4Qwps 0p55JeiXdfff6RTrxqnJewc+xysmD9594wG0G0VqLsWaLWulDrHZMFsVg1J10frk bk0WwLjsYG/wLVZtKRe+3QwOyqgTxx1Cea8ZYB2yMcFBkDYxFmh8a00kJcVuucnS W+w4rhI9blk1cc4eHhnuBIi5m2jbelu4NPG5722ORtv+gNpBl2ptecqIjfuhr8xE IIF5tnkZb8lBuLyhCmg8in2mKnY5aKSk3kuQ98rDXZUMCLT0PKG2ZNsXJjpX6G38 uQ+sB9rH6c5pIe0S3keS2f2Ly3V7gtBErA0otyxaq/XlxnJeFlLU5G1chHUeW8VP qxhtjDShPuIA97MxE2GA3ehr7r3jbOb+8qOc13E9ygXDyXNN0tb4JIeB0beNLdRx db8Lt+OR10IUawNyuFB9 =qvzY -----END PGP SIGNATURE----- Merge tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: "Previous changes introduced the use of socket error reporting for dlm sockets. This set includes two fixes in how the socket error callbacks are used" * tag 'dlm-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: DLM: Save and restore socket callbacks properly DLM: Replace nodeid_to_addr with kernel_getpeername	2016-03-17 16:38:36 -07:00
Linus Torvalds	faeb20ecfa	Performance improvements in SEEK_DATA and xattr scalability improvements, plus a lot of clean ups and bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJW6c9mAAoJEPL5WVaVDYGjWsEIAJkWUvKB3GgGgP82sKDBP2P8 IbWegO1ICMrSY78BqLI7mLCqggH5JClBgYU3O4VFv8Brj1L9mS5X+vflaDE1j9jj Ik1KZKtZl1opOwO1L3D4l/ipZAiENUp7NehTtpsFousmz6nMZ5vo6x4t3QSwbUIm YXpxUIxHEhBcW5i3EDkfYG8305V5oj8HsVf6T98OlWGpBO5VGNMAHvA7CQdQe7Rd chv70rij5V684bJAEoosEFXVAuOUrxcBqbFA3Nlb432YOPj0ISLx76kw0GIjUYtf yjoSClbRgwxGzh0jm+yaoYjjm83xbsYbHSsBmh3+/QLMbKTLXeCqR/BiqJavmcM= =bWpz -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Performance improvements in SEEK_DATA and xattr scalability improvements, plus a lot of clean ups and bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits) ext4: clean up error handling in the MMP support jbd2: do not fail journal because of frozen_buffer allocation failure ext4: use __GFP_NOFAIL in ext4_free_blocks() ext4: fix compile error while opening the macro DOUBLE_CHECK ext4: print ext4 mount option data_err=abort correctly ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry() ext4: fix misspellings in comments. jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path ext4: more efficient SEEK_DATA implementation ext4: cleanup handling of bh->b_state in DAX mmap ext4: return hole from ext4_map_blocks() ext4: factor out determining of hole size ext4: fix setting of referenced bit in ext4_es_lookup_extent() ext4: remove i_ioend_count ext4: simplify io_end handling for AIO DIO ext4: move trans handling and completion deferal out of _ext4_get_block ext4: rename and split get blocks functions ext4: use i_mutex to serialize unaligned AIO DIO ext4: pack ioend structure better ...	2016-03-17 16:31:18 -07:00
Linus Torvalds	364e8dd9d6	Configfs changes for the 4.6 merge window: - A large patch from me to simplify setting up the list of default groups by actually implementing it as a list instead of an array. - a small Y2083 prep patch from Deepa Dinamani. Probably doesn't matter on it's own, but it seems like he is trying to get rid of all CURRENT_TIME uses in file systems, which is a worthwhile goal. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW6Cz6AAoJEA+eU2VSBFGDmNYP/AzJuVdkXjOkzmAl0SjwS0UC b/gTF0Z0jAmXX8QTf0NtdNajHweYyY4PVvyuUYojO/Y9bgJigRC6gHIUviq8TLhO JR1EUJ3RNoWFZSHeEGTM4q+kSg3GkZ83WixeBiMkIZo7QgPXU2YB0mzErpdcID3N +KVnoVU+asVQi656UIDNZ1SawTAGog+tIMIgnM4vmL0Dd+9yN4pYhAmRLLS0C83P DPci/oVx1a3IjWAkmz24qtb9ht/SA+IBwyFPltg/gdn5OgJL9Vr1naW5mkqMhoPF PUBfX9YYizMwNMYuchng6JqyWlZBjXFr6iqi401vFJcILeq27As5Kc9adfDOEvVC V/dWCmTyMlHX507t+lC7kTa6OaHAZKA5scCHA6dgpQIvGfiaMNNu7MW8C6p0HqwY rf7na7S2fAu5zCyIRVPK//YMNbRHh2AoclzpK7Sw0NCV5jBlXZOdDJcSb4jQsVF7 Yy84EqcebvF4ocaFRzwA/ZHNxz65l5Qu7brmOu6pTliQuQED1fop5z92RXkw2e9y rSIgzMCL5IoAUkYtoO1jzAQXzyySAb3QDpwCaBdZLzN4MbRF/dUxZDkOePKTaVft ckNXj5AVzvLYlpkmkhQ+bqsh91ayFH2/gw9Kt38i1yjzNLhsccZwq9ja5ifPlHLQ nOFiane31yp3Zhac8drb =9HqT -----END PGP SIGNATURE----- Merge tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs Pull configfs updates from Christoph Hellwig: - A large patch from me to simplify setting up the list of default groups by actually implementing it as a list instead of an array. - a small Y2083 prep patch from Deepa Dinamani. Probably doesn't matter on it's own, but it seems like he is trying to get rid of all CURRENT_TIME uses in file systems, which is a worthwhile goal. * tag 'configfs-for-linus' of git://git.infradead.org/users/hch/configfs: configfs: switch ->default groups to a linked list configfs: Replace CURRENT_TIME by current_fs_time()	2016-03-17 16:25:46 -07:00
Kees Cook	1404297ebf	lib: update single-char callers of strtobool() Some callers of strtobool() were passing a pointer to unterminated strings. In preparation of adding multi-character processing to kstrtobool(), update the callers to not pass single-character pointers, and switch to using the new kstrtobool_from_user() helper where possible. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Amitkumar Karwar <akarwar@marvell.com> Cc: Nishant Sarmukadam <nishants@marvell.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Steve French <sfrench@samba.org> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Joe Perches <joe@perches.com> Cc: Kees Cook <keescook@chromium.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Matthew Wilcox	c28f242063	btrfs: use radix_tree_iter_retry() Even though this is a 'can't happen' situation, use the new radix_tree_iter_retry() pattern to eliminate a goto. [akpm@linux-foundation.org: fix btrfs build] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Dave Young	0b50a2d86d	proc-vmcore: wrong data type casting fix On i686 PAE enabled machine the contiguous physical area could be large and it can cause trimming down variables in below calculation in read_vmcore() and mmap_vmcore(): tsz = min_t(size_t, m->offset + m->size - fpos, buflen); That is, the types being used is like below on i686: m->offset: unsigned long long int m->size: unsigned long long int fpos: loff_t (long long int) buflen: size_t (unsigned int) So casting (m->offset + m->size - fpos) by size_t means truncating a given value by 4GB. Suppose (m->offset + m->size - fpos) being truncated to 0, buflen >0 then we will get tsz = 0. It is of course not an expected result. Similarly we could also get other truncated values less than buflen. Then the real size passed down is not correct any more. If (m->offset + m->size - *fpos) is above 4GB, read_vmcore or mmap_vmcore use the min_t result with truncated values being compared to buflen. Then, fpos proceeds with the wrong value so that we reach below bugs: 1) read_vmcore will refuse to continue so makedumpfile fails. 2) mmap_vmcore will trigger BUG_ON() in remap_pfn_range(). Use unsigned long long in min_t instead so that the variables in are not truncated. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Dave Young <dyoung@redhat.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jianyu Zhan <nasa4836@gmail.com> Cc: Minfei Huang <mhuang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Minfei Huang	7e2bc81da3	proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan" It is not elegant that prompt shell does not start from new line after executing "cat /proc/$pid/wchan". Make prompt shell start from new line. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Eric Engestrom	b5946beaa9	procfs: add conditional compilation check `proc_timers_operations` is only used when CONFIG_CHECKPOINT_RESTORE is enabled. Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
John Stultz	5de23d435e	proc: add /proc/<pid>/timerslack_ns interface This patch provides a proc/PID/timerslack_ns interface which exposes a task's timerslack value in nanoseconds and allows it to be changed. This allows power/performance management software to set timer slack for other threads according to its policy for the thread (such as when the thread is designated foreground vs. background activity) If the value written is non-zero, slack is set to that value. Otherwise sets it to the default for the thread. This interface checks that the calling task has permissions to to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure arbitrary apps do not change the timer slack for other apps. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
John Stultz	da8b44d5a9	timer: convert timer_slack_ns from unsigned long to u64 This patchset introduces a /proc/<pid>/timerslack_ns interface which would allow controlling processes to be able to set the timerslack value on other processes in order to save power by avoiding wakeups (Something Android currently does via out-of-tree patches). The first patch tries to fix the internal timer_slack_ns usage which was defined as a long, which limits the slack range to ~4 seconds on 32bit systems. It converts it to a u64, which provides the same basically unlimited slack (500 years) on both 32bit and 64bit machines. The second patch introduces the /proc/<pid>/timerslack_ns interface which allows the full 64bit slack range for a task to be read or set on both 32bit and 64bit machines. With these two patches, on a 32bit machine, after setting the slack on bash to 10 seconds: $ time sleep 1 real 0m10.747s user 0m0.001s sys 0m0.005s The first patch is a little ugly, since I had to chase the slack delta arguments through a number of functions converting them to u64s. Let me know if it makes sense to break that up more or not. Other than that things are fairly straightforward. This patch (of 2): The timer_slack_ns value in the task struct is currently a unsigned long. This means that on 32bit applications, the maximum slack is just over 4 seconds. However, on 64bit machines, its much much larger (~500 years). This disparity could make application development a little (as well as the default_slack) to a u64. This means both 32bit and 64bit systems have the same effective internal slack range. Now the existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specify the interface as a unsigned long, so we preserve that limitation on 32bit systems, where SET_TIMERSLACK can only set the slack to a unsigned long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is actually larger then what can be stored by an unsigned long. This patch also modifies hrtimer functions which specified the slack delta as a unsigned long. Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Joonsoo Kim	fe896d1878	mm: introduce page reference manipulation functions The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Igor Redko	d02bd27bd3	mm/page_alloc.c: calculate 'available' memory in a separate function Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory statistics protocol, corresponding to 'Available' in /proc/meminfo. It indicates to the hypervisor how big the balloon can be inflated without pushing the guest system to swap. This metric would be very useful in VM orchestration software to improve memory management of different VMs under overcommit. This patch (of 2): Factor out calculation of the available memory counter into a separate exportable function, in order to be able to use it in other parts of the kernel. In particular, it appears a relevant metric to report to the hypervisor via virtio-balloon statistics interface (in a followup patch). Signed-off-by: Igor Redko <redkoi@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Naoya Horiguchi	0a71649cb7	/proc/kpageflags: return KPF_SLAB for slab tail pages Currently /proc/kpageflags returns just KPF_COMPOUND_TAIL for slab tail pages, which is inconvenient when grasping how slab pages are distributed (userspace always needs to check which kind of tail pages by itself). This patch sets KPF_SLAB for such pages. With this patch: $ grep Slab /proc/meminfo ; tools/vm/page-types -b slab Slab: 64880 kB flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000080 16220 63 _______S__________________________________ slab total 16220 63 16220 pages equals to 64880 kB, so returned result is consistent with the global counter. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Naoya Horiguchi	832fc1de01	/proc/kpageflags: return KPF_BUDDY for "tail" buddy pages Currently /proc/kpageflags returns nothing for "tail" buddy pages, which is inconvenient when grasping how free pages are distributed. This patch sets KPF_BUDDY for such pages. With this patch: $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy MemFree: 3134992 kB flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000400 779272 3044 __________B_______________________________ buddy 0x0000000000000c00 4385 17 __________BM______________________________ buddy,mmap total 783657 3061 783657 pages is 3134628 kB (roughly consistent with the global counter,) so it's OK. [akpm@linux-foundation.org: update comment, per Naoya] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-17 15:09:34 -07:00
Linus Torvalds	8eee93e257	Char/Misc patches for 4.6-rc1 Here is the big char/misc driver update for 4.6-rc1. The majority of the patches here is hwtracing and some new mic drivers, but there's a lot of other driver updates as well. Full details in the shortlog. All have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbp9IcACgkQMUfUDdst+ykyJgCeLTC2QNGrh51kiJglkVJ0yD36 q4MAn0NkvSX2+iv5Jq8MaX6UQoRa4Nun =MNjR -----END PGP SIGNATURE----- Merge tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here is the big char/misc driver update for 4.6-rc1. The majority of the patches here is hwtracing and some new mic drivers, but there's a lot of other driver updates as well. Full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits) goldfish: Fix build error of missing ioremap on UM nvmem: mediatek: Fix later provider initialization nvmem: imx-ocotp: Fix return value of imx_ocotp_read nvmem: Fix dependencies for !HAS_IOMEM archs char: genrtc: replace blacklist with whitelist drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular drivers: char: mem: fix IS_ERROR_VALUE usage char: xillybus: Fix internal data structure initialization pch_phub: return -ENODATA if ROM can't be mapped Drivers: hv: vmbus: Support kexec on ws2012 r2 and above Drivers: hv: vmbus: Support handling messages on multiple CPUs Drivers: hv: utils: Remove util transport handler from list if registration fails Drivers: hv: util: Pass the channel information during the init call Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload() Drivers: hv: vmbus: remove code duplication in message handling Drivers: hv: vmbus: avoid wait_for_completion() on crash Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages misc: at24: replace memory_accessor with nvmem_device_read eeprom: 93xx46: extend driver to plug into the NVMEM framework eeprom: at25: extend driver to plug into the NVMEM framework ...	2016-03-17 13:47:50 -07:00
Linus Torvalds	1a4ab084af	Driver core patches for 4.6-rc1 Just a few patches this time around for the 4.6-rc1 merge window. Largest is a new firmware driver, but there are some other updates to the driver core in here as well, the shortlog has the details. All have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbp9QoACgkQMUfUDdst+ynsOQCghpfAf3CJDr4PWGCKzDJzyQG9 rZYAn2VwKsqHzAxgLXZY5fQIjxSyaLek =Mcvl -----END PGP SIGNATURE----- Merge tag 'driver-core-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Just a few patches this time around for the 4.6-rc1 merge window. Largest is a new firmware driver, but there are some other updates to the driver core in here as well, the shortlog has the details. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Revert "driver-core: platform: probe of-devices only using list of compatibles" firmware: qemu config needs I/O ports firmware: qemu_fw_cfg.c: fix typo FW_CFG_DATA_OFF driver-core: platform: probe of-devices only using list of compatibles driver-core: platform: fix typo in documentation for multi-driver helper component: remove impossible condition drivers: dma-coherent: simplify dma_init_coherent_memory return value devicetree: update documentation for fw_cfg ARM bindings firmware: create directory hierarchy for sysfs fw_cfg entries firmware: introduce sysfs driver for QEMU's fw_cfg device kobject: export kset_find_obj() for module use driver core: bus: use to_subsys_private and to_device_private_bus driver core: bus: use list_for_each_entry* debugfs: Add stub function for debugfs_create_automount(). kernfs: make kernfs_walk_ns() use kernfs_pr_cont_buf[]	2016-03-17 13:38:00 -07:00
Sudip Mukherjee	956ccef3c9	nfsd: recover: fix memory leak nfsd4_cltrack_grace_start() will allocate the memory for grace_start but when we returned due to error we missed freeing it. Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-17 14:57:15 -04:00
Martin Brandenburg	2f83ace371	orangefs: put register_chrdev immediately before register_filesystem Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-17 14:34:10 -04:00
Martin Brandenburg	a4c680a027	orangefs: remove paranoia in orangefs_set_inode Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-17 14:33:56 -04:00
Martin Brandenburg	02a5cc537d	orangefs: sanitize listxattr and return EIO on impossible values Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-17 14:33:47 -04:00
Martin Brandenburg	5e06664f29	orangefs: remove unused reference to xattr key length Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-17 14:33:47 -04:00
Linus Torvalds	bb7aeae3d6	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: "There are a bunch of fixes to the TPM, IMA, and Keys code, with minor fixes scattered across the subsystem. IMA now requires signed policy, and that policy is also now measured and appraised" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (67 commits) X.509: Make algo identifiers text instead of enum akcipher: Move the RSA DER encoding check to the crypto layer crypto: Add hash param to pkcs1pad sign-file: fix build with CMS support disabled MAINTAINERS: update tpmdd urls MODSIGN: linux/string.h should be #included to get memcpy() certs: Fix misaligned data in extra certificate list X.509: Handle midnight alternative notation in GeneralizedTime X.509: Support leap seconds Handle ISO 8601 leap seconds and encodings of midnight in mktime64() X.509: Fix leap year handling again PKCS#7: fix unitialized boolean 'want' firmware: change kernel read fail to dev_dbg() KEYS: Use the symbol value for list size, updated by scripts/insert-sys-cert KEYS: Reserve an extra certificate symbol for inserting without recompiling modsign: hide openssl output in silent builds tpm_tis: fix build warning with tpm_tis_resume ima: require signed IMA policy ima: measure and appraise the IMA policy itself ima: load policy using path ...	2016-03-17 11:33:45 -07:00
Linus Torvalds	70477371dc	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...	2016-03-17 11:22:54 -07:00
Mike Marshall	1a0ce16d71	Orangefs: adjust unwind on module init failure. Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-17 13:24:34 -04:00
Herbert Xu	d1558f4e95	eCryptfs: Use skcipher and shash eCryptfs: Fix null pointer dereference on kzalloc error path The conversion to skcipher and shash added a couple of null pointer dereference bugs on the kzalloc failure path. This patch fixes them. Fixes: `3095e8e366` ("eCryptfs: Use skcipher and shash") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-03-17 19:09:00 +08:00
Trond Myklebust	1425075e72	NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue swicchover. These patches also switch xprtrdma to use the new CQ API Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJW5wl+AAoJENfLVL+wpUDrH7sP/i3lZ7n6Pr+Flrb/Z+ywjmEA mvZ0u3O3eviFxDqr82J/WL7fgDUGbBd9urtUu5xZGscc0HNs3LQ8izm6Dy3gmrVF MFh69jNGL5djmfymYMWRbdoKLuOw4V70EBtGnYqH7Bh7gwpiIl3EiVcBBJup/vKV rgvx6NnSGYpYPYpBFC4Ql4qZx7m5j4cxsThRScbu7wMMjDknls+7ZDM1B5mGO00+ 1vYObpYGXOXovGZyHAHspQityWp6jvUcEMnJzWbMUFDqxkOmmGK3t54MfvRXZiFE vUkgg5nGxhMejEIfMywuf6czKGfXc4HZT2yF9eSZeA4IW+7QkeeXeCcIANBDY6Ga oKXu4fgM6T7SnCpefwkXRhinwtEM04tGAlxo1X3UcKrMz7Q3di3/NtgaWijfL8gy 9Nd1lt2kqI375h7+OZccURl33lnQBSjO4F2pt/pFk6wYwGh0F8co9bIp6QIEVQ0f N2l8KU9vgLofhrD9JzZeu1l3+TCdDU9YaJLSjbkZ71BTjNtkNdUcd9Tk+bMSxept mq7mNKe1oBHAGgR8+7zYBUKEt85rdpNovPoU+Hz/QKbV3zikGSPmL3e9gpEvhmH4 MKD7Vs61Fi8volfv7wHmzZF8Tk68qc1MAZXSbgzVQxwF/uBjqEuSO+n4Kii/gfkG MJjlzqmqlO01O1fn2XHu =aHja -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: NFSoRDMA Client Side Changes These patches include several bugfixes and cleanups for the NFSoRDMA client. This includes bugfixes for NFS v4.1, proper RDMA_ERROR handling, and fixes from the recent workqueue swicchover. These patches also switch xprtrdma to use the new CQ API Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-4.6-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (787 commits) xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs xprtrdma: Use an anonymous union in struct rpcrdma_mw xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs xprtrdma: Serialize credit accounting again xprtrdma: Properly handle RDMA_ERROR replies rpcrdma: Add RPCRDMA_HDRLEN_ERR xprtrdma: Do not wait if ib_post_send() fails xprtrdma: Segment head and tail XDR buffers on page boundaries xprtrdma: Clean up dprintk format string containing a newline xprtrdma: Clean up physical_op_map() xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro	2016-03-16 16:25:09 -04:00
Jeff Layton	849dc3244c	nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed I hit the following oops out of the blue while testing with flexfiles: BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8 IP: [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles] PGD 44031067 PUD 5062d067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: nfsv3 nfs_layout_flexfiles tun rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bonding ipmi_devintf ipmi_msghandler snd_hda_codec_generic virtio_balloon ppdev snd_hda_intel snd_hda_controller snd_hda_codec iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core parport_pc snd_hwdep parport snd_seq snd_seq_device snd_pcm snd_timer acpi_cpufreq snd soundcore i2c_piix4 xfs libcrc32c joydev virtio_net virtio_console qxl drm_kms_helper ttm crc32c_intel drm virtio_pci serio_raw ata_generic virtio_ring virtio pata_acpi CPU: 0 PID: 19138 Comm: test5 Not tainted 4.1.9-100.pd.90.el7.x86_64 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 task: ffff88007b70cf00 ti: ffff88004cc44000 task.ti: ffff88004cc44000 RIP: 0010:[<ffffffffa048f6b8>] [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles] RSP: 0018:ffff88004cc47890 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffff880050932300 RCX: ffff88006978f488 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003e0e8540 RBP: ffff88004cc47908 R08: 0000000000000000 R09: 0000000000000000 R10: ffff88007ff8c758 R11: 0000000000000005 R12: ffff88003e0e8540 R13: 0000000000000000 R14: ffff88006978f488 R15: ffff88004431cc80 FS: 00007fea40c7c740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000e8 CR3: 0000000044318000 CR4: 00000000000406f0 Stack: ffffffffa048c934 ffff880050932310 0000000100000001 ffff88006978f510 ffff88006978f3c8 ffff88003e56cd90 ffff88004cc479d0 00000020a052aff0 000000000004b000 ffff88004cc47908 ffff880050932300 ffff88004cc479d0 Call Trace: [<ffffffffa048c934>] ? ff_layout_write_pagelist+0x64/0x220 [nfs_layout_flexfiles] [<ffffffffa057a3bf>] pnfs_generic_pg_writepages+0xaf/0x1b0 [nfsv4] [<ffffffffa051ab57>] nfs_pageio_doio+0x27/0x60 [nfs] [<ffffffffa051bfe4>] nfs_pageio_complete_mirror+0x54/0xa0 [nfs] [<ffffffffa051c7ad>] nfs_pageio_complete+0x2d/0x90 [nfs] [<ffffffffa052032d>] nfs_writepage_locked+0x8d/0xe0 [nfs] [<ffffffff811e4630>] ? page_referenced_one+0x1a0/0x1a0 [<ffffffffa05210e7>] nfs_wb_single_page+0xf7/0x190 [nfs] [<ffffffffa05108d1>] nfs_launder_page+0x41/0x90 [nfs] [<ffffffff811b8930>] invalidate_inode_pages2_range+0x340/0x3a0 [<ffffffff811b89a7>] invalidate_inode_pages2+0x17/0x20 [<ffffffffa0513e1e>] nfs_release+0x9e/0xb0 [nfs] [<ffffffffa050fa1d>] nfs_file_release+0x3d/0x60 [nfs] [<ffffffff8122481c>] __fput+0xdc/0x1e0 [<ffffffff8122496e>] ____fput+0xe/0x10 [<ffffffff810bde67>] task_work_run+0xa7/0xe0 [<ffffffff810af735>] get_signal+0x565/0x600 [<ffffffff811a9815>] ? __filemap_fdatawrite_range+0x65/0x90 [<ffffffff810144a7>] do_signal+0x37/0x730 [<ffffffffa0569921>] ? nfs4_file_fsync+0x81/0x150 [nfsv4] [<ffffffff81254dbb>] ? vfs_fsync_range+0x3b/0xb0 [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280 [<ffffffff81014bff>] do_notify_resume+0x5f/0xa0 [<ffffffff8178ec3c>] int_signal+0x12/0x17 Code: 48 8b 40 70 8b 00 83 f8 03 74 20 83 f8 04 75 13 55 48 89 ce 48 89 d7 48 89 e5 e8 14 0f 0e 00 5d c3 66 90 0f 0b 66 0f 1f 44 00 00 <48> 8b 82 e8 00 00 00 c3 66 66 66 66 90 55 48 89 e5 41 57 41 56 RIP [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles] RSP <ffff88004cc47890> CR2: 00000000000000e8 When the DS connection attempt fails, nfs4_ff_layout_prepare_ds marks it for the error but then just returns the ds as if it were usable. The comments though say: /* Upon return, either ds is connected, or ds is NULL */ Ensure that we set the return pointer to NULL in the event that the connection attempt fails. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-03-16 15:46:48 -04:00
Christoph Hellwig	95d9f6c3ed	nfs: remove nfs_inode_dio_wait Just call inode_dio_wait directly instead of through a pointless wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-03-16 15:42:43 -04:00
Christoph Hellwig	4ff79bc709	nfs: remove nfs4_file_fsync The only difference to nfs_file_fsync is the call to pnfs_sync_inode. But pnfs_sync_inode is just an inline that calls a pNFS layout driver method if CONFIG_PNFS is designed, and thus can be called just fine from the core NFS module. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2016-03-16 15:42:43 -04:00
Linus Torvalds	271ecc5253	Merge branch 'akpm' (patches from Andrew) Merge first patch-bomb from Andrew Morton: - some misc things - ofs2 updates - about half of MM - checkpatch updates - autofs4 update * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) autofs4: fix string.h include in auto_dev-ioctl.h autofs4: use pr_xxx() macros directly for logging autofs4: change log print macros to not insert newline autofs4: make autofs log prints consistent autofs4: fix some white space errors autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked() autofs4: fix coding style line length in autofs4_wait() autofs4: fix coding style problem in autofs4_get_set_timeout() autofs4: coding style fixes autofs: show pipe inode in mount options kallsyms: add support for relative offsets in kallsyms address table kallsyms: don't overload absolute symbol type for percpu symbols x86: kallsyms: disable absolute percpu symbols on !SMP checkpatch: fix another left brace warning checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses checkpatch: warn on bare unsigned or signed declarations without int checkpatch: exclude asm volatile from complex macro check mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate() mm: migrate: consolidate mem_cgroup_migrate() calls mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous ...	2016-03-16 11:51:08 -07:00
Dmitry V. Levin	5f8d498d43	vfs: show_vfsstat: do not ignore errors from show_devname method Explicitly check show_devname method return code and bail out in case of an error. This fixes regression introduced by commit `9d4d65748a`. Cc: stable@vger.kernel.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-16 13:09:08 -04:00
J. Bruce Fields	2f6fc056e8	nfsd: fix deadlock secinfo+readdir compound nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also unlocks if necessary (nfsd filehandle locking is probably too lenient), so it gets unlocked eventually, but if the following op in the compound needs to lock it again, we can deadlock. A fuzzer ran into this; normal clients don't send a secinfo followed by a readdir in the same compound. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2016-03-16 10:51:21 -04:00
Ashish Samant	742f992708	fuse: return patrial success from fuse_direct_io() If a user calls writev/readv in direct io mode with partially valid data in the iovec array such that any vector other than the first one in the array contains invalid data, we currently return the error for the invalid iovec. Instead, we should return the number of bytes already written/read and not the error as we do in the non direct_io case. Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-16 14:38:31 +01:00
Ian Kent	8a78d59304	autofs4: use pr_xxx() macros directly for logging Use the standard pr_xxx() log macros directly for log prints instead of the AUTOFS_XXX() macros. Signed-off-by: Ian Kent <ikent@redhat.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	90967c87e3	autofs4: change log print macros to not insert newline Common kernel coding practice is to include the newline of log prints within the log text rather than hidden away in a macro. To avoid introducing inconsistencies as changes are made change the log macros to not include the newline. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	cab49f9ed8	autofs4: make autofs log prints consistent Use the pr_() print in AUTOFS_() macros instead of printks and include the module name in log message macros. Also use the AUTOFS_*() macros everywhere instead of raw printks. Signed-off-by: Ian Kent <raven@themaw.net> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	0266725ad4	autofs4: fix some white space errors Fix some white space format errors. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	e3cd8067c1	autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked() The return from an ioctl if an invalid ioctl is passed in should be EINVAL not ENOSYS. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	b3f67a988c	autofs4: fix coding style line length in autofs4_wait() The need for this is questionable but checkpatch.pl complains about the line length and it's a straightfoward change. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	aa330ddc53	autofs4: fix coding style problem in autofs4_get_set_timeout() Refactor autofs4_get_set_timeout() to eliminate coding style error. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Ian Kent	e9a7c2f1a5	autofs4: coding style fixes Try and make the coding style completely consistent throughtout the autofs module and inline with kernel coding style recommendations. Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Stanislav Kinsburskiy	c83aa55d0b	autofs: show pipe inode in mount options This is required for CRIU (Checkpoint Restart In Userspace) to migrate a mount point when write end in user space is closed. Below is a brief description of the problem. To migrate a non-catatonic autofs mount point, one has to restore the control pipe between kernel and autofs master process. One of the autofs masters is systemd, which closes pipe write end after passing it to the kernel with mount call. To be able to restore the systemd control pipe one has to know which read pipe end in systemd corresponds to the write pipe end in the kernel. The pipe "fd" in mount options is not enough because it was closed and probably replaced by some other descriptor. Thus, some other attribute is required to be able to find the read pipe end. The best attribute to use to find the correct pipe end is inode number becuase it's unique for the whole system and can't be reused while the autofs mount exists. This attribute can also be used to recognize a situation where an autofs mount has no master (no process with specified "pgrp" or no file descriptor with "pipe_ino", specified in autofs mount options). Signed-off-by: Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Johannes Weiner	62cccb8c8e	mm: simplify lock_page_memcg() Now that migration doesn't clear page->mem_cgroup of live pages anymore, it's safe to make lock_page_memcg() and the memcg stat functions take pages, and spare the callers from memcg objects. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Johannes Weiner	81f8c3a461	mm: memcontrol: generalize locking for the page->mem_cgroup binding These patches tag the page cache radix tree eviction entries with the memcg an evicted page belonged to, thus making per-cgroup LRU reclaim work properly and be as adaptive to new cache workingsets as global reclaim already is. This should have been part of the original thrash detection patch series, but was deferred due to the complexity of those patches. This patch (of 5): So far the only sites that needed to exclude charge migration to stabilize page->mem_cgroup have been per-cgroup page statistics, hence the name mem_cgroup_begin_page_stat(). But per-cgroup thrash detection will add another site that needs to ensure page->mem_cgroup lifetime. Rename these locking functions to the more generic lock_page_memcg() and unlock_page_memcg(). Since charge migration is a cgroup1 feature only, we might be able to delete it at some point, and these now easy to identify locking sites along with it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Andrew Morton	02c43638ec	fs/mpage.c:mpage_readpages(): use lru_to_page() helper Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Jun Piao	8d67d3c244	ocfs2/dlm: fix a variable overflow problem in dlmdomain.c In dlm_send_join_cancels(), node is defined with type unsigned int, but initialized with -1, this will lead variable overflow. Although this won't cause any runtime problem, the code looks a little uncoordinated. Signed-off-by: Jun Piao <piaojun@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Jiufei Xue	814ce69432	ocfs2: fix a tiny race that leads file system read-only when o2hb detect a node down, it first set the dead node to recovery map and create ocfs2rec which will replay journal for dead node. o2hb thread then call dlm_do_local_recovery_cleanup() to delete the lock for dead node. After the lock of dead node is gone, locks for other nodes can be granted and may modify the meta data without replaying journal of the dead node. The detail is described as follows. N1 N2 N3(master) modify the extent tree of inode, and commit dirty metadata to journal, then goes down. o2hb thread detects N1 goes down, set recovery map and delete the lock of N1. dlm_thread flush ast for the lock of N2. do not detect the death of N1, so recovery map is empty. read inode from disk without replaying the journal of N1 and modify the extent tree of the inode that N1 had modified. ocfs2rec recover the journal of N1. The modification of N2 is lost. The modification of N1 and N2 are not serial, and it will lead to read-only file system. We can set recovery_waiting flag to the lock resource after delete the lock for dead node to prevent other node from getting the lock before dlm recovery. After dlm recovery, the recovery map on N2 is not empty, ocfs2_inode_lock_full_nested() will wait for ocfs2 recovery. Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
xuejiufei	d277f33eda	ocfs2/dlm: return EINVAL when the lockres on migration target is in DROPPING_REF state If master migrate this lock resource to node when it happened to purge it, a new lock resource will be created and inserted into hash list. If then master goes down, the lock resource being purged is recovered, so there exist two lock resource with different owner. So return error to master if the lock resource is in DROPPING state, master will retry to migrate this lock resource. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
xuejiufei	8c03439681	ocfs2/dlm: clear DROPPING_REF flag when the master goes down If the master goes down after return in-progress for deref message. The lock resource on non-master node can not be purged. Clear the DROPPING_REF flag and recovery it. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
xuejiufei	842b90b624	ocfs2/dlm: return in progress if master can not clear the refmap bit right now Master returns in-progress to non-master node when it can not clear the refmap bit right now. And non-master node will not purge the lock resource until receiving deref done message. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
xuejiufei	60d663cb52	ocfs2/dlm: add DEREF_DONE message This series of patches is to fix the dis-order issue of setting/clearing refmap bit described below. Node 1 Node 2(master) dlmlock dlm_do_master_request dlm_master_request_handler -> dlm_lockres_set_refmap_bit dlmlock succeed dlmunlock succeed dlm_purge_lockres dlm_deref_handler -> find lock resource is in DLM_LOCK_RES_SETREF_INPROG state, so dispatch a deref work dlm_purge_lockres succeed. call dlmlock again dlm_do_master_request dlm_master_request_handler -> dlm_lockres_set_refmap_bit deref work trigger, call dlm_lockres_clear_refmap_bit to clear Node 1 from refmap dlm_purge_lockres succeed dlm_send_remote_lock_request return DLM_IVLOCKID because the lockres is not exist BUG if the lockres is $RECOVERY This series of patches add a new message to keep the order of set and clear. Other nodes can purge the lock resource only after the refmap bit on master is cleared. This patch is to add DEREF_DONE message and corresponding handler. Node can purge the lock resource after receiving this message. As a new message is added, so increase the minor number of dlm protocol version. Signed-off-by: xuejiufei <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Joseph Qi	39b29af030	ocfs2/dlm: fix a typo in dlmcommon.h Refer to cluster/tcp.h, NET_MAX_PAYLOAD_BYTES is a typo for O2NET_MAX_PAYLOAD_BYTES. Since currently DLM_MIG_LOCKRES_RESERVED is not actually used, it won't cause any problem. But we'd better correct it for further use. Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
jiangyiwen	bfd97a0320	ocfs2: use spinlock_irqsave() to downconvert lock in ocfs2_osb_dump() Commit `a75e9ccabd` ("ocfs2: use spinlock irqsave for downconvert lock") missed an unmodified place in ocfs2_osb_dump(), so it still exists a deadlock scenario. ocfs2_wake_downconvert_thread ocfs2_rw_unlock ocfs2_dio_end_io dio_complete ..... bio_endio req_bio_endio .... scsi_io_completion blk_done_softirq __do_softirq do_softirq irq_exit do_IRQ ocfs2_osb_dump cat /sys/kernel/debug/ocfs2/${uuid}/fs_state This patch still uses spin_lock_irqsave() - replace spin_lock() to solve this situation. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
jiangyiwen	4d548f61d6	ocfs2/cluster: replace the interrupt safe spinlocks with common ones There actually no hardware or software interrupts in the context which using o2hb_live_lock, so we don't need to worry about race conditions caused by irq/softirq with spinlock held. Turning off irq is not good for system performance after all. Just replace them with a non interrupt safe function. Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-15 16:55:16 -07:00
Linus Torvalds	ba33ea811e	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Ingo Molnar: "This is another big update. Main changes are: - lots of x86 system call (and other traps/exceptions) entry code enhancements. In particular the complex parts of the 64-bit entry code have been migrated to C code as well, and a number of dusty corners have been refreshed. (Andy Lutomirski) - vDSO special mapping robustification and general cleanups (Andy Lutomirski) - cpufeature refactoring, cleanups and speedups (Borislav Petkov) - lots of other changes ..." * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits) x86/cpufeature: Enable new AVX-512 features x86/entry/traps: Show unhandled signal for i386 in do_trap() x86/entry: Call enter_from_user_mode() with IRQs off x86/entry/32: Change INT80 to be an interrupt gate x86/entry: Improve system call entry comments x86/entry: Remove TIF_SINGLESTEP entry work x86/entry/32: Add and check a stack canary for the SYSENTER stack x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed x86/entry: Vastly simplify SYSENTER TF (single-step) handling x86/entry/traps: Clear DR6 early in do_debug() and improve the comment x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions x86/entry/32: Restore FLAGS on SYSEXIT x86/entry/32: Filter NT and speed up AC filtering in SYSENTER x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test selftests/x86: In syscall_nt, test NT\|TF as well x86/asm-offsets: Remove PARAVIRT_enabled x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled uprobes: __create_xol_area() must nullify xol_mapping.fault x86/cpufeature: Create a new synthetic cpu capability for machine check recovery ...	2016-03-15 09:32:27 -07:00
Bob Peterson	73b462d280	GFS2: Eliminate parameter non_block on gfs2_inode_lookup Now that we're not filtering out I_FREEING inodes from our lookups anymore, we can eliminate the non_block parameter from the lookup function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2016-03-15 10:46:50 -04:00
Bob Peterson	ff34245d52	GFS2: Don't filter out I_FREEING inodes anymore This patch basically reverts a very old patch from 2008, `7a9f53b3c1`, with the title "Alternate gfs2_iget to avoid looking up inodes being freed". The original patch was designed to avoid a deadlock caused by lock ordering with try_rgrp_unlink. The patch forced the function to not find inodes that were being removed by VFS. The problem is, that made it impossible for nodes to delete their own unlinked dinodes after a certain point in time, because the inode needed was not found by this filtering process. There is no longer a need for the patch, since function try_rgrp_unlink no longer locks the inode: All it does is queue the glock onto the delete work_queue, so there should be no more deadlock. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2016-03-15 10:46:45 -04:00
Bob Peterson	a4923865ea	GFS2: Prevent delete work from occurring on glocks used for create This patch tries to prevent delete work (queued via iopen callback) from executing if the glock is currently being used to create a new inode. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2016-03-15 10:46:37 -04:00
Bob Peterson	2df6f47150	GFS2: Fix direct IO write rounding error The fsx test in xfstests was failing because it was using direct IO writes which were using a bad calculation. It was using loff_t lstart = offset & (PAGE_CACHE_SIZE - 1); when it should be loff_t lstart = offset & ~(PAGE_CACHE_SIZE - 1); Thus, the write at offset 0x67e00 was calculating lstart to be 0xe00, the address of our corruption. Instead, it should have been 0x67000. This patch fixes the calculation. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>	2016-03-15 10:46:28 -04:00
Arnd Bergmann	67893f12e5	gfs2: avoid uninitialized variable warning We get a bogus warning about a potential uninitialized variable use in gfs2, because the compiler does not figure out that we never use the leaf number if get_leaf_nr() returns an error: fs/gfs2/dir.c: In function 'get_first_leaf': fs/gfs2/dir.c:802:9: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized] fs/gfs2/dir.c: In function 'dir_split_leaf': fs/gfs2/dir.c:1021:8: warning: 'leaf_no' may be used uninitialized in this function [-Wmaybe-uninitialized] Changing the 'if (!error)' to 'if (!IS_ERR_VALUE(error))' is sufficient to let gcc understand that this is exactly the same condition as in IS_ERR() so it can optimize the code path enough to understand it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Bob Peterson <rpeterso@redhat.com>	2016-03-15 10:46:11 -04:00
Dave Chinner	2cdb958aba	Merge branch 'xfs-misc-fixes-4.6-4' into for-next	2016-03-15 11:44:35 +11:00
Christoph Hellwig	355cced452	xfs: always set rvalp in xfs_dir2_node_trim_free xfs_dir2_node_trim_free can return with setting the rvalp argument pointer. Initialize it to 0 at the beginning of the function and only update it to 1 if we succeeded trimming a freespace block. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:44:18 +11:00
Eric Sandeen	cc07eed833	xfs: ensure committed is initialized in xfs_trans_roll __xfs_trans_roll() can return without setting the committed argument; this was a problem for xfs_bmap_finish(): int committed;/ xact committed or not */ ... error = __xfs_trans_roll(tp, ip, &committed); if (error) { ... if (committed) { and we tested an uninitialized "committed" variable on the error path. No caller is preserving "committed" state across calls to __xfs_trans_roll(), so just initialize committed inside the function to avoid future errors like this. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:42:47 +11:00
Brian Foster	d34999c97a	xfs: borrow indirect blocks from freed extent when available xfs_bmap_del_extent() handles extent removal from the in-core and on-disk extent lists. When removing a delalloc range, it updates the indirect block reservation appropriately based on the removal. It currently enforces that the new indirect block reservation is less than or equal to the original. This is normally the case in all situations except for in certain cases when the removed range creates a hole in a single delalloc extent, thus splitting a single delalloc extent in two. It is possible with small enough extents to split an indlen==1 extent into two such slightly smaller extents. This leaves one extent with 0 indirect blocks and leads to assert failures in other areas (e.g., xfs_bunmapi() if the extent happens to be removed). Update the indlen distribution code to steal blocks from the deleted extent, if necessary, to satisfy the worst case total indirect reservation for the new extents. This is safe as the caller does not update the fdblocks counters until the extent is removed. Blocks stolen in this manner simply remain accounted as allocated, having ownership transferred from the data extent to an indirect reservation. As a precaution, fall back to the original reservation algorithm if the new indlen requirement is not met and warn if we end up with extents without any reservation at all to detect this more easily in the future. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:42:47 +11:00
Brian Foster	a9bd24ac2b	xfs: refactor delalloc indlen reservation split into helper The delayed allocation indirect reservation splitting code is not sufficient in some cases where a delalloc extent is split in two. In preparation for enhancements to this code, refactor the current indlen distribution algorithm into a new helper function. [dchinner: rename temp, temp2 variables] Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:42:46 +11:00
Brian Foster	b2706a05ba	xfs: update freeblocks counter after extent deletion xfs_bunmapi() currently updates the fdblocks counter, unreserves quota, etc. before the extent is deleted by xfs_bmap_del_extent(). The function has problems dividing up the indirect reserved blocks for scenarios where a single delalloc extent is split in two. Particularly, there aren't always enough blocks reserved for multiple extents in a single extent reservation. The solution to this problem is to allow the extent removal code to steal from the deleted extent to meet indirect reservation requirements. Move the block of code in xfs_bmapi() that updates the fdblocks counter to after the call to xfs_bmap_del_extent() to allow the codepath to update the extent record before the free blocks are accounted. Also, reshuffle the code slightly so the delalloc accounting occurs near the xfs_bmap_del_extent() call to provide context for the comments. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:42:46 +11:00
Brian Foster	801cc4e17a	xfs: debug mode forced buffered write failure Add a DEBUG mode-only sysfs knob to enable forced buffered write failure. An additional side effect of this mode is brute force killing of delayed allocation blocks in the range of the write. The latter is the prime motiviation behind this patch, as userspace test infrastructure requires a reliable mechanism to create and split delalloc extents without causing extent conversion. Certain fallocate operations (i.e., zero range) were used for this in the past, but the implementations have changed such that delalloc extents are flushed and converted to real blocks, rendering the test useless. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-15 11:42:44 +11:00
Mike Marshall	2180c52cc7	Orangefs: fix sloppy cleanups of debugfs and sysfs init failures. Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-14 15:48:28 -04:00
Mike Marshall	a7d3e78ab5	Orangefs: follow_link -> get_link change Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-14 15:48:28 -04:00
Mike Marshall	53f57fef43	Orangefs: Extra sanity insurance on buffer before using string functions on it. Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-14 15:48:28 -04:00
Mike Marshall	ab6652524a	Linux 4.5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW5j4RAAoJEHm+PkMAQRiGhVEH/0qZbM1J+WnCK92bm9+inCnB JO2JViGIuCQB5BxljVMil2dzrw85D+dC7+fryr0wVBhhBlr0lXPJGSYCYYTEaI20 Wco5YlTmjRirUwmxWzBXvB5kvTdIaNfNYDcFch6lbsaLUNgqydNKtk08ckO/4k0D AmaShW8swBiXE/RmHuj8H41ksHsnY8W62dlczEaAIfr4kluPX/kKnyXpmpvmZm1j sM4fskPlq+Jz5pOXXFsFfrhiBgpSUnwSj1tNwK5+DkmaVnWOkPuwkqLBWqpy4pzm GTeDBdf5/ixGxgNsZ2VWtbPnc2wEP7SIcu45MU7QFw5kqwDN2nN63BRVXI5Z5qY= =RFx2 -----END PGP SIGNATURE----- Orangefs: merge to v4.5 Merge tag 'v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into current Linux 4.5	2016-03-14 15:39:42 -04:00
Adam Buchbinder	bb7ab3b92e	btrfs: Fix misspellings in comments. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-14 15:05:02 +01:00
Seth Forshee	744742d692	fuse: Add reference counting for fuse_io_priv The 'reqs' member of fuse_io_priv serves two purposes. First is to track the number of oustanding async requests to the server and to signal that the io request is completed. The second is to be a reference count on the structure to know when it can be freed. For sync io requests these purposes can be at odds. fuse_direct_IO() wants to block until the request is done, and since the signal is sent when 'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to use the object after the userspace server has completed processing requests. This leads to some handshaking and special casing that it needlessly complicated and responsible for at least one race condition. It's much cleaner and safer to maintain a separate reference count for the object lifecycle and to let 'reqs' just be a count of outstanding requests to the userspace server. Then we can know for sure when it is safe to free the object without any handshaking or special cases. The catch here is that most of the time these objects are stack allocated and should not be freed. Initializing these objects with a single reference that is never released prevents accidental attempts to free the objects. Fixes: `9d5722b777` ("fuse: handle synchronous iocbs internally") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-14 15:02:51 +01:00
Robert Doebbelin	7cabc61e01	fuse: do not use iocb after it may have been freed There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an iocb that could have been freed if async io has already completed. The fix in this case is simple and obvious: cache the result before starting io. It was discovered by KASan: kernel: ================================================================== kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390 Signed-off-by: Robert Doebbelin <robert@quobyte.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `bcba24ccdc` ("fuse: enable asynchronous processing direct IO") Cc: <stable@vger.kernel.org> # 3.10+	2016-03-14 15:02:50 +01:00
Ashish Samant	2e3fcb1ccd	btrfs: Print Warning only if ENOSPC_DEBUG is enabled Dont print warning for ENOSPC error unless ENOSPC_DEBUG is enabled. Use btrfs_debug if it is enabled. Signed-off-by: Ashish Samant <ashish.samant@oracle.com> [ preserve the WARN_ON ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-14 14:59:54 +01:00
Al Viro	ed782b5a70	dcache.c: new helper: __d_add() d_add() with inode->i_lock already held; common to d_add() and d_splice_alias(). All ->lookup() instances that end up hashing the dentry they are given will hash it here. This almost completes the preparations to parallel lookups proper - the only remaining bit is taking security_d_instantiate() past d_rehash() and doing rehashing without dropping ->d_lock. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:17:38 -04:00
Al Viro	de689f5e36	don't bother with __d_instantiate(dentry, NULL) it's a no-op - bumping ->d_seq is pointless there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:17:32 -04:00
Al Viro	27f203f655	untangle fsnotify_d_instantiate() a bit First of all, don't bother calling it if inode is NULL - that makes inode argument unused. Moreover, do it before dropping ->d_lock, not right after that (and don't bother grabbing ->d_lock in it, of course). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:17:28 -04:00
Al Viro	34d0d19dc0	uninline d_add() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:17:24 -04:00
Al Viro	668d0cd56e	replace d_add_unique() with saner primitive new primitive: d_exact_alias(dentry, inode). If there is an unhashed dentry with the same name/parent and given inode, rehash, grab and return it. Otherwise, return NULL. The only caller of d_add_unique() switched to d_exact_alias() + d_splice_alias(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:17:20 -04:00
Al Viro	e12a4e8a04	quota: use lookup_one_len_unlocked() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:48 -04:00
Al Viro	85f40482bc	cifs_get_root(): use lookup_one_len_unlocked() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:44 -04:00
Al Viro	130f9ab75d	nfs_lookup: don't bother with d_instantiate(dentry, NULL) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:40 -04:00
Al Viro	9d95afd597	kill dentry_unhash() the last user is gone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:33 -04:00
Al Viro	f8b31710e4	ceph_fill_trace(): don't bother with d_instantiate(dn, NULL) ... and use d_add(dn, NULL) in case we need to hash a negative unhashed rather than using d_rehash() directly. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:06 -04:00
Al Viro	de4acda16e	autofs4: don't bother with d_instantiate(dentry, NULL) in ->lookup() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:16:00 -04:00
Al Viro	5cf3b560af	configfs: move d_rehash() into configfs_create() for regular files ... and turn it into d_add in there Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:55 -04:00
Al Viro	f7380af04b	ceph: don't bother with d_rehash() in splice_dentry() d_splice_alias() guarantees that it'll be always hashed Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:51 -04:00
Al Viro	949a852e46	namei: teach lookup_slow() to skip revalidate ... and make mountpoint_last() use it. That makes all candidates for lookup with parent locked shared go through lookup_slow(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:46 -04:00
Al Viro	e3c1392808	namei: massage lookup_slow() to be usable by lookup_one_len_unlocked() Return dentry and don't pass nameidata or path; lift crossing mountpoints into the caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:40 -04:00
Al Viro	d6d95ded91	lookup_one_len_unlocked(): use lookup_dcache() No need to lock parent just because of ->d_revalidate() on child; contrary to the stale comment, lookup_dcache() can be used without locking the parent. Result can be moved as soon as we return, of course, but the same is true for lookup_one_len_unlocked() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:36 -04:00
Al Viro	74ff0ffc7f	namei: simplify invalidation logics in lookup_dcache() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:15:31 -04:00
Al Viro	e9742b5332	namei: change calling conventions for lookup_{fast,slow} and follow_managed() Have lookup_fast() return 1 on success and 0 on "need to fall back"; lookup_slow() and follow_managed() return positive (1) on success. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:14:35 -04:00
Al Viro	5d0f49c136	namei: untanlge lookup_fast() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-14 00:14:25 -04:00
vikram.jadhav07	0304688676	ext4: clean up error handling in the MMP support There is memory leak as both caller function kmmpd() and callee read_mmp_block() not releasing bh_check (i.e buffer_head). Given patch fixes this problem. [ Additional changes suggested by Andreas Dilger -- TYT ] Signed-off-by: Jadhav Vikram <vikramjadhavpucsd2007@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-13 17:56:52 -04:00
Michal Hocko	490c1b444c	jbd2: do not fail journal because of frozen_buffer allocation failure Journal transaction might fail prematurely because the frozen_buffer is allocated by GFP_NOFS request: [ 72.440013] do_get_write_access: OOM for frozen_buffer [ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory (...snipped....) [ 72.495559] do_get_write_access: OOM for frozen_buffer [ 72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.496839] do_get_write_access: OOM for frozen_buffer [ 72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.505766] Aborting journal on device sda1-8. [ 72.505851] EXT4-fs (sda1): Remounting filesystem read-only This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" because small GPF_NOFS allocations never failed. This allocation seems essential for the journal and GFP_NOFS is too restrictive to the memory allocator so let's use __GFP_NOFAIL here to emulate the previous behavior. Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-13 17:38:20 -04:00
Konstantin Khlebnikov	adb7ef600c	ext4: use __GFP_NOFAIL in ext4_free_blocks() This might be unexpected but pages allocated for sbi->s_buddy_cache are charged to current memory cgroup. So, GFP_NOFS allocation could fail if current task has been killed by OOM or if current memory cgroup has no free memory left. Block allocator cannot handle such failures here yet. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-13 17:29:06 -04:00
Aihua Zhang	a2821e34df	ext4: fix compile error while opening the macro DOUBLE_CHECK the error is: fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has no member named 'bb_bitmap'. so, the definition of macro DOUBLE_CHECK should before 'struct ext4_group_info', I fixed it, and I moved the macro AGGRESSIVE_CHECK together, because I think they shoule be together. Signed-off-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-13 17:18:12 -04:00
Ales Novak	7915a861c0	ext4: print ext4 mount option data_err=abort correctly If data_err=abort option is specified for an ext3/ext4 mount, /proc/mounts does show it as "(null)". This is caused by token2str() returning NULL for Opt_data_err_abort (due to its pattern containing '='). Signed-off-by: Ales Novak <alnovak@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-12 21:55:50 -05:00
Eryu Guan	5e1021f2b6	ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is ignored in the following "if" condition and ext4_expand_extra_isize() might be called with NULL iloc.bh set, which triggers NULL pointer dereference. This is uncovered by commit `8b4953e13f` ("ext4: reserve code points for the project quota feature"), which enlarges the ext4_inode size, and run the following script on new kernel but with old mke2fs: #/bin/bash mnt=/mnt/ext4 devname=ext4-error dev=/dev/mapper/$devname fsimg=/home/fs.img trap cleanup 0 1 2 3 9 15 cleanup() { umount $mnt >/dev/null 2>&1 dmsetup remove $devname losetup -d $backend_dev rm -f $fsimg exit 0 } rm -f $fsimg fallocate -l 1g $fsimg backend_dev=`losetup -f --show $fsimg` devsize=`blockdev --getsz $backend_dev` good_tab="0 $devsize linear $backend_dev 0" error_tab="0 $devsize error $backend_dev 0" dmsetup create $devname --table "$good_tab" mkfs -t ext4 $dev mount -t ext4 -o errors=continue,strictatime $dev $mnt dmsetup load $devname --table "$error_tab" && dmsetup resume $devname echo 3 > /proc/sys/vm/drop_caches ls -l $mnt exit 0 [ Patch changed to simplify the function a tiny bit. -- Ted ] Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-12 21:40:32 -05:00
Linus Torvalds	2a62ec0af2	xfs: fixes for 4.5-rc7 Changes: o Only perform torn log write detection on dirty logs. This prevents failures being detected due to a clean filesystem being moved between machines or kernels of different architectures (e.g. 32 -> 64 bit, BE -> LE, etc). This fixes a regression introduced by the torn log write detection in 4.5-rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW4fdHAAoJEK3oKUf0dfod/6EP/1Mi2K+z8t9FaevB0yiy+Yfs CzRe2Sim5EF67IFeh1CBChcJ4dpUtVxwn+vM6/tfOWM8jS0Oo1Chr5woRm2Xc1Ko O4xmLcoooIBeustVt12/3+lKR0ACY4tSq8V673wBp7tSFi4dj5cnpb2pDuQTio3q JCTFtHkG7s5d2XnDn0dYVdrm7/eKB1ZdQCaVxikVtqQvdwrnyZpo0Q5iu5/Ync4H ULOoMW1xrrJQ7bZcMq4uLM9GglUEB2/tPfT2jFtiUFaNo+420B7FzZR9e6P9giBV JB/t02uiqicN0+WN9xyu+ohYMtjUZ2wrysLaX8P9szy/Rmsn7gOUYs946KUhullD D5JFzB/IUrLnIhfY4il8bK6NoTLPCj9DlktaA7GikA7QAyZFLrRr3b1r/XbR2lDB 8Sy3ij7yKh2fhThOk4D6fxyVkSgKpr9E2gz6LSl45imbrj69IjXCJwadD1i7yB8j VJj+Vr54DcjxFR0SnCrpGSG2i7fgkGk+8woIyVkPczPMpVlmQrpnmBbD0+fn4d31 aRX4aDmv7OsT+OKEoy9Hu3wRmfUZSmaRmp+2QdJ0dT98LEFoUCmhsaiJLL+nVgv0 tsApndnvAFxWHZZ9w5VPnJ/99YIvWpb3zzn6mKD3XfN/2Mf4sMcN2JTzxLgEdU9D 2JX+S1/AUMZfL0Ghaww8 =NDeH -----END PGP SIGNATURE----- Merge tag 'xfs-for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This is a fix for a regression introduced in 4.5-rc1 by the new torn log write detection code. The regression only affects people moving a clean filesystem between machines/kernels of different architecture (such as changing between 32 bit and 64 bit kernels), but this is the recommended (and only!) safe way to migrate a filesystem between architectures so we really need to ensure it works. The changes are larger than I'd prefer right at the end of the release cycle, but the majority of the change is just factoring code to enable the detection of a clean log at the correct time to avoid this issue. Changes: - Only perform torn log write detection on dirty logs. This prevents failures being detected due to a clean filesystem being moved between machines or kernels of different architectures (e.g. 32 -> 64 bit, BE -> LE, etc). This fixes a regression introduced by the torn log write detection in 4.5-rc1" * tag 'xfs-for-linus-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: only run torn log write detection on dirty logs xfs: refactor in-core log state update to helper xfs: refactor unmount record detection into helper xfs: separate log head record discovery from verification	2016-03-11 10:21:32 -08:00
Linus Torvalds	63cf207e93	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes: Fix for my dumb braino in ncpfs and a long-standing breakage on recovery from failed rename() in jffs2" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: jffs2: reduce the breakage on recovery from halfway failed rename() ncpfs: fix a braino in OOM handling in ncp_fill_cache()	2016-03-11 10:13:49 -08:00
Dan Carpenter	07c9a8e077	btrfs: scrub: silence an uninitialized variable warning It's basically harmless if "ref_level" isn't initialized since it's only used for an error message, but it causes a static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-11 17:21:59 +01:00
Anand Jain	ebb8765b2d	btrfs: move btrfs_compression_type to compression.h So that its better organized. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-11 17:12:46 +01:00
Anand Jain	8ae1af3cd1	btrfs: rename btrfs_print_info to btrfs_print_mod_info So that it indicates what it does. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-11 17:12:46 +01:00
Satoru Takeuchi	3c1d84b71e	Btrfs: Show a warning message if one of objectid reaches its highest value It's better to show a warning message for the exceptional case that one of objectid (in most case, inode number) reaches its highest value. For example, if inode cache is off and this event happens, we can't create any file even if there are not so many files. This message ease detecting such problem. Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-11 17:12:35 +01:00
Rasmus Villemoes	02def69fae	btrfs: use kbasename in btrfsic_mount This is more readable. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-03-11 16:55:52 +01:00
Wiebe, Wladislav (Nokia - DE/Ulm)	764fd639d7	pstore: Add support for 64 Bit address space Some architectures have their reserved RAM in 64 Bit address space. Therefore convert mem_address module parameter to ullong. Signed-off-by: Wladislav Wiebe <wladislav.wiebe@nokia.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>	2016-03-10 09:43:36 -08:00
Geliang Tang	a8ed9b8695	ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry() BUFFER_TRACE info "call ext4_handle_dirty_metadata" doesn't match the code, so drop it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-10 00:18:57 -05:00
Adam Buchbinder	b8a07463c8	ext4: fix misspellings in comments. Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 23:49:05 -05:00
OGAWA Hirofumi	c0a2ad9b50	jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org	2016-03-09 23:47:25 -05:00
Jan Kara	2d90c160e5	ext4: more efficient SEEK_DATA implementation Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as ext4_seek_data() iterates hole block-by-block. Fix the problem by using returned hole size from ext4_map_blocks() and thus skip the hole in one go. Update also SEEK_HOLE implementation to follow the same pattern as SEEK_DATA to make future maintenance easier. Furthermore we add cond_resched() to both ext4_seek_data() and ext4_seek_hole() to avoid softlockups in case evil user creates huge fragmented file and we have to go through lots of extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 23:11:13 -05:00
Jan Kara	e3fb8eb14e	ext4: cleanup handling of bh->b_state in DAX mmap ext4_dax_mmap_get_block() updates bh->b_state directly instead of using ext4_update_bh_state(). This is mostly a cosmetic issue since DAX code always passes on-stack buffer_head but clean this up to make code more uniform. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 23:03:27 -05:00
Jan Kara	facab4d971	ext4: return hole from ext4_map_blocks() Currently, ext4_map_blocks() just returns 0 when it finds a hole and allocation is not requested. However we have all the information available to tell how large the hole actually is and there are callers of ext4_map_blocks() which would save some block-by-block hole iteration if they knew this information. So fill in struct ext4_map_blocks even for holes with the information we have. We keep returning 0 for holes to maintain backward compatibility of the function. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 22:54:00 -05:00
Jan Kara	140a52508a	ext4: factor out determining of hole size ext4_ext_put_gap_in_cache() determines hole size in the extent tree, then trims this with possible delayed allocated blocks, and inserts the result into the extent status tree. Factor out determination of the size of the hole in the extent tree as we will need this information in ext4_ext_map_blocks() as well. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 22:46:57 -05:00
Linus Torvalds	718e47a573	This fixes a regression which crept in v4.5-rc5. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAABCAAGBQJW4N4xAAoJEPL5WVaVDYGjJQsH/i/9SP178CiaMeUp22PHmETi UpCaQd9AY3xGGIjCktL2DC4NC86fjsRMYl1FJdVMxElUx54fuEU17wEW4BZyjUhI aF9X7LfxQcxe+CRsY37ZdJ19nmE6EUZay8Vt/tB2LK/RvfruLNYmnzX5MmmjJY/S 1TKz6Jy5M0DTl+jpod2nv/xJ2j32WSPul8Un/iBinC16LPH+Q7KZRVjFLlf/krsM SvZ1G6I70P7t9HW88BO9KhiYyxxuwqWC6SSoPMKTr4WeGnYQbA2JE6PJPktqsq76 Q91ucFkkGi+DZuZe5EuDMYMBrwaHQG8hKG3ueCj/pTu9IRErW94uO++H03bichk= =Yjfq -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "This fixes a regression which crept in v4.5-rc5" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: iterate over buffer heads correctly in move_extent_per_page()	2016-03-09 19:33:05 -08:00
Jan Kara	87d8a74b56	ext4: fix setting of referenced bit in ext4_es_lookup_extent() We were setting referenced bit on the extent structure we return from ext4_es_lookup_extent() which is just a private structure on stack. Thus setting had no effect. Set the bit in the structure in the status tree instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 22:26:55 -05:00
Eryu Guan	6ffe77bad5	ext4: iterate over buffer heads correctly in move_extent_per_page() In commit `bcff24887d` ("ext4: don't read blocks from disk after extents being swapped") bh is not updated correctly in the for loop and wrong data has been written to disk. generic/324 catches this on sub-page block size ext4. Fixes: `bcff24887d` ("ext4: don't read blocks from disk after extentsbeing swapped") Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-09 21:37:53 -05:00
Ross Zwisler	30f471fd88	dax: check return value of dax_radix_entry() dax_pfn_mkwrite() previously wasn't checking the return value of the call to dax_radix_entry(), which was a mistake. Instead, capture this return value and return the appropriate VM_FAULT_ value. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Jan Kara	566e8dfd88	ocfs2: fix return value from ocfs2_page_mkwrite() ocfs2_page_mkwrite() could mistakenly return error code instead of mkwrite status value. Fix it. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-03-09 15:43:42 -08:00
Martin Brandenburg	acfcbaf192	orangefs: make fs_mount_pending static Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-09 13:26:39 -05:00
Martin Brandenburg	c62da5853d	orangefs: Avoid symlink upcall if target is too long. Previously the client-core detected this condition by sheer luck! Since we used strncpy, no NUL byte would be included on the name. The client-core would call strlen, which would read past the end of its buffer, but return a number large enough that the client-core would return ENAMETOOLONG. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-09 13:26:39 -05:00
Mike Marshall	162ada7764	Orangefs: improve the POSIXness of interrupted writes... Don't return EINTR on interrupted writes if some data has already been written. Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-09 13:12:37 -05:00
Mike Marshall	cf07c0bf88	Orangefs: add a new gossip statement Signed-off-by: Mike Marshall <hubcap@omnibond.com>	2016-03-09 13:11:45 -05:00
Jan Kara	600be30a8b	ext4: remove i_ioend_count Remove counter of pending io ends as it is unused. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-08 23:39:21 -05:00
Jan Kara	109811c20f	ext4: simplify io_end handling for AIO DIO When mapping blocks for direct IO, we allocate io_end structure before mapping blocks and store pointer to it in the inode. This creates a requirement that any AIO DIO using io_end must be protected by i_mutex. This created problems in the past with dioread_nolock mode which was corrupting io_end pointers. Also io_end is allocated unnecessarily in case where we don't need to convert any extents (which is a common case for example when overwriting file). We fix the problem by allocating io_end only once we return unwritten extent from block mapping function for AIO DIO (so we can save some pointless io_end allocations) and we pass pointer to it in bh->b_private which generic DIO code later passes to our end IO callback. That way we remove any need for global pointer to io_end structure and thus fix the races. The downside of this change is that the checking for unwritten IO in flight in ext4_extents_can_be_merged() is more racy since we now increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping i_data_sem. However the check has been racy already before because ext4_writepages() already increment i_unwritten after dropping i_data_sem and reserved blocks save us from hitting ENOSPC in the worst case. Signed-off-by: Jan Kara <jack@suse.cz>	2016-03-08 23:36:46 -05:00
Jan Kara	efe70c2951	ext4: move trans handling and completion deferal out of _ext4_get_block There is no need to handle starting of a transaction and deferal of DIO completion in _ext4_get_block() function. We can move this out to get block functions for direct IO that need it. That way we can add stricter checks verifying things work as we expect. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-08 23:35:46 -05:00
Jan Kara	705965bd6d	ext4: rename and split get blocks functions Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better describe what it does. Also split out get blocks functions for direct IO. Later we move functionality from _ext4_get_blocks() there. There's no functional change in this patch. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-08 23:08:10 -05:00
Jan Kara	e142d05263	ext4: use i_mutex to serialize unaligned AIO DIO Currently we've used hashed aio_mutex to serialize unaligned AIO DIO. However the code cleanups that happened after 2011 when the lock was introduced made aio_mutex acquired at almost the same places where we already have exclusion using i_mutex. So just use i_mutex for the exclusion of unaligned AIO DIO. The change moves waiting for pending unwritten extent conversion under i_mutex. That makes special handling of O_APPEND writes unnecessary and also avoids possible livelocking of unaligned AIO DIO with aligned one (nothing was preventing contiguous stream of aligned AIO DIOs to let unaligned AIO DIO wait forever). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-08 22:44:50 -05:00
Jan Kara	3bd6ad7b68	ext4: pack ioend structure better On 64-bit architectures we have two 4-byte holes in struct ext4_io_end. Order entries better to avoid this and thus make the structure occupy 64 instead of 72 bytes for 64-bit architectures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-03-08 22:26:39 -05:00
Dave Chinner	ab9d1e4f7b	Merge branch 'xfs-misc-fixes-4.6-3' into for-next	2016-03-09 08:18:30 +11:00
Luis de Bethencourt	a5fd276bdc	xfs: remove impossible condition bp_release is set to 0 just before the breakpoint of the for loop before the conditional check (in line 458). The other breakpoint is a goto that skips the dead code. Addresses-Coverity-Id: 102338 Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-09 08:17:56 +11:00
Darrick J. Wong	30cbc591c3	xfs: check sizes of XFS on-disk structures at compile time Check the sizes of XFS on-disk structures when compiling the kernel. Use this to catch inadvertent changes in structure size due to padding and alignment issues, etc. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-09 08:15:14 +11:00
Al Viro	f93812846f	jffs2: reduce the breakage on recovery from halfway failed rename() d_instantiate(new_dentry, old_inode) is absolutely wrong thing to do - it will oops if new_dentry used to be positive, for starters. What we need is d_invalidate() the target and be done with that. Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-07 23:07:10 -05:00
Al Viro	803c00123a	ncpfs: fix a braino in OOM handling in ncp_fill_cache() Failing to allocate an inode for child means that cache for parent is incompletely populated. So it's parent directory inode ('dir') that needs NCPI_DIR_CACHE flag removed, not the child inode ('inode', which is what we'd failed to allocate in the first place). Fucked-up-in: commit `5e993e25` ("ncpfs: get rid of d_validate() nonsense") Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org # v3.19 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-07 22:25:16 -05:00
Boris BREZILLON	f5b8aa78ef	mtd: kill the ecclayout->oobavail field ecclayout->oobavail is just redundant with the mtd->oobavail field. Moreover, it prevents static const definition of ecc layouts since the NAND framework is calculating this value based on the ecclayout->oobfree field. Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com>	2016-03-07 16:23:09 -08:00
Linus Torvalds	01ffa3df22	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Overlayfs bug fixes. All marked as -stable material" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: copy new uid/gid into overlayfs runtime inode ovl: ignore lower entries when checking purity of non-directory entries ovl: fix getcwd() failure after unsuccessful rmdir ovl: fix working on distributed fs as lower layer	2016-03-07 15:23:25 -08:00
Ingo Molnar	ec87e1cf7d	Linux 4.5-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW3LO0AAoJEHm+PkMAQRiGhewIAIVHA1+qSSXEHTFeuLRuYpiz +ptQUIjPJdakWm/XqOnwSG8SWUuD4XL6ysfNmLSZIdqXYBAPpAuwT1UA2FZhz0dN soZxMNleAvzHWRDFLqwjVdOVlTxS6CTTdEQNzi+3R0ZCADllsRcuj/GBIY+M8cr6 LvxK8BnhDU+Au3gZQjaujTMO7fKG6gOq4wKz/U7RIG37A6rwW577kEfLg4ZgFwt9 RVjsky5mrX9+4l3QFtox9ZC383P/0VZ6+vXwN2QH1/joDK4EvA8pCwsGTyjRJiqi fArHbS+mHyAtbPWJmDbVlQ5dkZJAqRgtWBydjQYoC16S4Bwdce2/FbhBiTgEQAo= =sqln -----END PGP SIGNATURE----- Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-07 09:27:30 +01:00
Dave Chinner	3c1a79f5ff	Merge branch 'xfs-misc-fixes-4.6-2' into for-next	2016-03-07 09:34:54 +11:00
Dave Chinner	85a9f38d38	Merge branch 'xfs-dax-fixes-4.6' into for-next	2016-03-07 09:34:31 +11:00
Dave Chinner	3d93ec0364	Merge branch 'xfs-writepage-rework-4.6' into for-next	2016-03-07 09:34:02 +11:00
Darrick J. Wong	0df61da8ac	xfs: ioends require logically contiguous file offsets We need to create a new ioend if the current writepage call isn't logically contiguous with the range contained in the previous ioend. Hopefully writepage gets called in order of increasing file offset. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 09:32:14 +11:00
Dave Chinner	7f0ed5461a	Merge branch 'xfs-buf-macro-cleanup-4.6' into for-next	2016-03-07 09:31:00 +11:00
Dave Chinner	a2bbcb60ff	Merge branch 'xfs-gut-icdinode-4.6' into for-next	2016-03-07 09:30:32 +11:00
Dave Chinner	6d247d47fb	Merge branch 'xfs-misc-fixes-4.6' into for-next	2016-03-07 09:30:12 +11:00
Dave Chinner	acb3e26fc3	Merge branch 'xfs-dio-fix-4.6' into for-next	2016-03-07 09:29:48 +11:00
Dave Chinner	1b186d25b0	Merge branch 'xfs-get-next-dquot-4.6' into for-next	2016-03-07 09:29:25 +11:00
Dave Chinner	c53473be45	Merge branch 'xfs-rt-fixes-4.6' into for-next	2016-03-07 09:29:04 +11:00
Dave Chinner	9deed09554	Merge branch 'xfs-torn-log-fixes-4.5' into for-next	2016-03-07 09:28:36 +11:00
Darrick J. Wong	5110cd82ca	xfs: use named array initializers for log item dumping Use named array initializers for the string arrays used to dump log items, rather than depending on the order being maintained correctly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:40:03 +11:00
Darrick J. Wong	49ca9118e6	xfs: fix computation of inode btree maxlevels Commit 88740da18[1] introduced a function to compute the maximum height of the inode btree back in 1994. Back then, apparently, the freespace and inode btrees shared the same geometry; however, it has long since been the case that the inode and freespace btrees have different record and key sizes. Therefore, we must use m_inobt_mnr if we want a correct calculation/log reservation/etc. (Yes, this bug has been around for 21 years and ten months.) (Yes, I was in middle school when this bug was committed.) [1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=88740da18ddd9d7ba3ebaa9502fefc6ef2fd19cd Historical-research-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:39:56 +11:00
Dave Chinner	a798011c8f	xfs: reinitialise per-AG structures if geometry changes during recovery If a crash occurs immediately after a filesystem grow operation, the updated superblock geometry is found only in the log. After we recover the log, the superblock is reread and re-initialised and so has the new geometry in memory. If the new geometry has more AGs than prior to the grow operation, then the new AGs will not have in-memory xfs_perag structurea associated with them. This will result in an oops when the first metadata buffer from a new AG is looked up in the buffer cache, as the block lies within the new geometry but then fails to find a perag structure on lookup. This is easily fixed by simply re-initialising the perag structure after re-reading the superblock at the conclusion of the first pahse of log recovery. This, however, does not fix the case of log recovery requiring access to metadata in the newly grown space. Fortunately for us, because the in-core superblock has not been updated, this will result in detection of access beyond the end of the filesystem and so recovery will fail at that point. If this proves to be a problem, then we can address it separately to the current reported issue. Reported-by: Alex Lyakas <alex@zadarastorage.com> Tested-by: Alex Lyakas <alex@zadarastorage.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>	2016-03-07 08:39:36 +11:00
Brian Foster	7f6aff3a29	xfs: only run torn log write detection on dirty logs XFS uses CRC verification over a sub-range of the head of the log to detect and handle torn writes. This torn log write detection currently runs unconditionally at mount time, regardless of whether the log is dirty or clean. This is problematic in cases where a filesystem might end up being moved across different, incompatible (i.e., opposite byte-endianness) architectures. The problem lies in the fact that log data is not necessarily written in an architecture independent format. For example, certain bits of data are written in native endian format. Further, the size of certain log data structures differs (i.e., struct xlog_rec_header) depending on the word size of the cpu. This leads to false positive crc verification errors and ultimately failed mounts when a cleanly unmounted filesystem is mounted on a system with an incompatible architecture from data that was written near the head of the log. Update the log head/tail discovery code to run torn write detection only when the log is not clean. This means something other than an unmount record resides at the head of the log and log recovery is imminent. It is a requirement to run log recovery on the same type of host that had written the content of the dirty log and therefore CRC failures are legitimate corruptions in that scenario. Reported-by: Jan Beulich <JBeulich@suse.com> Tested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:22:22 +11:00
Brian Foster	717bc0ebca	xfs: refactor in-core log state update to helper Once the record at the head of the log is identified and verified, the in-core log state is updated based on the record. This includes information such as the current head block and cycle, the start block of the last record written to the log, the tail lsn, etc. Once torn write detection is conditional, this logic will need to be reused. Factor the code to update the in-core log data structures into a new helper function. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:22:22 +11:00
Brian Foster	65b99a08b3	xfs: refactor unmount record detection into helper Once the mount sequence has identified the head and tail blocks of the physical log, the record at the head of the log is located and examined for an unmount record to determine if the log is clean. This currently occurs after torn write verification of the head region of the log. This must ultimately be separated from torn write verification and may need to be called again if the log head is walked back due to a torn write (to determine whether the new head record is an unmount record). Separate this logic into a new helper function. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:22:22 +11:00
Brian Foster	82ff6cc26e	xfs: separate log head record discovery from verification The code that locates the log record at the head of the log is buried in the log head verification function. This is fine when torn write verification occurs unconditionally, but this behavior is problematic for filesystems that might be moved across systems with different architectures. In preparation for separating examination of the log head for unmount records from torn write detection, lift the record location logic out of the log verification function and into the caller. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2016-03-07 08:22:22 +11:00
Linus Torvalds	21b27a74ec	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fix from Sage Weil: "This is a final commit we missed to align the protocol compatibility with the feature bits. It decodes a few extra fields in two different messages and reports EIO when they are used (not yet supported)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support	2016-03-06 11:31:13 -08:00
Christoph Hellwig	1ae1602de0	configfs: switch ->default groups to a linked list Replace the current NULL-terminated array of default groups with a linked list. This gets rid of lots of nasty code to size and/or dynamically allocate the array. While we're at it also provide a conveniant helper to remove the default groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felipe Balbi <balbi@kernel.org> [drivers/usb/gadget] Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>	2016-03-06 16:11:24 +01:00
Al Viro	6c51e513a3	lookup_dcache(): lift d_alloc() into callers ... and kill need_lookup thing Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-05 20:09:32 -05:00
Al Viro	6583fe22d1	do_last(): reorder and simplify a bit bugger off on negatives a bit earlier, simplify the tests Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-05 18:14:03 -05:00
Al Viro	05ef1c50e7	Merge branch 'for-linus' into work.lookups for the sake of namei.c fixes	2016-03-05 18:10:51 -05:00
Linus Torvalds	e5322c5406	Merge branch 'for-linus2' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Round 2 of this. I cut back to the bare necessities, the patch is still larger than it usually would be at this time, due to the number of NVMe fixes in there. This pull request contains: - The 4 core fixes from Ming, that fix both problems with exceeding the virtual boundary limit in case of merging, and the gap checking for cloned bio's. - NVMe fixes from Keith and Christoph: - Regression on larger user commands, causing problems with reading log pages (for instance). This touches both NVMe, and the block core since that is now generally utilized also for these types of commands. - Hot removal fixes. - User exploitable issue with passthrough IO commands, if !length is given, causing us to fault on writing to the zero page. - Fix for a hang under error conditions - And finally, the current series regression for umount with cgroup writeback, where the final flush would happen async and hence open up window after umount where the device wasn't consistent. fsck right after umount would show this. From Tejun" * 'for-linus2' of git://git.kernel.dk/linux-block: block: support large requests in blk_rq_map_user_iov block: fix blk_rq_get_max_sectors for driver private requests nvme: fix max_segments integer truncation nvme: set queue limits for the admin queue writeback: flush inode cgroup wb switches instead of pinning super_block NVMe: Fix 0-length integrity payload NVMe: Don't allow unsupported flags NVMe: Move error handling to failed reset handler NVMe: Simplify device reset failure NVMe: Fix namespace removal deadlock NVMe: Use IDA for namespace disk naming NVMe: Don't unmap controller registers on reset block: merge: get the 1st and last bvec via helpers block: get the 1st and last bvec via helpers block: check virt boundary in bio_will_gap() block: bio: introduce helpers to get the 1st and last bvec	2016-03-04 18:17:17 -08:00
Linus Torvalds	c51797d25d	This contains two important JFFS2 fixes marked for stable: • a lock ordering problem between the page lock and the internal f->sem mutex, which was causing occasional deadlocks in garbage collection, and • a scan failure causing moved directories to sometimes end up appearing to have hard links. There are also a couple of trivial MAINTAINERS file updates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEABECAAYFAlbaGIsACgkQdwG7hYl686OpGQCgu0l4E7cQ/v1Af9kZatj6fnzN LvcAnR3SzmiH1jxNGSY7C1mUQWosRl/9 =Ker9 -----END PGP SIGNATURE----- Merge tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd Pull jffs2 fixes from David Woodhouse: "This contains two important JFFS2 fixes marked for stable: - a lock ordering problem between the page lock and the internal f->sem mutex, which was causing occasional deadlocks in garbage collection - a scan failure causing moved directories to sometimes end up appearing to have hard links. There are also a couple of trivial MAINTAINERS file updates" * tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd: MAINTAINERS: add maintainer entry for FREESCALE GPMI NAND driver Fix directory hardlinks from deleted directories jffs2: Fix page lock / f->sem deadlock Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin" MAINTAINERS: update Han's email	2016-03-04 17:36:46 -08:00
Linus Torvalds	2cdcb2b5b5	Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fix from Chris Mason: "Filipe nailed down a problem where tree log replay would do some work that orphan code wasn't expecting to be done yet, leading to BUG_ON" * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix loading of orphan roots leading to BUG_ON	2016-03-04 17:31:32 -08:00
Yan, Zheng	5ea5c5e0a7	ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support Add support for the format change of MClientReply/MclientCaps. Also add code that denies access to inodes with pool_ns layouts. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2016-03-04 21:00:37 +01:00
Christoph Hellwig	c43c83a294	direct-io: only use block polling if explicitly requested Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-04 12:20:10 -05:00
Christoph Hellwig	97be7ebe53	vfs: add the RWF_HIPRI flag for preadv2/pwritev2 This adds a flag that tells the file system that this is a high priority request for which it's worth to poll the hardware. The flag is purely advisory and can be ignored if not supported. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Stephen Bates <stephen.bates@pmcs.com> Tested-by: Stephen Bates <stephen.bates@pmcs.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-03-04 12:20:10 -05:00

... 2 3 4 5 6 ...

44118 Commits