linux

Author	SHA1	Message	Date
Sougata Santra	89ac9b4d3d	hfsplus: fix longname handling Longname is not correctly handled by the hfsplus driver. If an attempt to create a longname (>255) file/directory is made, it succeeds by creating a file/directory with HFSPLUS_MAX_STRLEN and an incorrect catalog key, thus leaving the volume in an inconsistent state. This patch fixes this issue. Since lookup is always called first to create a negative entry, just doing a check in lookup would probably fix this issue, but I chose to propagate the error to the other iops as well. Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from hfsplus_cat_build_key, to avoid unnecessary branching. Thanks a lot. TEST: ------ dir="TEST_DIR" cdir=`pwd` name255="_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ _123456789_123456789_123456789_123456789_123456789_1234" name256="${name255}5" mkdir $dir cd $dir touch $name255 rm -f $name255 touch $name256 ls -la cd $cdir rm -rf $dir RESULT: ------- [sougata@ultrabook tmp]$ cdir=`pwd` [sougata@ultrabook tmp]$ name255="_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\ > _123456789_123456789_123456789_123456789_123456789_1234" [sougata@ultrabook tmp]$ name256="${name255}5" [sougata@ultrabook tmp]$ [sougata@ultrabook tmp]$ mkdir $dir [sougata@ultrabook tmp]$ cd $dir [sougata@ultrabook TEST_DIR]$ touch $name255 [sougata@ultrabook TEST_DIR]$ rm -f $name255 [sougata@ultrabook TEST_DIR]$ touch $name256 [sougata@ultrabook TEST_DIR]$ ls -la ls: cannot access _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234: No such file or directory total 0 drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 . drwxrwxrwx 1 root root 6 Feb 20 19:56 .. -????????? ? ? ? ? ? _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234 [sougata@ultrabook TEST_DIR]$ cd $cdir [sougata@ultrabook tmp]$ rm -rf $dir rm: cannot remove `TEST_DIR': Directory not empty -ENAMETOOLONG returned from hfsplus_asc2uni was not propagated to iops. This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN and incorrect keys, leaving the FS in an inconsistent state. This patch fixes this issue. Signed-off-by: Sougata Santra <sougata@tuxera.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-18 19:08:10 -08:00
Eric W. Biederman	c297abfdf1	mnt: Fix a memory stomp in umount While reviewing the code of umount_tree I realized that when we append to a preexisting unmounted list we do not change pprev of the former first item in the list. Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on the former first item of the list will stomp unmounted.first leaving it set to some random mount point which we are likely to free soon. This isn't likely to hit, but if it does I don't know how anyone could track it down. [ This happened because we don't have all the same operations for hlist's as we do for normal doubly-linked lists. In particular, list_splice() is easy on our standard doubly-linked lists, while hlist_splice() doesn't exist and needs both start/end entries of the hlist. And commit `38129a13e6` incorrectly open-coded that missing hlist_splice(). We should think about making these kinds of "mindless" conversions easier to get right by adding the missing hlist helpers - Linus ] Fixes: `38129a13e6` switch mnt_hash to hlist Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-18 11:22:02 -08:00
Linus Torvalds	44e8967d59	Ceph: remove left-over reject file Neither Sage nor I noticed that Zheng Yan had mistakenly committed fs/ceph/super.h.rej as part of commit `31c542a199` ("ceph: add inline data to pagecache"). Remove it. Requested-by: Yan, Zheng <ukernel@gmail.com> Cc: Sage Weil <sweil@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-17 18:47:01 -08:00
Linus Torvalds	57666509b7	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "The big item here is support for inline data for CephFS and for message signatures from Zheng. There are also several bug fixes, including interrupted flock request handling, 0-length xattrs, mksnap, cached readdir results, and a message version compat field. Finally there are several cleanups from Ilya, Dan, and Markus. Note that there is another series coming soon that fixes some bugs in the RBD 'lingering' requests, but it isn't quite ready yet" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) ceph: fix setting empty extended attribute ceph: fix mksnap crash ceph: do_sync is never initialized libceph: fixup includes in pagelist.h ceph: support inline data feature ceph: flush inline version ceph: convert inline data to normal data before data write ceph: sync read inline data ceph: fetch inline data when getting Fcr cap refs ceph: use getattr request to fetch inline data ceph: add inline data to pagecache ceph: parse inline data in MClientReply and MClientCaps libceph: specify position of extent operation libceph: add CREATE osd operation support libceph: add SETXATTR/CMPXATTR osd operations support rbd: don't treat CEPH_OSD_OP_DELETE as extent op ceph: remove unused stringification macros libceph: require cephx message signature by default ceph: introduce global empty snap context ceph: message versioning fixes ...	2014-12-17 16:03:12 -08:00
Linus Torvalds	87c31b39ab	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace related fixes from Eric Biederman: "As these are bug fixes almost all of these changes are marked for backporting to stable. The first change (implicitly adding MNT_NODEV on remount) addresses a regression that was created when security issues with unprivileged remount were closed. I go on to update the remount test to make it easy to detect if this issue reoccurs. Then there are a handful of mount and umount related fixes. Then half of the changes deal with a recently discovered design bug in the permission checks of gid_map. Unix since the beginning has allowed setting group permissions on files to less than the user and other permissions (aka ---rwx---rwx). As the unix permission checks stop as soon as a group matches, and setgroups allows setting groups that can not later be dropped, this results in a situation where it is possible to legitimately use a group to assign fewer privileges to a process. Which means dropping a group can increase a process's privileges. The fix I have adopted is that gid_map is now no longer writable without privilege unless the new file /proc/self/setgroups has been set to permanently disable setgroups. The bulk of user namespace using applications, even the applications using user namespaces without privilege, remain unaffected by this change. Unfortunately this fix breaks a couple of user space applications that were relying on the problematic behavior (one of which was tools/selftests/mount/unprivileged-remount-test.c). To hopefully prevent needing a regression fix on top of my security fix I rounded up folks who work with the container implementations most likely to be affected and encouraged them to test the changes. > So far nothing broke on my libvirt-lxc test bed. :-) > Tested with openSUSE 13.2 and libvirt 1.2.9. > Tested-by: Richard Weinberger <richard@nod.at> > Tested on Fedora20 with libvirt 1.2.11, works fine. > Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> > Ok, thanks - yes, unprivileged lxc is working fine with your kernels. > Just to be sure I was testing the right thing I also tested using > my unprivileged nsexec testcases, and they failed on setgroup/setgid > as now expected, and succeeded there without your patches. > Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com> > I tested this with Sandstorm. It breaks as is and it works if I add > the setgroups thing. > Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :(" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Unbreak the unprivileged remount tests userns; Correct the comment in map_write userns: Allow setting gid_maps without privilege when setgroups is disabled userns: Add a knob to disable setgroups on a per user namespace basis userns: Rename id_map_mutex to userns_state_mutex userns: Only allow the creator of the userns unprivileged mappings userns: Check euid no fsuid when establishing an unprivileged uid mapping userns: Don't allow unprivileged creation of gid mappings userns: Don't allow setgroups until a gid mapping has been setablished userns: Document what the invariant required for safe unprivileged mappings. groups: Consolidate the setgroups permission checks mnt: Clear mnt_expire during pivot_root mnt: Carefully set CL_UNPRIVILEGED in clone_mnt mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers. umount: Do not allow unmounting rootfs. umount: Disallow unprivileged mount force mnt: Update unprivileged remount test mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount	2014-12-17 12:31:40 -08:00
Linus Torvalds	d6666be6f0	MTD updates for 3.19: * Add device tree support for DoC3 * SPI NOR: Refactoring, for better layering between spi-nor.c and its driver users (e.g., m25p80.c) New flash device support Support 6-byte ID strings * NAND New NAND driver for Allwinner SoC's (sunxi) GPMI NAND: add support for raw (no ECC) access, for testing purposes Add ATO manufacturer ID A few odd driver fixes * MTD tests: Allow testers to compensate for OOB bitflips in oobtest Fix a torturetest regression * nandsim: Support longer ID byte strings And more. Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd Pull MTD updates from Brian Norris: "Summary: - Add device tree support for DoC3 - SPI NOR: Refactoring, for better layering between spi-nor.c and its driver users (e.g., m25p80.c) New flash device support Support 6-byte ID strings - NAND: New NAND driver for Allwinner SoC's (sunxi) GPMI NAND: add support for raw (no ECC) access, for testing purposes Add ATO manufacturer ID A few odd driver fixes - MTD tests: Allow testers to compensate for OOB bitflips in oobtest Fix a torturetest regression - nandsim: Support longer ID byte strings And more" * tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd: (63 commits) mtd: tests: abort torturetest on erase errors mtd: physmap_of: fix potential NULL dereference mtd: spi-nor: allow NULL as chip name and try to auto detect it mtd: nand: gpmi: add raw oob access functions mtd: nand: gpmi: add proper raw access support mtd: nand: gpmi: add gpmi_copy_bits function mtd: spi-nor: factor out write_enable() for erase commands mtd: spi-nor: add support for s25fl128s mtd: spi-nor: remove the jedec_id/ext_id mtd: spi-nor: add id/id_len for flash_info{} mtd: nand: correct the comment of function nand_block_isreserved() jffs2: Drop bogus if in comment mtd: atmel_nand: replace memcpy32_toio/memcpy32_fromio with memcpy mtd: cafe_nand: drop duplicate .write_page implementation mtd: m25p80: Add support for serial flash Spansion S25FL132K MTD: m25p80: fix inconsistency in m25p_ids compared to spi_nor_ids mtd: spi-nor: improve wait-till-ready timeout loop mtd: delete unnecessary checks before two function calls mtd: nand: omap: Fix NAND enumeration on 3430 LDP mtd: nand: add ATO manufacturer info ...	2014-12-17 09:59:26 -08:00
Linus Torvalds	c103b21c20	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse update from Miklos Szeredi: "The first part makes sure we don't hold up umount with pending async requests. In addition to being a cleanup, this is a small behavioral change (for the better) and unlikely to break anything. The second part prepares for a cleanup of the fuse device I/O code by adding a helper for simple request submission, with some savings in line numbers already realized" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: use file_inode() in fuse_file_fallocate() fuse: introduce fuse_simple_request() helper fuse: reduce max out args fuse: hold inode instead of path after release fuse: flush requests on umount fuse: don't wake up reserved req in fuse_conn_kill()	2014-12-17 09:41:32 -08:00
Yan, Zheng	0aeff37aba	ceph: fix setting empty extended attribute Make sure 'value' is not NULL; otherwise __ceph_setxattr will remove the extended attribute. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:18:49 +03:00
Yan, Zheng	275dd19ea4	ceph: fix mksnap crash The mksnap reply only contains 'target', not 'dentry'. So it's wrong to use req->r_reply_info.head->is_dentry to detect a traceless reply. Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:09:53 +03:00
Dan Carpenter	021b77bee2	ceph: do_sync is never initialized Probably this code was syncing a lot more often than intended because the do_sync variable wasn't set to zero. Cc: stable@vger.kernel.org # v3.11+ Fixes: `c62988ec09` ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	65a22662bf	ceph: support inline data feature Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	e20d258d73	ceph: flush inline version After converting inline data to normal data, the client needs to flush the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit makes cap messages (sent to the MDS) contain inline_version and inline_data. The client always converts inline data to normal data before a data write, so the inline data length part is always zero. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:53 +03:00
Yan, Zheng	28127bdd2f	ceph: convert inline data to normal data before data write Before any data write, convert inline data to normal data and set i_inline_version to CEPH_INLINE_NONE. The OSD request that saves inline data to the object contains 3 operations (CMPXATTR, WRITE and SETXATTR). It compares an xattr named 'inline_version' to prevent old data from overwriting newer data. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	83701246ae	ceph: sync read inline data We can't use getattr to fetch inline data while holding the Fr cap, because it can cause a deadlock. If we need to sync read inline data, drop cap refs first, then use getattr to fetch inline data. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	3738daa68a	ceph: fetch inline data when getting Fcr cap refs We can't use getattr to fetch inline data after getting Fcr caps, because it can cause a deadlock. The solution is to try bringing inline data into the page cache when not holding any cap, and hope the inline data page is still there after getting the Fcr caps. If the page is still there, pin it in the page cache for later IO. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	01deead041	ceph: use getattr request to fetch inline data Add a new parameter 'locked_page' to ceph_do_getattr(). If the getattr reply contains inline data, it will be copied to the page. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	31c542a199	ceph: add inline data to pagecache Request replies and cap messages can contain inline data. Add inline data to the page cache if there is an Fc cap. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	fb01d1f8b0	ceph: parse inline data in MClientReply and MClientCaps Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:52 +03:00
Yan, Zheng	715e4cd405	libceph: specify position of extent operation allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:52 +03:00
Ilya Dryomov	ca3995ad13	ceph: remove unused stringification macros These were used to report git versions a long time ago. Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:51 +03:00
Yan, Zheng	97c85a828f	ceph: introduce global empty snap context Current snapshot code does not properly handle moving an inode from one empty snap realm to another empty snap realm. After changing the inode's snap realm, some dirty pages' snap context can differ from the inode's i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs(). The fix is to introduce a global empty snap context for all empty snap realms. This avoids triggering the BUG() for filesystems with no snapshots. Fixes: http://tracker.ceph.com/issues/9928 Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:51 +03:00
John Spray	7cfa0313d0	ceph: message versioning fixes There were two places where we were assigning version in host byte order instead of network byte order. Also in MSG_CLIENT_SESSION we weren't setting compat_version in the header to reflect continued compatibility with older MDSs. Fixes: http://tracker.ceph.com/issues/9945 Signed-off-by: John Spray <john.spray@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>	2014-12-17 20:09:51 +03:00
Yan, Zheng	33d0733796	libceph: message signature support Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:50 +03:00
SF Markus Elfring	e96a650a81	ceph, rbd: delete unnecessary checks before two function calls The functions ceph_put_snap_context() and iput() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> [idryomov@redhat.com: squashed rbd.c hunk, changelog] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2014-12-17 20:09:50 +03:00
Yan, Zheng	70db4f3629	ceph: introduce a new inode flag indicating if cached dentries are ordered After creating/deleting/renaming a file, offsets of sibling dentries may change. So we cannot use cached dentries to satisfy readdir. But we can still use the cached dentries to conclude -ENOENT for lookup. This patch introduces a new inode flag indicating if child dentries are ordered. The flag is set at the same time the directory is marked complete. After creating/deleting/renaming a file, we clear the flag on the directory inode. This prevents ceph_readdir() from using cached dentries to satisfy the readdir syscall. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:50 +03:00
Yan, Zheng	9280be24dc	ceph: fix file lock interruption When a lock operation is interrupted, current code sends an unlock request to the MDS to undo the lock operation. This method does not work as expected because the unlock request can drop locks that have already been acquired. The fix is to use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR requests to interrupt the blocked file lock request. These requests do not drop locks that have already been acquired; they only interrupt the blocked file lock request. Signed-off-by: Yan, Zheng <zyan@redhat.com>	2014-12-17 20:09:49 +03:00
Dmitry V. Levin	9d4d65748a	vfs: make mounts and mountstats honor root dir like mountinfo does As we already show mountpoints relative to the root directory, thanks to the change made back in 2000, change show_vfsmnt() and show_vfsstat() to skip out-of-root mountpoints the same way as show_mountinfo() does. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 08:27:15 -05:00
Dmitry V. Levin	9ad4dc4f73	vfs: cleanup show_mountinfo Starting with commit v3.2-rc4-1-g02125a8, seq_path_root() no longer changes the value of its "struct path root" argument. Starting with commit v3.2-rc7-104-g8c9379e, the "struct path root" argument of seq_path_root() is const. As a result, the temporary variable "root" in show_mountinfo() that holds a copy of struct path root is no longer needed. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 08:27:15 -05:00
Al Viro	7d65cf10e3	unfuck binfmt_misc.c (broken by commit `e6084d4`) scanarg(s, del) never returns s; the empty field results in s + 1. Restore the correct checks, and move NUL-termination into scanarg(), while we are at it. Incidentally, mixing "coding style cleanups" (for small values of cleanup) with functional changes is a Bad Idea(tm)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 08:27:14 -05:00
Al Viro	50062175ff	vm_area_operations: kill ->migrate() the only instance this method has ever grown was one in kernfs - one that calls ->migrate() of another vm_ops if it exists. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 08:26:51 -05:00
Al Viro	b1bc6d7f16	move_extent_per_page(): get rid of unused w_flags ... and comparing get_fs() with KERNEL_DS used only to initialize that Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 06:43:56 -05:00
Al Viro	98af592f5b	btrfs: filp_open() returns ERR_PTR() on failure, not NULL... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-17 06:43:56 -05:00
Linus Torvalds	603ba7e41b	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc//ns/ weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc//ns/ symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns	2014-12-16 15:53:03 -08:00
Linus Torvalds	31f48fc8f2	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull isofs and reiserfs fixes from Jan Kara: "A reiserfs and an isofs fix. They arrived after I sent you my first pull request and I don't want to delay them unnecessarily till rc2" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: isofs: Fix infinite looping over CE entries reiserfs: destroy allocated commit workqueue	2014-12-16 15:46:01 -08:00
Linus Torvalds	0b233b7c79	Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "A comparatively quieter cycle for nfsd this time, but still with two larger changes: - RPC server scalability improvements from Jeff Layton (using RCU instead of a spinlock to find idle threads). - server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna Schumaker, enabling fallocate on new clients" * 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd4: fix xdr4 count of server in fs_location4 nfsd4: fix xdr4 inclusion of escaped char sunrpc/cache: convert to use string_escape_str() sunrpc: only call test_bit once in svc_xprt_received fs: nfsd: Fix signedness bug in compare_blob sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt sunrpc: convert to lockless lookup of queued server threads sunrpc: fix potential races in pool_stats collection sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it sunrpc: require svc_create callers to pass in meaningful shutdown routine sunrpc: have svc_wake_up only deal with pool 0 sunrpc: convert sp_task_pending flag to use atomic bitops sunrpc: move rq_cachetype field to better optimize space sunrpc: move rq_splice_ok flag into rq_flags sunrpc: move rq_dropme flag into rq_flags sunrpc: move rq_usedeferral flag to rq_flags sunrpc: move rq_local field to rq_flags sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it nfsd: minor off by one checks in __write_versions() sunrpc: release svc_pool_map reference when serv allocation fails ...	2014-12-16 15:25:31 -08:00
Jan Kara	f54e18f1b8	isofs: Fix infinite looping over CE entries Rock Ridge extensions define so-called Continuation Entries (CE), which specify where further Rock Ridge data is stored. A corrupted isofs image can contain an arbitrarily long chain of these, including one containing a loop, thus causing the kernel to end up in an infinite loop when traversing these entries. Limit the traversal to 32 entries, which should be more than enough space to store all the Rock Ridge data. Reported-by: P J P <ppandit@redhat.com> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>	2014-12-15 15:53:26 +01:00
Linus Torvalds	67e2c38838	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: "In terms of changes, there's general maintenance to the Smack, SELinux, and integrity code. The IMA code adds a new kconfig option, IMA_APPRAISE_SIGNED_INIT, which allows IMA appraisal to require signatures. Support for reading keys from rootfs before init is called is also added" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits) selinux: Remove security_ops extern security: smack: fix out-of-bounds access in smk_parse_smack() VFS: refactor vfs_read() ima: require signature based appraisal integrity: provide a hook to load keys when rootfs is ready ima: load x509 certificate from the kernel integrity: provide a function to load x509 certificate from the kernel integrity: define a new function integrity_read_file() Security: smack: replace kzalloc with kmem_cache for inode_smack Smack: Lock mode for the floor and hat labels ima: added support for new kernel cmdline parameter ima_template_fmt ima: allocate field pointers array on demand in template_desc_init_fields() ima: don't allocate a copy of template_fmt in template_desc_init_fields() ima: display template format in meas. list if template name length is zero ima: added error messages to template-related functions ima: use atomic bit operations to protect policy update interface ima: ignore empty and with whitespaces policy lines ima: no need to allocate entry for comment ima: report policy load status ima: use path names cache ...	2014-12-14 20:36:37 -08:00
Linus Torvalds	e6b5be2be4	Driver core patches for 3.19-rc1 Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...	2014-12-14 16:10:09 -08:00
Linus Torvalds	7a02d08969	These patches optionally add LZ4 compression support to Squashfs. LZ4 is a lightweight compression algorithm which can be used on embedded systems to reduce CPU and memory overhead (in comparison to the standard zlib compression). These patches add the wrapper code to allow Squashfs to use the existing LZ4 decompression code, and the necessary configuration option. Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next Pull squashfs update from Phillip Lougher: "These patches optionally add LZ4 compression support to Squashfs. LZ4 is a lightweight compression algorithm which can be used on embedded systems to reduce CPU and memory overhead (in comparison to the standard zlib compression). These patches add the wrapper code to allow Squashfs to use the existing LZ4 decompression code, and the necessary configuration option" * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next: Squashfs: Add LZ4 compression configuration option Squashfs: add LZ4 compression support	2014-12-14 14:42:53 -08:00
Linus Torvalds	7d22286ff7	Merge git://git.kvack.org/~bcrl/aio-next Pull aio updates from Benjamin LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: Skip timer for io_getevents if timeout=0 aio: Make it possible to remap aio ring	2014-12-14 13:36:57 -08:00
Kevin Cernekee	97c7134ae2	Fix signed/unsigned pointer warning Commit `2ae83bf938` ("[CIFS] Fix setting time before epoch (negative time values)") changed "u64 t" to "s64 t", which makes do_div() complain about a pointer signedness mismatch: CC fs/cifs/netmisc.o In file included from ./arch/mips/include/asm/div64.h:12:0, from include/linux/kernel.h:124, from include/linux/list.h:8, from include/linux/wait.h:6, from include/linux/net.h:23, from fs/cifs/netmisc.c:25: fs/cifs/netmisc.c: In function ‘cifs_NTtimeToUnix’: include/asm-generic/div64.h:43:28: warning: comparison of distinct pointer types lacks a cast [enabled by default] (void)(((typeof((n)) )0) == ((uint64_t )0)); \ ^ fs/cifs/netmisc.c:941:22: note: in expansion of macro ‘do_div’ ts.tv_nsec = (long)do_div(t, 10000000) * 100; Introduce a temporary "u64 abs_t" variable to fix this. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Steve French <steve.french@primarydata.com>	2014-12-14 14:55:57 -06:00
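The sign handling the fix relies on can be sketched in userspace. NT time counts 100 ns ticks since 1601-01-01, so a value before the Unix epoch yields a negative difference, and an unsigned-only divider such as do_div() must be given the absolute value. In this sketch `do_div_u64()` is a hypothetical stand-in for the kernel's do_div() macro (divide an unsigned 64-bit value in place, return the remainder), and `nt_time_to_unix()` and `struct nt_ts` are illustrative names, not the cifs code itself.

```c
#include <stdint.h>

/* 1601-01-01 to 1970-01-01, expressed in 100 ns NT ticks. */
#define NTFS_TIME_OFFSET ((uint64_t)(369 * 365 + 89) * 24 * 3600 * 10000000)

struct nt_ts {
    int64_t tv_sec;
    long tv_nsec;
};

/* Hypothetical stand-in for do_div(): unsigned dividend only, divided in
 * place; the remainder is returned. */
static uint32_t do_div_u64(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);
    *n /= base;
    return rem;
}

/* The signed difference lives in an int64_t, but the division always runs
 * on an unsigned temporary (abs_t), so the divider never sees a signed
 * pointer and negative times are handled by negating afterwards. */
static struct nt_ts nt_time_to_unix(uint64_t ntutc)
{
    struct nt_ts ts;
    int64_t t = (int64_t)(ntutc - NTFS_TIME_OFFSET);
    uint64_t abs_t;

    if (t < 0) {
        abs_t = (uint64_t)(-t);
        ts.tv_nsec = -(long)(do_div_u64(&abs_t, 10000000) * 100);
        ts.tv_sec = -(int64_t)abs_t;
    } else {
        abs_t = (uint64_t)t;
        ts.tv_nsec = (long)do_div_u64(&abs_t, 10000000) * 100;
        ts.tv_sec = (int64_t)abs_t;
    }
    return ts;
}
```

With this layout, a time 1.5 s before the epoch comes back as tv_sec = -1 and tv_nsec = -500000000; both fields carry the sign rather than being normalized.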
Sachin Prabhu	9235d09873	Convert MessageID in smb2_hdr to LE We have encountered failures when testing smb2 mounts on ppc64 machines against both Samba and Windows 2012. On investigation, the problem was determined to be caused by the big-endian MessageID passed in the smb2 header. By comparison, the corresponding MID for smb1 is converted to LE before being sent on the wire. We have tested this patch successfully on a ppc64 machine. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>	2014-12-14 14:55:45 -06:00
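Why the byte order matters can be shown with a portable encoder. SMB2 header fields travel little-endian, and inside the kernel the conversion is done with cpu_to_le64(). The sketch below uses hypothetical helpers `put_le64()`/`get_le64()` that write bytes explicitly, so the result is the same on little- and big-endian hosts; copying a raw uint64_t into the header on a big-endian ppc64 host would instead put the most significant byte first.

```c
#include <stdint.h>

/* Hypothetical helper: store v least-significant byte first, independent
 * of host endianness. */
static void put_le64(uint8_t out[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Hypothetical helper: the inverse, as a receiver would decode the field. */
static uint64_t get_le64(const uint8_t in[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)in[i] << (8 * i);
    return v;
}
```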
Fam Zheng	5f785de588	aio: Skip timer for io_getevents if timeout=0 In this case, the call is effectively a poll. Let's not involve a timer at all, because that would hurt performance for application event loops. In an arbitrary test I've done, io_getevents syscall elapsed time drops from 50000+ nanoseconds to a few hundred. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-12-13 17:50:20 -05:00
Pavel Emelyanov	e4a0d3e720	aio: Make it possible to remap aio ring This patch addresses two issues. Let me start with the one I originally set out to solve. In the checkpoint-restore project (criu) we try to dump tasks' state and restore it exactly as it was. One part of a task's state is the set of rings set up with the io_setup() call. There are (almost) no problems dumping them, but there is a problem restoring them: if I dump a task with an aio ring originally mapped at address A, I want to restore it at exactly the same address A. Unfortunately, io_setup() does not allow for that: it mmaps the ring wherever mm finds appropriate (it calls do_mmap_pgoff() with a zero address and without the MAP_FIXED flag). To make restore possible, I'm going to mremap() the freshly created ring to address A (where it was seen before the dump). The problem is that the ring's virtual address is passed back to user space as the context ID, and this ID is then used as the search key by all the other io_foo() calls. Reworking this ID into an arbitrary integer doesn't seem to work, as libaio already uses this value as a pointer through which it accesses the aio metadata. So, to make restore work we need to make sure that a) the ring is mapped at the desired virtual address, and b) kioctx->user_id matches this value. Accordingly, the patch makes mremap() on the aio region update the kioctx's user_id and mmap_base values. This brings up the second issue mentioned at the beginning of this mail. If (regardless of the C/R dances I do) someone creates an io context with io_setup(), then mremap()s the ring and then destroys the context, the kill_ioctx() routine will call munmap() on the wrong (old) address. This will result in a) the aio ring remaining in memory and b) some other vma getting unexpectedly unmapped. What do you think? 
Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>	2014-12-13 17:49:50 -05:00
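The userspace half of the restore step described above, moving a freshly created mapping onto a chosen address, can be sketched with mremap(). This uses a plain anonymous mapping, not an aio ring: the kernel-side bookkeeping that keeps kioctx->user_id in sync is what the patch adds and is not shown here. `move_mapping()` and `demo_move()` are hypothetical names.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Move an existing mapping to a caller-chosen address. MREMAP_FIXED must
 * be combined with MREMAP_MAYMOVE; anything already mapped in the target
 * range is unmapped first. Returns the new address or MAP_FAILED. */
static void *move_mapping(void *old, size_t len, void *target)
{
    return mremap(old, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, target);
}

/* Demo: map a region, write to it, move it onto a reserved address and
 * check that the contents moved with it. Returns 1 on success. */
static int demo_move(size_t pages)
{
    size_t len = pages * (size_t)sysconf(_SC_PAGESIZE);

    /* Reserve the destination so the target address is known to be ours. */
    void *target = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED)
        return 0;
    char *ring = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ring == MAP_FAILED) {
        munmap(target, len);
        return 0;
    }
    strcpy(ring, "ring data");

    char *moved = move_mapping(ring, len, target);
    if (moved == MAP_FAILED) {
        munmap(ring, len);
        munmap(target, len);
        return 0;
    }
    int ok = (moved == (char *)target) && strcmp(moved, "ring data") == 0;
    munmap(moved, len); /* the old range was already released by mremap() */
    return ok;
}
```

Note that after the move the old address is gone; this is exactly why, without the patch, kill_ioctx() would munmap() a stale address.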
Linus Torvalds	caf292ae5b	Merge branch 'for-3.19/core' of git://git.kernel.dk/linux-block Pull block driver core update from Jens Axboe: "This is the pull request for the core block IO changes for 3.19. Not a huge round this time, mostly lots of little good fixes: - Fix a bug in sysfs blktrace interface causing a NULL pointer dereference when enabled/disabled through that API. From Arianna Avanzini. - Various updates/fixes/improvements for blk-mq: - A set of updates from Bart, mostly fixing bugs in the tag handling. - Cleanup/code consolidation from Christoph. - Extend queue_rq API to be able to handle batching issues of IO requests. NVMe will utilize this shortly. From me. - A few tag and request handling updates from me. - Cleanup of the preempt handling for running queues from Paolo. - Prevent running of unmapped hardware queues from Ming Lei. - Move the kdump memory limiting check to be in the correct location, from Shaohua. - Initialize all software queues at init time from Takashi. This prevents a kobject warning when CPUs are brought online that weren't online when a queue was registered. - Single writeback fix for I_DIRTY clearing from Tejun. Queued with the core IO changes, since it's just a single fix. - Version X of the __bio_add_page() segment addition retry from Maurizio. Hope the Xth time is the charm. - Documentation fixup for IO scheduler merging from Jan. - Introduce (and use) generic IO stat accounting helpers for non-rq drivers, from Gu Zheng. 
- Kill off artificial limiting of max sectors in a request from Christoph" * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits) bio: modify __bio_add_page() to accept pages that don't start a new segment blk-mq: Fix uninitialized kobject at CPU hotplugging blktrace: don't let the sysfs interface remove trace from running list blk-mq: Use all available hardware queues blk-mq: Micro-optimize bt_get() blk-mq: Fix a race between bt_clear_tag() and bt_get() blk-mq: Avoid that __bt_get_word() wraps multiple times blk-mq: Fix a use-after-free blk-mq: prevent unmapped hw queue from being scheduled blk-mq: re-check for available tags after running the hardware queue blk-mq: fix hang in bt_get() blk-mq: move the kdump check to blk_mq_alloc_tag_set blk-mq: cleanup tag free handling blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq <-> cpu map blk: introduce generic io stat accounting help function blk-mq: handle the single queue case in blk_mq_hctx_next_cpu genhd: check for int overflow in disk_expand_part_tbl() blk-mq: add blk_mq_free_hctx_request() blk-mq: export blk_mq_free_request() blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable ...	2014-12-13 14:14:23 -08:00
Jan Kara	37d469e767	fsnotify: remove destroy_list from fsnotify_mark destroy_list is used to track marks that still need to wait for the end of an SRCU period before they can be freed. However, by the time a mark is added to destroy_list, it is no longer in the group's list of marks, so we can reuse fsnotify_mark->g_list for queueing it on destroy_list. This saves two pointers for each fsnotify_mark. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:53 -08:00
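The space saving works because one intrusive link can serve two lists as long as a node is never on both at once. A minimal userspace sketch, with a hand-rolled list in the style of the kernel's list_head; `struct mark` and `queue_for_destroy()` are hypothetical names, not the fsnotify code.

```c
/* Minimal circular doubly linked list, in the style of list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

static int list_len(const struct list_head *h)
{
    int n = 0;
    for (const struct list_head *p = h->next; p != h; p = p->next)
        n++;
    return n;
}

/* Hypothetical mark: a single link field, no separate destroy link. */
struct mark {
    int id;
    struct list_head g_list;
};

/* A mark only reaches the destroy list after leaving its group's list,
 * so the same g_list node can be reused for the second list. */
static void queue_for_destroy(struct mark *m, struct list_head *destroy_list)
{
    list_del_init(&m->g_list);
    list_add_tail(&m->g_list, destroy_list);
}
```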
Jan Kara	0809ab69a2	fsnotify: unify inode and mount marks handling There's a lot of common code in inode and mount marks handling. Factor it out to a common helper function. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:53 -08:00
Heinrich Schuchardt	820c12d5d6	fallocate: create FAN_MODIFY and IN_MODIFY events The fanotify and inotify APIs can be used to monitor changes of the file system. System call fallocate() modifies files. Hence it should trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY) events. The most interesting case is FALLOC_FL_COLLAPSE_RANGE because this value allows arbitrary file content to be created from random data. This patch adds the missing call to fsnotify_modify(). The FAN_MODIFY and IN_MODIFY event will be created when fallocate() succeeds. It will even be created if the file length remains unchanged, e.g. when calling fallocate with flag FALLOC_FL_KEEP_SIZE. This logic was primarily chosen to keep the code simple. It resembles the logic of the write() system call. When we call write() we always create a FAN_MODIFY event, even in the case of overwriting with identical data. Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was actually changed. Furthermore, even if the file size remains unchanged, fallocate() may influence whether a subsequent write() will succeed, and hence the fallocate() call may be considered a modification. The fallocate(2) man page states: After a successful call, subsequent writes into the range specified by offset and len are guaranteed not to fail because of lack of disk space. So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in different outcomes of a subsequent write depending on the values of offset and len. 
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jan Kara <jack@suse.cz> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@parisplace.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:53 -08:00
Fabian Frederick	92cab82b2c	fs/affs/file.c: remove obsolete pagesize check The Linux kernel doesn't support page sizes below 4 KiB. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:52 -08:00
Fabian Frederick	9abb408307	fs/affs/file.c: add support to O_DIRECT Based on ext2_direct_IO. Tested with O_DIRECT file open and sysbench/mariadb, with a 1% improvement in written queries (update_non_index test) on a volume created with mkaffs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
Fabian Frederick	1ee54b099a	fs/affs/amigaffs.c: use va_format instead of buffer/vnsprintf - Remove ErrorBuffer and use %pV - Add __printf to enable argument mismatch warnings Original patch by Joe Perches. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
Fabian Frederick	7633978b43	fs/affs/file.c: forward declaration clean-up -Move file_operations to avoid forward declarations. -Remove unused declarations. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
David Drysdale	51f39a1f0c	syscalls: implement execveat() system call This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. 
This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
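The empty-path form described above can be sketched from userspace through the raw system call. This assumes a kernel of 3.19 or later and headers that define SYS_execveat; `run_via_execveat()` is a hypothetical helper, and /bin/sh is assumed to be an ELF binary. No /proc access is involved. Opening with O_CLOEXEC is fine for a binary, but per the commit message a script opened this way would fail, because its interpreter is handed a /dev/fd/<fd> path that no longer exists after exec.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execute the file at `path` by descriptor, fexecve()-style, in a child
 * process. Returns the child's exit status, or -1 on failure. */
static int run_via_execveat(const char *path, char *const argv[])
{
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        char *const envp[] = { NULL };
        /* Empty filename + AT_EMPTY_PATH: run the file fd refers to. */
        syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
        _exit(127); /* reached only if execveat failed */
    }

    close(fd);
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, passing argv = { "sh", "-c", "exit 7", NULL } with path "/bin/sh" should return 7.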
Namjae Jeon	c0ef0cc9d2	fat: fix data past EOF resulting from fsx testsuite When running fsx in direct I/O mode, it reported data past EOF issues. fsx ./file2 -Z -r 4096 -w 4096 ... .. truncating to largest ever: 0x907c fallocating to largest ever: 0x11137 truncating to largest ever: 0x2c6fe truncating to largest ever: 0x2cfdf fallocating to largest ever: 0x40000 Mapped Read: non-zero data past EOF (0x18628) page offset 0x629 is 0x2a4e ... .. The reason is that on a truncate down, the partial last block is not zeroed when the new size is not block-aligned. truncate_setsize()->truncate_inode_pages()->truncate_inode_pages_range() does consider the partial zeroing, but it retrieves the page using find_lock_page(), which only looks up the page in the page cache. So zeroing does not happen in the direct I/O case. Add a truncate-page helper for FAT built around block_truncate_page and invoke it to zero out the tail when the offset is not aligned with the block size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
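The arithmetic behind the fix can be shown on an in-memory block. When the new size falls inside a block, the bytes from that offset to the end of the block must be zeroed, or they reappear as "data past EOF" once the file is extended again. block_truncate_page() does this in the kernel for the block containing the new size; the sketch below uses a hypothetical `zero_tail_of_last_block()` and assumes a power-of-two block size (512 bytes in the usage).

```c
#include <stddef.h>
#include <string.h>

/* Zero the part of the final block that lies beyond new_size. `block`
 * holds that block's contents; blocksize must be a power of two.
 * Returns the number of bytes cleared. */
static size_t zero_tail_of_last_block(unsigned char *block, size_t blocksize,
                                      unsigned long long new_size)
{
    size_t offset = (size_t)(new_size & (blocksize - 1));

    if (offset == 0)
        return 0; /* block-aligned: no partial block to clear */
    memset(block + offset, 0, blocksize - offset);
    return blocksize - offset;
}
```

For the size reported by fsx above (0x18628) and 512-byte blocks, the offset within the block is 0x28, so the remaining 472 bytes are cleared.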
Jan Kara	f441ada004	befs: remove dead code Coverity id: 1042674 Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
David Rientjes	5cec38ac86	fs, seq_file: fallback to vmalloc instead of oom kill processes Since commit `058504edd0` ("fs/seq_file: fallback to vmalloc allocation"), seq_buf_alloc() falls back to vmalloc() when the kmalloc() for contiguous memory fails. This was done to address order-4 slab allocations for reading /proc/stat on large machines, and was noticed because PAGE_ALLOC_COSTLY_ORDER < 4, so there is no infinite loop in the page allocator when allocating a new slab for such high-order allocations. Contiguous memory isn't necessary for callers of seq_buf_alloc(), however. Other GFP_KERNEL high-order allocations that are <= PAGE_ALLOC_COSTLY_ORDER will simply loop forever in the page allocator and oom kill processes as a result. We don't want to kill processes just so that we can allocate contiguous memory in situations where contiguous memory isn't necessary. This patch does the kmalloc() allocation with __GFP_NORETRY for high-order allocations. This still utilizes memory compaction and direct reclaim in the allocation path; the only difference is that it will fail immediately instead of oom-killing processes when out of memory. [akpm@linux-foundation.org: add comment] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:49 -08:00
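The allocation policy can be modeled in userspace with pluggable allocators. Here `try_contiguous` stands in for a kmalloc() that fails fast (the __GFP_NORETRY behavior) and `fallback` for vmalloc(); `buf_alloc()`, `fail_alloc()` and the 4 KiB page size are assumptions of this sketch, not the kernel's code. The sketch only falls back for multi-page sizes, since a single page gains nothing from a non-contiguous allocator.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_GUESS 4096 /* assumption: 4 KiB pages */

typedef void *(*alloc_fn)(size_t);

/* Try the fast-failing contiguous allocator first; if it gives up on a
 * multi-page request, use the allocator that does not need contiguous
 * memory instead of retrying (and, in the kernel, OOM-killing). */
static void *buf_alloc(size_t size, alloc_fn try_contiguous, alloc_fn fallback)
{
    void *buf = try_contiguous(size);

    if (!buf && size > PAGE_SIZE_GUESS)
        buf = fallback(size);
    return buf;
}

/* Demo stand-in for a high-order allocation that fails under pressure. */
static void *fail_alloc(size_t size)
{
    (void)size;
    return NULL;
}
```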
Johannes Weiner	6b4f7799c6	mm: vmscan: invoke slab shrinkers from shrink_zone() The slab shrinkers are currently invoked from the zonelist walkers in kswapd, direct reclaim, and zone reclaim, all of which roughly gauge the eligible LRU pages and assemble a nodemask to pass to NUMA-aware shrinkers, which then again have to walk over the nodemask. This is redundant code, extra runtime work, and fairly inaccurate when it comes to the estimation of actually scannable LRU pages. The code duplication will only get worse when making the shrinkers cgroup-aware and requiring them to have out-of-band cgroup hierarchy walks as well. Instead, invoke the shrinkers from shrink_zone(), which is where all reclaimers end up, to avoid this duplication. Take the count for eligible LRU pages out of get_scan_count(), which considers many more factors than just the availability of swap space, like zone_reclaimable_pages() currently does. Accumulate the number over all visited lruvecs to get the per-zone value. Some nodes have multiple zones due to memory addressing restrictions. To avoid putting too much pressure on the shrinkers, only invoke them once for each such node, using the class zone of the allocation as the pivot zone. For now, this integrates the slab shrinking better into the reclaim logic and gets rid of duplicative invocations from kswapd, direct reclaim, and zone reclaim. It also prepares for cgroup-awareness, allowing memcg-capable shrinkers to be added at the lruvec level without much duplication of both code and runtime work. This changes kswapd behavior, which used to invoke the shrinkers for each zone, but with scan ratios gathered from the entire node, resulting in meaningless pressure quantities on multi-zone nodes. Zone reclaim behavior also changes. It used to shrink slabs until the same amount of pages were shrunk as were reclaimed from the LRUs. 
Now it merely invokes the shrinkers once with the zone's scan ratio, which makes the shrinkers go easier on caches that implement aging and would prefer feeding back pressure from recently used slab objects to unused LRU pages. [vdavydov@parallels.com: assure class zone is populated] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:48 -08:00
Davidlohr Bueso	c8c06efa8b	mm: convert i_mmap_mutex to rwsem The i_mmap_mutex is a close cousin of the anon vma lock, both protecting similar data, one for file backed pages and the other for anon memory. To this end, this lock can also be a rwsem. In addition, there are some important opportunities to share the lock when there are no tree modifications. This conversion is straightforward. For now, all users take the write lock. [sfr@canb.auug.org.au: update fremap.c] Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:45 -08:00
Davidlohr Bueso	83cde9e8ba	mm: use new helper functions around the i_mmap_mutex Convert all open coded mutex_lock/unlock calls to the i_mmap_[lock/unlock]_write() helpers. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:45 -08:00
Thomas Gleixner	c291ee6221	genirq: Prevent proc race against freeing of irq descriptors Since the rework of the sparse interrupt code to actually free the unused interrupt descriptors, there has been a race between the /proc interfaces to the irq subsystem and the code which frees the interrupt descriptor. CPU0 CPU1 show_interrupts() desc = irq_to_desc(X); free_desc(desc) remove_from_radix_tree(); kfree(desc); raw_spinlock_irq(&desc->lock); /proc/interrupts is the only interface which can actively corrupt kernel memory via the lock access. /proc/stat can only read from freed memory. Extremely hard to trigger, but possible. The interfaces in /proc/irq/N/ are not affected by this because the removal of the proc file is serialized in procfs against concurrent readers/writers. The removal happens before the descriptor is freed. For architectures which have CONFIG_SPARSE_IRQ=n this is a non-issue as the descriptor is never freed. It's merely cleared out with the irq descriptor lock held. So any concurrent proc access will either see the old correct value or the cleared-out ones. Protect the lookup and access to the irq descriptor in show_interrupts() with the sparse_irq_lock. Provide kstat_irqs_usr(), which protects the lookup and access with sparse_irq_lock, and switch /proc/stat to use it. Document the existing kstat_irqs interfaces so it's clear that the caller needs to take care of protection. The users of these interfaces are either not affected due to SPARSE_IRQ=n or already protected against removal. Fixes: `1f5a5b87f7` "genirq: Implement a sane sparse_irq allocator" Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org	2014-12-13 13:33:07 +01:00
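The fix follows a general rule: the lookup and the dereference must sit inside the same critical section that the free path takes, otherwise the free can slip in between them (the CPU0/CPU1 interleaving above). A userspace sketch, with a pthread mutex standing in for sparse_irq_lock and a fixed array standing in for the radix tree; all `*_demo` names and `irq_count_usr()` are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define NR_IRQS_DEMO 16 /* hypothetical table size */

struct irq_desc_demo {
    unsigned int count;
};

static struct irq_desc_demo *irq_table[NR_IRQS_DEMO];
static pthread_mutex_t sparse_lock = PTHREAD_MUTEX_INITIALIZER;

static int alloc_desc_demo(unsigned int irq, unsigned int count)
{
    struct irq_desc_demo *d = malloc(sizeof(*d));
    if (!d)
        return -1;
    d->count = count;
    pthread_mutex_lock(&sparse_lock);
    irq_table[irq] = d;
    pthread_mutex_unlock(&sparse_lock);
    return 0;
}

/* The free path takes the same lock, so it cannot run between a
 * reader's lookup and its access. */
static void free_desc_demo(unsigned int irq)
{
    pthread_mutex_lock(&sparse_lock);
    free(irq_table[irq]);
    irq_table[irq] = NULL;
    pthread_mutex_unlock(&sparse_lock);
}

/* In the spirit of kstat_irqs_usr(): lookup and read under the lock; a
 * freed descriptor reads as zero instead of touching freed memory. */
static unsigned int irq_count_usr(unsigned int irq)
{
    unsigned int sum = 0;

    pthread_mutex_lock(&sparse_lock);
    struct irq_desc_demo *d = irq_table[irq];
    if (d)
        sum = d->count;
    pthread_mutex_unlock(&sparse_lock);
    return sum;
}
```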
Jiri Slaby	fa0c554073	reiserfs: destroy allocated commit workqueue When reiserfs is trying to mount a partition, it creates a commit workqueue (sbi->commit_wq). But when mount fails later, the workqueue is not freed. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: auxsvr@gmail.com Reported-by: Benoît Monin <benoit.monin@gmx.fr> Cc: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org # >= 3.16 Cc: reiserfs-devel@vger.kernel.org Fixes: `797d9016ce` Signed-off-by: Jan Kara <jack@suse.cz>	2014-12-12 22:18:07 +01:00
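The bug class here is an error path that forgets a resource created by an earlier setup step. A common C idiom is to have each failing step jump to a label that releases everything created before it. This is a hypothetical sketch: `res_alloc()`/`res_free()` stand in for creating and destroying the workqueue (destroy_workqueue() in the kernel), and `live_objects` counts what has not been released.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the superblock info and its resources. */
struct sb_info_demo {
    int *commit_wq;
    int *journal;
};

static int live_objects; /* allocations not yet released */

static int *res_alloc(int fail)
{
    int *p = fail ? NULL : malloc(sizeof(*p));
    if (p)
        live_objects++;
    return p;
}

static void res_free(int *p)
{
    if (p) {
        live_objects--;
        free(p);
    }
}

/* Mount-style setup: a later failure unwinds the earlier steps,
 * including the workqueue that was previously leaked. */
static int fill_super_demo(struct sb_info_demo *sbi, int fail_journal)
{
    sbi->commit_wq = res_alloc(0);
    if (!sbi->commit_wq)
        return -1;

    sbi->journal = res_alloc(fail_journal);
    if (!sbi->journal)
        goto err_wq;
    return 0;

err_wq:
    res_free(sbi->commit_wq); /* the release that was missing */
    sbi->commit_wq = NULL;
    return -1;
}
```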
Linus Torvalds	6ce4436c9c	Merge tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull pstore update #2 from Tony Luck: "Couple of pstore-ram enhancements to allow use of different memory attributes" * tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: pstore-ram: Allow optional mapping with pgprot_noncached pstore-ram: Fix hangs by using write-combine mappings	2014-12-12 11:34:13 -08:00
Linus Torvalds	bdeb03cada	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "From a feature point of view, most of the code here comes from Miao Xie and others at Fujitsu to implement scrubbing and replacing devices on raid56. This has been in development for a while, and it's a big improvement. Filipe and Josef have a great assortment of fixes, many of which solve corruption problems either after a crash or in error conditions. I still have a round two from Filipe for next week that solves corruptions with discard and block group removal" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (62 commits) Btrfs: make get_caching_control unconditionally return the ctl Btrfs: fix unprotected deletion from pending_chunks list Btrfs: fix fs mapping extent map leak Btrfs: fix memory leak after block remove + trimming Btrfs: make btrfs_abort_transaction consider existence of new block groups Btrfs: fix race between writing free space cache and trimming Btrfs: fix race between fs trimming and block group remove/allocation Btrfs, replace: enable dev-replace for raid56 Btrfs: fix freeing used extents after removing empty block group Btrfs: fix crash caused by block group removal Btrfs: fix invalid block group rbtree access after bg is removed Btrfs, raid56: fix use-after-free problem in the final device replace procedure on raid56 Btrfs, replace: write raid56 parity into the replace target device Btrfs, replace: write dirty pages into the replace target device Btrfs, raid56: support parity scrub on raid56 Btrfs, raid56: use a variant to record the operation type Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted Btrfs, raid56: don't change bbio and raid_map Btrfs: remove unnecessary code of stripe_index assignment in __btrfs_map_block Btrfs: remove noused bbio_ret in __btrfs_map_block in condition ...	2014-12-12 11:15:23 -08:00
Linus Torvalds	a7cb7bb664	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree update from Jiri Kosina: "Usual stuff: documentation updates, printk() fixes, etc" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits) intel_ips: fix a type in error message cpufreq: cpufreq-dt: Move newline to end of error message ps3rom: fix error return code treewide: fix typo in printk and Kconfig ARM: dts: bcm63138: change "interupts" to "interrupts" Replace mentions of "list_struct" to "list_head" kernel: trace: fix printk message scsi: mpt2sas: fix ioctl in comment zbud, zswap: change module author email clocksource: Fix 'clcoksource' typo in comment arm: fix wording of "Crotex" in CONFIG_ARCH_EXYNOS3 help gpio: msm-v1: make boolean argument more obvious usb: Fix typo in usb-serial-simple.c PCI: Fix comment typo 'COMFIG_PM_OPS' powerpc: Fix comment typo 'CONIFG_8xx' powerpc: Fix comment typos 'CONFiG_ALTIVEC' clk: st: Spelling s/stucture/structure/ isci: Spelling s/stucture/structure/ usb: gadget: zero: Spelling s/infrastucture/infrastructure/ treewide: Fix company name in module descriptions ...	2014-12-12 10:08:06 -08:00
Linus Torvalds	ccb5a4910d	Merge tag 'upstream-3.19-rc1' of git://git.infradead.org/linux-ubifs Pull UBI/UBIFS updates from Artem Bityutskiy: "This includes the following UBI/UBIFS changes: - UBI debug messages now include the UBI device number. This change is responsible for the big diffstat since it touched every debugging print statement. - An Xattr bug-fix which fixes SELinux support - Several error path fixes in UBI/UBIFS" * tag 'upstream-3.19-rc1' of git://git.infradead.org/linux-ubifs: UBI: Fix invalid vfree() UBI: Fix double free after do_sync_erase() UBIFS: fix a couple bugs in UBIFS xattr length calculation UBI: vtbl: Use ubi_eba_atomic_leb_change() UBI: Extend UBI layer debug/messaging capabilities UBIFS: fix budget leak in error path	2014-12-12 09:57:22 -08:00
Linus Torvalds	c05e14f7b3	Merge tag 'xfs-for-linus-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs update from Dave Chinner: "There's relatively little change in this update; it is mainly bug fixes, cleanups and more of the on-going libxfs restructuring and on-disk format header consolidation work. 
Details: - more on-disk format header consolidation - move some structures shared with userspace to libxfs - new per-mount workqueue to fix for deadlocks between nested loop mounted filesystems - various bug fixes for ENOSPC, stats, quota off and preallocation - a bunch of compiler warning fixes for set-but-unused variables - various code cleanups" * tag 'xfs-for-linus-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (24 commits) xfs: split metadata and log buffer completion to separate workqueues xfs: fix set-but-unused warnings xfs: move type conversion functions to xfs_dir.h xfs: move ftype conversion functions to libxfs xfs: lobotomise xfs_trans_read_buf_map() xfs: active inodes stat is broken xfs: cleanup xfs_bmse_merge returns xfs: cleanup xfs_bmse_shift_one goto mess xfs: fix premature enospc on inode allocation xfs: overflow in xfs_iomap_eof_align_last_fsb xfs: fix simple_return.cocci warning in xfs_bmse_shift_one xfs: fix simple_return.cocci warning in xfs_file_readdir libxfs: fix simple_return.cocci warnings xfs: remove unnecessary null checks xfs: merge xfs_inum.h into xfs_format.h xfs: move most of xfs_sb.h to xfs_format.h xfs: merge xfs_ag.h into xfs_format.h xfs: move acl structures to xfs_format.h xfs: merge xfs_dinode.h into xfs_format.h xfs: catch invalid negative blknos in _xfs_buf_find() ...	2014-12-12 09:48:17 -08:00
Linus Torvalds	9bfccec24e	Lots of bugs fixes, including Zheng and Jan's extent status shrinker fixes, which should improve CPU utilization and potential soft lockups under heavy memory pressure, and Eric Whitney's bigalloc fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJUiRUwAAoJENNvdpvBGATwltQP/3sjHtFw+RUvKgQ8vX9M2THk 4b9j0ja0mrD3ObTXUxdDuOh1q09MsfSUiOYK6KZOav3nO/dRODqZnWgXz/zJt3LC R97s4velgzZi3F2ijnLiCo5RVZahN9xs8bUHZ85orMIr5wogwGdaUpnoqZSg0Ehr PIFnTNORyNXBwEm3XPjUmENTdyq9FZ8DsS6ACFzgFi79QTSyJFEM4LAl2XaqwMGV fVhNwnOGIyT8lHZAtDcobkaC86NjakmpW2Ip3p9/UEQtynh16UeVXKEO3K7CcQ+L YJRDNnSIlGpR1OJp+v6QJPUd8q4fc/8JW9AxxsLak0eqkszuB+MxoQXOCFV5AWaf jrs4TV3y0hCuB4OwuYUpnfcU1o+O7p39MqXMv8SA1ZBPbijN/LQSMErFtXj2oih6 3gJHUWLwELGeR+d9JlI29zxhOeOIotX255UBgj2oasQ0X3BW3qAgQ4LmP3QY90Pm BUmxiMoIWB9N3kU4XQGf+Kyy8JeMLJj0frHDxI3XLz+B+IlWCCkBH6y3AD/a13kS HHMMLOwHGEs0lYEKsm89dkcij5GuKd8eKT8Q0+CvKD9Z6HPdYvQxoazmF87Q6j/7 ZmshaVxtWaLpNbDaXVg+IgZifJAN0+mVzVHRhY9TSjx8k9qLdSgSEqYWjkSjx9Ij nNB2zVrHZDMvZ7MCZy85 =ZrTc -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of bugs fixes, including Zheng and Jan's extent status shrinker fixes, which should improve CPU utilization and potential soft lockups under heavy memory pressure, and Eric Whitney's bigalloc fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) ext4: ext4_da_convert_inline_data_to_extent drop locked page after error ext4: fix suboptimal seek_{data,hole} extents traversial ext4: ext4_inline_data_fiemap should respect callers argument ext4: prevent fsreentrance deadlock for inline_data ext4: forbid journal_async_commit in data=ordered mode jbd2: remove unnecessary NULL check before iput() ext4: Remove an unnecessary check for NULL before iput() ext4: remove unneeded code in ext4_unlink ext4: don't count external journal blocks as overhead ext4: remove never taken 
branch from ext4_ext_shift_path_extents() ext4: create nojournal_checksum mount option ext4: update comments regarding ext4_delete_inode() ext4: cleanup GFP flags inside resize path ext4: introduce aging to extent status tree ext4: cleanup flag definitions for extent status tree ext4: limit number of scanned extents in status tree shrinker ext4: move handling of list of shrinkable inodes into extent status code ext4: change LRU to round-robin in extent status tree shrinker ext4: cache extent hole in extent status tree for ext4_da_map_blocks() ext4: fix block reservation for bigalloc filesystems ...	2014-12-12 09:28:03 -08:00
Miklos Szeredi	1c68271cf1	fuse: use file_inode() in fuse_file_fallocate() Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 10:04:51 +01:00
Miklos Szeredi	7078187a79	fuse: introduce fuse_simple_request() helper The following pattern is repeated many times: req = fuse_get_req_nopages(fc); /* Initialize req->(in|out).args */ fuse_request_send(fc, req); err = req->out.h.error; fuse_put_request(req); Create a new replacement helper: /* Initialize args */ err = fuse_simple_request(fc, &args); In addition to reducing the code size, this will ease moving from the complex arg-based to a simpler page-based I/O on the fuse device. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 09:49:05 +01:00
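The helper this commit describes can be illustrated with a small userspace mock. Everything in it is hypothetical. `mock_conn`, `mock_args`, `mock_req` and the `mock_*` functions only stand in for the FUSE connection, argument and request objects and are not the kernel's definitions. The point is the shape: the allocate/send/read-error/release sequence lives in one function, and callers only fill in an argument struct.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real FUSE connection, argument and
 * request structures are far richer than this. */
struct mock_conn {
	int next_error;         /* error the pretend daemon replies with */
};

struct mock_args {
	unsigned int opcode;    /* what to ask the daemon to do */
	const void *in;         /* request payload */
	size_t in_len;
};

struct mock_req {
	struct mock_args args;
	int error;              /* plays the role of req->out.h.error */
};

static struct mock_req *mock_get_req(struct mock_conn *fc)
{
	(void)fc;
	return calloc(1, sizeof(struct mock_req));
}

static void mock_request_send(struct mock_conn *fc, struct mock_req *req)
{
	/* Pretend the daemon processed req->args and replied. */
	req->error = fc->next_error;
}

static void mock_put_request(struct mock_req *req)
{
	free(req);
}

/* The consolidated helper: callers pass a filled-in mock_args and get
 * back the reply's error code; allocation and release stay here. */
static int mock_simple_request(struct mock_conn *fc,
			       const struct mock_args *args)
{
	struct mock_req *req = mock_get_req(fc);
	int err;

	if (!req)
		return -ENOMEM;
	req->args = *args;
	mock_request_send(fc, req);
	err = req->error;
	mock_put_request(req);
	return err;
}
```

A call site then shrinks to filling in `struct mock_args a = { .opcode = ... };` and calling `err = mock_simple_request(fc, &a);`.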
Miklos Szeredi	f704dcb538	fuse: reduce max out args The third out-arg is never actually used. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 09:49:05 +01:00
Miklos Szeredi	baebccbe99	fuse: hold inode instead of path after release path_put() in release could trigger a DESTROY request in fuseblk. The possible deadlock was worked around by doing the path_put() with schedule_work(). This complexity isn't needed if we just hold the inode instead of the path. Since we now flush all requests before destroying the super block we can be sure that all held inodes will be dropped. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 09:49:04 +01:00
Miklos Szeredi	580640ba5d	fuse: flush requests on umount Use fuse_abort_conn() instead of fuse_conn_kill() in fuse_put_super(). This flushes and aborts requests still on any queues. But since we've already reset fc->connected, those requests would not be useful anyway and would be flushed when the fuse device is closed. Next patches will rely on requests being flushed before the superblock is destroyed. Use fuse_abort_conn() in cuse_process_init_reply() too, since it makes no difference there, and we can get rid of fuse_conn_kill(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 09:49:04 +01:00
Miklos Szeredi	0c4dd4ba14	fuse: don't wake up reserved req in fuse_conn_kill() Waking up reserved_req_waitq from fuse_conn_kill() doesn't make sense since we aren't changing ff->reserved_req here, which is what this waitqueue signals. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-12 09:49:04 +01:00
Linus Torvalds	c0222ac086	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "This is an unusually large pull request for MIPS - in part because lots of patches missed the 3.18 deadline but primarily because some folks opened the flood gates. - Replace the MIPS-specific phys_t with the generic phys_addr_t. - Improvements for the backtrace code used by oprofile. - Better backtraces on SMP systems. - Cleanups for the Octeon platform code. - Cleanups and fixes for the Loongson platform code. - Cleanups and fixes to the firmware library. - Switch ATH79 platform to use the firmware library. - Grand overhaul of the SEAD3 and Malta interrupt code. - Move the GIC interrupt code to drivers/irqchip - Lots of GIC cleanups and updates to the GIC code to use modern IRQ infrastructures and features of the kernel. - OF documentation updates for the GIC bindings - Move GIC clocksource driver to drivers/clocksource - Merge GIC clocksource driver with clockevent driver. - Further updates to bring the GIC clocksource driver up to date. - R3000 TLB code cleanups - Improvements to the Loongson 3 platform code. - Convert pr_warning to pr_warn. - Merge a bunch of small lantiq and ralink fixes that have been staged/lingering inside the openwrt tree for a while. - Update archhelp for IP22/IP32 - Fix a number of issues for Loongson 1B. - New clocksource and clockevent driver for Loongson 1B. - Further work on clk handling for Loongson 1B. - Platform work for Broadcom BMIPS. - Error handling cleanups for TurboChannel. - Fixes and optimization to the microMIPS support. - Option to disable the FTLB. - Dump more relevant information on machine check exception - Change binfmt to allow arch to examine PT_PROC headers - Support for new style FPU register model in O32 - VDSO randomization. - BCM47xx cleanups - BCM47xx reimplement the way the kernel accesses NVRAM information. 
- Random cleanups - Add support for ATH25 platforms - Remove pointless locking code in some PCI platforms. - Some improvements to EVA support - Minor Alchemy cleanup" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (185 commits) MIPS: Add MFHC0 and MTHC0 instructions to uasm. MIPS: Cosmetic cleanups of page table headers. MIPS: Add CP0 macros for extended EntryLo registers MIPS: Remove now unused definition of phys_t. MIPS: Replace use of phys_t with phys_addr_t. MIPS: Replace MIPS-specific 64BIT_PHYS_ADDR with generic PHYS_ADDR_T_64BIT PCMCIA: Alchemy Don't select 64BIT_PHYS_ADDR in Kconfig. MIPS: lib: memset: Clean up some MIPS{EL,EB} ifdefery MIPS: iomap: Use __mem_{read,write}{b,w,l} for MMIO MIPS: <asm/types.h> fix indentation. MAINTAINERS: Add entry for BMIPS multiplatform kernel MIPS: Enable VDSO randomization MIPS: Remove a temporary hack for debugging cache flushes in SMTC configuration MIPS: Remove declaration of obsolete arch_init_clk_ops() MIPS: atomic.h: Reformat to fit in 79 columns MIPS: Apply `.insn' to fixup labels throughout MIPS: Fix microMIPS LL/SC immediate offsets MIPS: Kconfig: Only allow 32-bit microMIPS builds MIPS: signal.c: Fix an invalid cast in ISA mode bit handling MIPS: mm: Only build one microassembler that is suitable ...	2014-12-11 17:56:37 -08:00
Eric W. Biederman	9cc46516dd	userns: Add a knob to disable setgroups on a per user namespace basis - Expose the knob to user space through a proc file /proc/<pid>/setgroups. A value of "deny" means the setgroups system call is disabled in the current process's user namespace and cannot be enabled in the future in this user namespace. A value of "allow" means the setgroups system call is enabled. - Descendant user namespaces inherit the value of setgroups from their parents. - A proc file is used (instead of a sysctl) as sysctls currently do not allow checking the permissions at open time. - Writing to the proc file is restricted to before the gid_map for the user namespace is set. This ensures that disabling setgroups at a user namespace level will never remove the ability to call setgroups from a process that already has that ability. A process may opt in to the setgroups disable for itself by creating, entering and configuring a user namespace or by calling setns on an existing user namespace with setgroups disabled. Processes without privileges already cannot call setgroups so this is a noop. Processes with privilege become processes without privilege when entering a user namespace and as with any other path to dropping privilege they would not have the ability to call setgroups. So this remains within the bounds of what is possible without a knob to disable setgroups permanently in a user namespace. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2014-12-11 18:06:36 -06:00
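A minimal userspace sketch of the ordering this knob imposes, assuming a Linux system that permits unprivileged user namespaces: the process writes "deny" to `/proc/self/setgroups` before writing its `gid_map`, since the proc file can no longer be written once `gid_map` is set. `unshare(CLONE_NEWUSER)` and the `/proc/self/{setgroups,gid_map,uid_map}` files are real interfaces; the helper names `format_id_map`, `write_proc` and `enter_userns_as_root` are illustrative and do not come from the commit.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format one id-map line: "<id inside> <id outside> <count>\n".
 * Returns the length written, or -1 if buf is too small. */
static int format_id_map(char *buf, size_t len, unsigned int inside,
			 unsigned int outside, unsigned int count)
{
	int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Write a string to a /proc file in a single write(); 0 or -errno. */
static int write_proc(const char *path, const char *s)
{
	size_t len = strlen(s);
	ssize_t n;
	int err;
	int fd = open(path, O_WRONLY | O_CLOEXEC);

	if (fd < 0)
		return -errno;
	n = write(fd, s, len);
	err = (n == (ssize_t)len) ? 0 : (n < 0 ? -errno : -EIO);
	close(fd);
	return err;
}

/* Enter a new user namespace and map the caller to id 0 inside it.
 * "deny" goes to setgroups before gid_map is written, because the
 * setgroups file cannot be changed once gid_map is set (and
 * user_namespaces(7) documents that an unprivileged process may only
 * write gid_map after setgroups is "deny"). */
static int enter_userns_as_root(void)
{
	uid_t uid = getuid();   /* read before unshare(): the ids are */
	gid_t gid = getgid();   /* unmapped until the maps are written */
	char map[64];
	int err;

	if (unshare(CLONE_NEWUSER) != 0)
		return -errno;
	err = write_proc("/proc/self/setgroups", "deny");
	if (err)
		return err;
	if (format_id_map(map, sizeof(map), 0, gid, 1) < 0)
		return -EOVERFLOW;
	err = write_proc("/proc/self/gid_map", map);
	if (err)
		return err;
	if (format_id_map(map, sizeof(map), 0, uid, 1) < 0)
		return -EOVERFLOW;
	return write_proc("/proc/self/uid_map", map);
}
```

If `enter_userns_as_root()` returns 0, the process is uid/gid 0 inside its new namespace and `setgroups(2)` is permanently unavailable there.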
Linus Torvalds	70e71ca0af	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) New offloading infrastructure and example 'rocker' driver for offloading of switching and routing to hardware. This work was done by a large group of dedicated individuals, not limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend, Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu 2) Start making the networking operate on IOV iterators instead of modifying iov objects in-situ during transfers. Thanks to Al Viro and Herbert Xu. 3) A set of new netlink interfaces for the TIPC stack, from Richard Alpe. 4) Remove unnecessary looping during ipv6 routing lookups, from Martin KaFai Lau. 5) Add PAUSE frame generation support to gianfar driver, from Matei Pavaluca. 6) Allow for larger reordering levels in TCP, which are easily achievable in the real world right now, from Eric Dumazet. 7) Add a variable of napi_schedule that doesn't need to disable cpu interrupts, from Eric Dumazet. 8) Use a doubly linked list to optimize neigh_parms_release(), from Nicolas Dichtel. 9) Various enhancements to the kernel BPF verifier, and allow eBPF programs to actually be attached to sockets. From Alexei Starovoitov. 10) Support TSO/LSO in sunvnet driver, from David L Stevens. 11) Allow controlling ECN usage via routing metrics, from Florian Westphal. 12) Remote checksum offload, from Tom Herbert. 13) Add split-header receive, BQL, and xmit_more support to amd-xgbe driver, from Thomas Lendacky. 14) Add MPLS support to openvswitch, from Simon Horman. 15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen Klassert. 16) Do gro flushes on a per-device basis using a timer, from Eric Dumazet. This tries to resolve the conflicting goals between the desired handling of bulk vs. RPC-like traffic. 17) Allow userspace to ask for the CPU upon what a packet was received/steered, via SO_INCOMING_CPU. From Eric Dumazet. 
18) Limit GSO packets to half the current congestion window, from Eric Dumazet. 19) Add a generic helper so that all drivers set their RSS keys in a consistent way, from Eric Dumazet. 20) Add xmit_more support to enic driver, from Govindarajulu Varadarajan. 21) Add VLAN packet scheduler action, from Jiri Pirko. 22) Support configurable RSS hash functions via ethtool, from Eyal Perry. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits) Fix race condition between vxlan_sock_add and vxlan_sock_release net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header net/mlx4: Add support for A0 steering net/mlx4: Refactor QUERY_PORT net/mlx4_core: Add explicit error message when rule doesn't meet configuration net/mlx4: Add A0 hybrid steering net/mlx4: Add mlx4_bitmap zone allocator net/mlx4: Add a check if there are too many reserved QPs net/mlx4: Change QP allocation scheme net/mlx4_core: Use tasklet for user-space CQ completion events net/mlx4_core: Mask out host side virtualization features for guests net/mlx4_en: Set csum level for encapsulated packets be2net: Export tunnel offloads only when a VxLAN tunnel is created gianfar: Fix dma check map error when DMA_API_DEBUG is enabled cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call net: fec: only enable mdio interrupt before phy device link up net: fec: clear all interrupt events to support i.MX6SX net: fec: reset fep link status in suspend function net: sock: fix access via invalid file descriptor net: introduce helper macro for_each_cmsghdr ...	2014-12-11 14:27:06 -08:00
Tony Lindgren	027bc8b082	pstore-ram: Allow optional mapping with pgprot_noncached On some ARMs the memory can be mapped pgprot_noncached() and still be working for atomic operations. As pointed out by Colin Cross <ccross@android.com>, in some cases you do want to use pgprot_noncached() if the SoC supports it to see a debug printk just before a write hanging the system. On ARMs, the atomic operations on strongly ordered memory are implementation defined. So let's provide an optional kernel parameter for configuring pgprot_noncached(), and use pgprot_writecombine() by default. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rob Herring <robherring2@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Olof Johansson <olof@lixom.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: stable@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-12-11 13:38:31 -08:00
Rob Herring	7ae9cb8193	pstore-ram: Fix hangs by using write-combine mappings Currently trying to use pstore on at least ARMs can hang as we're mapping the persistent RAM with pgprot_noncached(). On ARMs, pgprot_noncached() will actually make the memory strongly ordered, and as the atomic operations pstore uses are implementation defined for strongly ordered memory, they may not work. So basically atomic operations have undefined behavior on ARM for device or strongly ordered memory types. Let's fix the issue by using write-combine variants for mappings. This corresponds to normal, non-cacheable memory on ARM. For many other architectures, this change does not change the mapping type as by default we have: #define pgprot_writecombine pgprot_noncached The reason why pgprot_noncached() was originally used for pstore is because Colin Cross <ccross@android.com> had observed lost debug prints right before a device hanging write operation on some systems. For the platforms supporting pgprot_noncached(), we can add an optional configuration option to support that. But let's get pstore working first before adding new features. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Anton Vorontsov <cbouatmailru@gmail.com> Cc: Colin Cross <ccross@android.com> Cc: Olof Johansson <olof@lixom.net> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rob Herring <rob.herring@calxeda.com> [tony@atomide.com: updated description] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2014-12-11 13:35:49 -08:00
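To make the two pstore-ram changes concrete, here is a userspace mock of the mapping choice. `pgprot_t`, `pgprot_noncached()` and `MOCK_PAGE_KERNEL` are simplified stand-ins, and `want_noncached` is a hypothetical name for the optional setting (the commit messages do not name it). The quoted `#define` is wrapped in an `#ifndef` guard, as a generic fallback would need: where an architecture provides no write-combine variant of its own, both choices produce the same protection bits.

```c
#include <stdbool.h>

/* Hypothetical, simplified pgprot_t: a bitmask of memory attributes. */
typedef struct {
	unsigned long flags;
} pgprot_t;

#define MOCK_PAGE_KERNEL   ((pgprot_t){ 0x1UL })
#define MOCK_ATTR_NOCACHE  0x10UL  /* uncached / strongly ordered */

static inline pgprot_t pgprot_noncached(pgprot_t prot)
{
	prot.flags |= MOCK_ATTR_NOCACHE;
	return prot;
}

/* Generic fallback: an architecture without its own write-combine
 * mapping gets the non-cached one, so nothing changes there. */
#ifndef pgprot_writecombine
#define pgprot_writecombine pgprot_noncached
#endif

/* Write-combine by default; non-cached only when explicitly asked
 * for via the (hypothetical) want_noncached option. */
static pgprot_t choose_pstore_prot(bool want_noncached)
{
	return want_noncached ? pgprot_noncached(MOCK_PAGE_KERNEL)
			      : pgprot_writecombine(MOCK_PAGE_KERNEL);
}
```

On an architecture that defines its own `pgprot_writecombine`, the `#ifndef` block is skipped and the two branches differ; in this mock they compare equal, matching the "does not change the mapping type" case.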
Al Viro	93fe74b2e2	coda_venus_readdir(): use file_inode() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-11 16:28:12 -05:00
Al Viro	d465887f9d	fs/namei.c: fold link_path_walk() call into path_init() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-11 16:27:57 -05:00
Al Viro	980f3ea2f6	path_init(): don't bother with LOOKUP_PARENT in argument Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-11 16:27:57 -05:00
Al Viro	893b7775a7	fs/namei.c: new helper (path_cleanup()) All callers of path_init() proceed to do the identical cleanup when they are done with nameidata. Don't open-code it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-11 16:27:57 -05:00
Al Viro	5e53084d77	path_init(): store the "base" pointer to file in nameidata itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-11 16:27:57 -05:00
Linus Torvalds	b6da0076ba	Merge branch 'akpm' (patchbomb from Andrew) Merge first patchbomb from Andrew Morton: - a few minor cifs fixes - dma-debug updates - ocfs2 - slab - about half of MM - procfs - kernel/exit.c - panic.c tweaks - printk updates - lib/ updates - checkpatch updates - fs/binfmt updates - the drivers/rtc tree - nilfs - kmod fixes - more kernel/exit.c - various other misc tweaks and fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits) exit: pidns: fix/update the comments in zap_pid_ns_processes() exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting exit: exit_notify: re-use "dead" list to autoreap current exit: reparent: call forget_original_parent() under tasklist_lock exit: reparent: avoid find_new_reaper() if no children exit: reparent: introduce find_alive_thread() exit: reparent: introduce find_child_reaper() exit: reparent: document the ->has_child_subreaper checks exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper() exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting exit: proc: don't try to flush /proc/tgid/task/tgid exit: release_task: fix the comment about group leader accounting exit: wait: drop tasklist_lock before psig->c* accounting exit: wait: don't use zombie->real_parent exit: wait: cleanup the ptrace_reparented() checks usermodehelper: kill the kmod_thread_locker logic usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper() fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races ...	2014-12-10 18:34:42 -08:00
Al Viro	bd9b51e79c	make default ->i_fop have ->open() fail with ENXIO As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:32:15 -05:00
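The mechanism can be pictured with a reduced mock. The `mock_*` types and functions are hypothetical stand-ins for `struct inode`, `struct file_operations` and the open path. Only the behavior follows the commit: a default `->open()` that returns `-ENXIO`, so reopening an inode that never received real file operations fails, while inodes that set their own `->i_fop` are unaffected.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the VFS structures. */
struct mock_inode;
struct mock_file;

struct mock_fops {
	int (*open)(struct mock_inode *, struct mock_file *);
};

struct mock_inode {
	const struct mock_fops *i_fop;
};

/* Default ->open(): refuse, so a reopen of an inode nobody set up
 * fails instead of yielding a descriptor that cannot do anything. */
static int mock_no_open(struct mock_inode *inode, struct mock_file *file)
{
	(void)inode;
	(void)file;
	return -ENXIO;
}

static const struct mock_fops mock_no_open_fops = { .open = mock_no_open };

/* Freshly initialized inodes start out with the refusing fops. */
static void mock_inode_init(struct mock_inode *inode)
{
	inode->i_fop = &mock_no_open_fops;
}

/* Roughly what an open path does: call ->open() if there is one;
 * a NULL ->open() means "nothing to do", i.e. success. */
static int mock_do_open(struct mock_inode *inode, struct mock_file *file)
{
	if (inode->i_fop && inode->i_fop->open)
		return inode->i_fop->open(inode, file);
	return 0;
}
```

The last branch shows why the old all-NULL default was a problem: an inode whose fops had no `->open()` at all would "open" successfully.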
Al Viro	1f55a6ec94	make nameidata completely opaque outside of fs/namei.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:32:13 -05:00
Al Viro	707c5960f1	Merge branch 'nsfs' into for-next	2014-12-10 21:31:59 -05:00
Al Viro	3d3d35b1e9	kill proc_ns completely procfs inodes need only the ns_ops part; nsfs inodes don't need it at all Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:30:57 -05:00
Al Viro	e149ed2b80	take the targets of /proc/*/ns/ symlinks to separate fs New pseudo-filesystem: nsfs. Targets of /proc/*/ns/ live there now. It's not mountable (not even registered, so it's not in /proc/filesystems, etc.). Files on it are bindable - we explicitly permit that in do_loopback(). This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well. get_proc_ns() is a macro now (it's simply returning ->i_private; would have been an inline, if not for header ordering headache). proc_ns_inode() is an ex-parrot. The interface used in procfs is ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops). Dentries and inodes are never hashed; a non-counting reference to dentry is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path() if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details of that mechanism. As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt; it does nd_jump_link() on a consistent <vfsmount,dentry> pair it gets from ns_get_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-12-10 21:30:20 -05:00
Oleg Nesterov	c35a7f18a0	exit: proc: don't try to flush /proc/tgid/task/tgid proc_flush_task_mnt() always tries to flush task/pid, but this is pointless if we reap the leader. d_invalidate() is recursive, and if nothing else the next d_hash_and_lookup(tgid) should fail anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:17 -08:00
Rasmus Villemoes	ddbc22e27e	fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp Relying on the sign (after casting to int) of the difference of two quantities for comparison is usually wrong. For example, should a-b turn out to be 2^31, the return value of cmp(a,b) is -2^31; but that would also be the return value from cmp(b, a). So a compares less than b and b compares less than a. One can also easily find three values a,b,c such that a compares less than b, b compares less than c, but a does not compare less than c. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:16 -08:00
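The failure mode is easy to reproduce in isolation. This sketch assumes 32-bit unsigned operands (the function names are illustrative, not the HFS code) and contrasts the subtraction-based comparator with an explicit three-way comparison. Converting an out-of-range `uint32_t` to `int` is implementation-defined; the comment assumes the usual modulo-2^32 wrap-around of GCC and Clang.

```c
#include <stdint.h>

/* Buggy pattern: use the sign of the difference, truncated to int.
 * When the unsigned difference is 2^31, it converts to INT_MIN in
 * both directions, so a < b and b < a both appear true. */
static int cmp_by_subtraction(uint32_t a, uint32_t b)
{
	return (int)(a - b);
}

/* Fixed pattern: compare explicitly and never subtract. */
static int cmp_three_way(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}
```

With `a = 0x80000000` and `b = 0`, `cmp_by_subtraction` reports "less than" for both argument orders, while `cmp_three_way` returns 1 and -1.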
Ryusuke Konishi	705304a863	nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races Same story as in commit `41080b5a24` ("nfsd race fixes: ext2") (similar ext2 fix) except that nilfs2 needs to use insert_inode_locked4() instead of insert_inode_locked() and a bug of a check for dead inodes needs to be fixed. If nilfs_iget() is called from nfsd after nilfs_new_inode() calls insert_inode_locked4(), nilfs_iget() will wait for unlock_new_inode() at the end of nilfs_mkdir()/nilfs_create()/etc to unlock the inode. If nilfs_iget() is called before nilfs_new_inode() calls insert_inode_locked4(), it will create an in-core inode and read its data from the on-disk inode. But, nilfs_iget() will find i_nlink equals zero and fail at nilfs_read_inode_common(), which will lead it to call iget_failed() and cleanly fail. However, this sanity check doesn't work as expected for reused on-disk inodes because they leave a non-zero value in i_mode field and it hinders the test of i_nlink. This patch also fixes the issue by removing the test on i_mode that nilfs2 doesn't need. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:16 -08:00
Markus Elfring	72b9918ea4	nilfs2: deletion of an unnecessary check before the function call "iput" The iput() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:16 -08:00
Andreas Rohner	75dc857c46	nilfs2: avoid duplicate segment construction for fsync() This patch removes filemap_write_and_wait_range() from nilfs_sync_file(), because it triggers a data segment construction by calling nilfs_writepages() with WB_SYNC_ALL. A data segment construction does not remove the inode from the i_dirty list and it does not clear the NILFS_I_DIRTY flag. Therefore nilfs_inode_dirty() still returns true, which leads to an unnecessary duplicate segment construction in nilfs_sync_file(). A call to filemap_write_and_wait_range() is not needed, because NILFS2 does not rely on the generic writeback mechanisms. Instead it implements its own mechanism to collect all dirty pages and write them into segments. It is more efficient to initiate the segment construction directly in nilfs_sync_file() without the detour over filemap_write_and_wait_range(). Additionally the lock of i_mutex is not needed, because all code blocks that are protected by i_mutex are also protected by a NILFS transaction:
Function                i_mutex  nilfs_transaction
--------------------------------------------------
nilfs_ioctl_setflags:   yes      yes
nilfs_fiemap:           yes      no
nilfs_write_begin:      yes      yes
nilfs_write_end:        yes      yes
nilfs_lookup:           yes      no
nilfs_create:           yes      yes
nilfs_link:             yes      yes
nilfs_mknod:            yes      yes
nilfs_symlink:          yes      yes
nilfs_mkdir:            yes      yes
nilfs_unlink:           yes      yes
nilfs_rmdir:            yes      yes
nilfs_rename:           yes      yes
nilfs_setattr:          yes      yes
For nilfs_lookup() i_mutex is held for the parent directory, to protect it from modification. The segment construction does not modify directory inodes, so no lock is needed. nilfs_fiemap() reads the block layout on the disk, by using nilfs_bmap_lookup_contig(). This is already protected by bmap->b_sem. Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:16 -08:00
Jan Kara	a682e9c28c	ncpfs: return proper error from NCP_IOC_SETROOT ioctl If some error happens in NCP_IOC_SETROOT ioctl, the appropriate error return value is then (in most cases) just overwritten before we return. This can result in reporting success to userspace although error happened. This bug was introduced by commit `2e54eb96e2` ("BKL: Remove BKL from ncpfs"). Propagate the errors correctly. Coverity id: 1226925. Fixes: `2e54eb96e2` ("BKL: Remove BKL from ncpfs") Signed-off-by: Jan Kara <jack@suse.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:13 -08:00
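A schematic of the bug class, using hypothetical step functions rather than the real ncpfs code: an error code is assigned and then overwritten before `return`, versus returning at the first failure.

```c
#include <errno.h>

/* Hypothetical steps of an ioctl handler; each returns 0 or -errno. */
static int step_lookup(int fail)
{
	return fail ? -ENOENT : 0;
}

static int step_attach(void)
{
	return 0;
}

/* Buggy shape: the first step's error is overwritten by the second
 * assignment, so the caller can be told the operation succeeded. */
static int setroot_buggy(int fail)
{
	int result = step_lookup(fail);

	result = step_attach();
	return result;
}

/* Fixed shape: stop at the first failure and return its code. */
static int setroot_fixed(int fail)
{
	int result = step_lookup(fail);

	if (result)
		return result;
	return step_attach();
}
```

Static analyzers such as Coverity typically flag the buggy shape as an unused value, which matches how this bug was reported.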
Jungseung Lee	52f5592e54	fs/binfmt_elf.c: fix internal inconsistency relating to vma dump size vma_dump_size() is called several times by the core dumper and is supposed to return the same value for the same vma. But vma_dump_size() could return different values for the same vma. The known problem case is concurrent shared memory removal. If a vma is used for shared memory and that shared memory is removed between writing the program header and dumping the vma memory, this will result in a dump file which is internally inconsistent. To fix the problem, we take a baseline dump size, store it in vma_filesz, and always use the vma dump size stored in vma_filesz. Consistency with reality is not actually guaranteed, but that is tolerable since the dump is fully consistent with the baseline. Signed-off-by: Jungseung Lee <js07.lee@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:12 -08:00
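The fix follows a general pattern: sample a value that can change concurrently exactly once, then have every consumer read the snapshot. In this hypothetical sketch, `size_fn` stands in for `vma_dump_size()` and the snapshot array plays the role the commit gives `vma_filesz`. Both the header and the body are computed from the snapshot, so they agree even if the live value changes in between.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size source that may change between calls. */
typedef size_t (*size_fn)(size_t idx, void *ctx);

/* Sample every size once; later phases read only this snapshot.
 * Returns a malloc'd array the caller frees, or NULL. */
static size_t *snapshot_sizes(size_t n, size_fn fn, void *ctx)
{
	size_t *sizes = malloc(n ? n * sizeof(*sizes) : 1);

	if (!sizes)
		return NULL;
	for (size_t i = 0; i < n; i++)
		sizes[i] = fn(i, ctx);
	return sizes;
}

/* Header and body both sum the same snapshot, so they always agree. */
static size_t total_from_snapshot(const size_t *sizes, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	return total;
}
```

As the commit notes, the snapshot may no longer match reality by the time the body is written, but the dump stays consistent with itself.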
Andrew Morton	f7e1ad1a1e	fs/binfmt_misc.c: use GFP_KERNEL instead of GFP_USER GFP_USER means "honour cpuset nodes-allowed beancounting". These are regular old kernel objects and there seems no reason to give them this treatment. Acked-by: Mike Frysinger <vapier@gentoo.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:12 -08:00
Mike Frysinger	e6084d4a08	binfmt_misc: clean up code style a bit Clean up various coding style issues that checkpatch complains about. No functional changes here. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:12 -08:00
Mike Frysinger	6b899c4e9a	binfmt_misc: add comments & debug logs When trying to develop a custom format handler, the errors returned all effectively get bucketed as EINVAL with no kernel messages. The other errors (ENOMEM/EFAULT) are internal/obvious and basic. Thus any time a bad handler is rejected, the developer has to walk the dense code and try to guess where it went wrong. Needing to dive into kernel code is itself a fairly high barrier for a lot of people. To improve this situation, let's deploy extensive pr_debug markers at logical parse points, and add comments to the dense parsing logic. It lets you see exactly where the parsing aborts, the string the kernel received (useful when dealing with shell code), how it translated the buffers to binary data, and how it will apply the mask at runtime. Some example output:
$ echo ':qemu-foo:M::\x7fELF\xAD\xAD\x01\x00:\xff\xff\xff\xff\xff\x00\xff\x00:/usr/bin/qemu-foo:POC' > register
$ dmesg
binfmt_misc: register: received 92 bytes
binfmt_misc: register: delim: 0x3a {:}
binfmt_misc: register: name: {qemu-foo}
binfmt_misc: register: type: M (magic)
binfmt_misc: register: offset: 0x0
binfmt_misc: register: magic[raw]: 5c 78 37 66 45 4c 46 5c 78 41 44 5c 78 41 44 5c \x7fELF\xAD\xAD\
binfmt_misc: register: magic[raw]: 78 30 31 5c 78 30 30 00 x01\x00.
binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 66 66 5c 78 66 66 5c 78 66 66 \xff\xff\xff\xff
binfmt_misc: register: mask[raw]: 5c 78 66 66 5c 78 30 30 5c 78 66 66 5c 78 30 30 \xff\x00\xff\x00
binfmt_misc: register: mask[raw]: 00 .
binfmt_misc: register: magic/mask length: 8
binfmt_misc: register: magic[decoded]: 7f 45 4c 46 ad ad 01 00 .ELF....
binfmt_misc: register: mask[decoded]: ff ff ff ff ff 00 ff 00 ........
binfmt_misc: register: magic[masked]: 7f 45 4c 46 ad 00 01 00 .ELF....
binfmt_misc: register: interpreter: {/usr/bin/qemu-foo}
binfmt_misc: register: flag: P (preserve argv0)
binfmt_misc: register: flag: O (open binary)
binfmt_misc: register: flag: C (preserve creds)
The [raw] lines show us exactly what was received from userspace. The lines after that show us how the kernel has decoded things. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:12 -08:00
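The decode-then-mask steps shown in that debug output can be sketched as follows. The register string follows binfmt_misc's documented `:name:type:offset:magic:mask:interpreter:flags` syntax, but these functions are a simplified, hypothetical decoder that only understands `\xHH` escapes; they are not the kernel's parsing code. Applied to the magic and mask fields from the example, they produce the same `magic[decoded]` and `magic[masked]` bytes as the log.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode a magic/mask field: "\xHH" becomes one byte, anything else
 * is copied as-is. Returns the decoded length, or -1 if the result
 * would not fit in cap bytes. */
static long decode_hex_escapes(const char *in, unsigned char *out, size_t cap)
{
	size_t n = 0;

	while (*in) {
		unsigned char byte;

		if (in[0] == '\\' && in[1] == 'x' &&
		    hexval(in[2]) >= 0 && hexval(in[3]) >= 0) {
			byte = (unsigned char)(hexval(in[2]) * 16 + hexval(in[3]));
			in += 4;
		} else {
			byte = (unsigned char)*in++;
		}
		if (n == cap)
			return -1;
		out[n++] = byte;
	}
	return (long)n;
}

/* Apply the mask in place, as in the "magic[masked]" log line. */
static void apply_mask(unsigned char *magic, const unsigned char *mask,
		       size_t len)
{
	for (size_t i = 0; i < len; i++)
		magic[i] &= mask[i];
}
```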
Yann Droneaud	8d10a03582	fs/file.c: replace get_unused_fd() with get_unused_fd_flags(0) This patch replaces calls to get_unused_fd() with equivalent calls to get_unused_fd_flags(0) to preserve current behavior for existing code. In a further patch, get_unused_fd() will be removed so that new code starts using get_unused_fd_flags(), with the hope that O_CLOEXEC could be used, either by default or chosen by userspace. Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-10 17:41:10 -08:00
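The kernel-internal `get_unused_fd_flags()` has no direct userspace counterpart, but `fcntl()` offers a close analogy: `F_DUPFD` and `F_DUPFD_CLOEXEC` both allocate the lowest unused descriptor, and only the latter sets `FD_CLOEXEC` on it atomically. The helper names below are illustrative; this is an analogy, not the kernel API.

```c
#define _GNU_SOURCE
#include <fcntl.h>

/* Duplicate fd onto the lowest unused descriptor, optionally with
 * close-on-exec set atomically. Returns the new fd, or -1 on error. */
static int dup_lowest(int fd, int cloexec)
{
	return fcntl(fd, cloexec ? F_DUPFD_CLOEXEC : F_DUPFD, 0);
}

/* 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

Setting the flag at allocation time, rather than with a later `F_SETFD`, avoids the window in which another thread could `exec()` and leak the descriptor.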

38993 Commits