linux

Author	SHA1	Message	Date
Linus Torvalds	82cb6acea4	MTD merge for 3.13 * Unify some compile-time differences so that we have fewer uses of #ifdef CONFIG_OF in atmel_nand * Other general cleanups (removing unused functions, options, variables, fields; use correct interfaces) * Fix BUG() for new odd-sized NAND, which report non-power-of-2 dimensions via ONFI * Miscellaneous driver fixes (SPI NOR flash; BCM47xx NAND flash; etc.) * Improve differentiation between SLC and MLC NAND -- this clarifies an ABI issue regarding the MTD "type" (in sysfs and in ioctl(MEMGETINFO)), where the MTD_MLCNANDFLASH type was present but inconsistently used * Extend GPMI NAND to support multi-chip-select NAND for some platforms * Many improvements to the OMAP2/3 NAND driver, including an expanded DT binding to bring us closer to mainline support for some OMAP systems * Fix a deadlock in the error path of the Atmel NAND driver probe * Correct the error codes from MTD mmap() to conform to POSIX and the Linux Programmer's Manual. This is an acknowledged change in the MTD ABI, but I can't imagine somebody relying on the non-standard -ENOSYS error code specifically. Am I just being unimaginative? :) * Fix a few important GPMI NAND bugs (one regression from 3.12 and one long-standing race condition) * More? Read the log! -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSgzYRAAoJEFySrpd9RFgtv8EP/3ZIS1w4fHyWafVSdgVFGR0Y urlVDhg7iBauh9admN9xxBz6CYRwhjby8GnN87Q1qzu95Xp63RVx31nNfdBW3DGd 92vSyskijYJcUtanBxYqGp1i3EbQcpF4mumqxnre3C4KTLNije41t/wNVqnXAstU DWho2iymZdkweKJ0DqBA7WF4l/YscdFyNDanO9JWiwII05Rh3Acv7FPMFm3Clblw Nvfwzgp4XycYMeIQtkmQgQ3GgeWtxPgQwqMofn97MVH4zeTsmUP317ohIMukLGJD db33J2xBdrIbk9P4D3RvjOCYyAyonu9y6/p+B1Vmj+R4CAUvQOIljhklHFoT3UZW OzUHPxB6T0+NZyQ/5IRQIYH9As++vdb/bzsUXm/cXceI4o4I0QCPy/8adifakBOF IUX9/BCdUOfKXvdOXY5dXMR2sY1IBg/1WfI+qcAoITsS/EVrUTrOcfSLyGqF0ERU c7mAzXiyp4D51x66/QnfJ4aJjlioQSoa3mK1j4fXqH08YB5Zclpz938Bo1AO3lWy /n+NYSbeXJoi4rVkNawjrRVs+0OTby2XQ5OqBlUMH6f30fqjUefPm66ZBMhbxzYu 5QFDctUbnHCyAPpOtM/WR3/NOkIqVhQl1331A+dG2TzLK0vTHs+kbt/YmIITpjI+ yn70XJGhk1F4gy8zhD+V =z5qO -----END PGP SIGNATURE----- Merge tag 'for-linus-20131112' of git://git.infradead.org/linux-mtd Pull MTD changes from Brian Norris: - Unify some compile-time differences so that we have fewer uses of #ifdef CONFIG_OF in atmel_nand - Other general cleanups (removing unused functions, options, variables, fields; use correct interfaces) - Fix BUG() for new odd-sized NAND, which report non-power-of-2 dimensions via ONFI - Miscellaneous driver fixes (SPI NOR flash; BCM47xx NAND flash; etc.) - Improve differentiation between SLC and MLC NAND -- this clarifies an ABI issue regarding the MTD "type" (in sysfs and in the MEMGETINFO ioctl), where the MTD_MLCNANDFLASH type was present but inconsistently used - Extend GPMI NAND to support multi-chip-select NAND for some platforms - Many improvements to the OMAP2/3 NAND driver, including an expanded DT binding to bring us closer to mainline support for some OMAP systems - Fix a deadlock in the error path of the Atmel NAND driver probe - Correct the error codes from MTD mmap() to conform to POSIX and the Linux Programmer's Manual. This is an acknowledged change in the MTD ABI, but I can't imagine somebody relying on the non-standard -ENOSYS error code specifically. Am I just being unimaginative? :) - Fix a few important GPMI NAND bugs (one regression from 3.12 and one long-standing race condition) - More? Read the log! * tag 'for-linus-20131112' of git://git.infradead.org/linux-mtd: (98 commits) mtd: gpmi: fix the NULL pointer mtd: gpmi: fix kernel BUG due to racing DMA operations mtd: mtdchar: return expected errors on mmap() call mtd: gpmi: only scan two chips for imx6 mtd: gpmi: Use devm_kzalloc() mtd: atmel_nand: fix bug driver will in a dead lock if no nand detected mtd: nand: use a local variable to simplify the nand_scan_tail mtd: nand: remove deprecated IRQF_DISABLED mtd: dataflash: Say if we find a device we don't support mtd: nand: omap: fix error return code in omap_nand_probe() mtd: nand_bbt: kill NAND_BBT_SCANALLPAGES mtd: m25p80: fixup device removal failure path mtd: mxc_nand: Include linux/of.h header mtd: remove duplicated include from mtdcore.c mtd: m25p80: add support for Macronix mx25l3255e mtd: nand: omap: remove selection of BCH ecc-scheme via KConfig mtd: nand: omap: updated devm_xx for all resource allocation and free calls mtd: nand: omap: use drivers/mtd/nand/nand_bch.c wrapper for BCH ECC instead of lib/bch.c mtd: nand: omap: clean-up ecc layout for BCH ecc schemes mtd: nand: omap2: clean-up BCHx_HW and BCHx_SW ECC configurations in device_probe ...	2013-11-14 12:31:43 +09:00
Linus Torvalds	0910c0bdf7	Merge branch 'for-3.13/core' of git://git.kernel.dk/linux-block Pull block IO core updates from Jens Axboe: "This is the pull request for the core changes in the block layer for 3.13. It contains: - The new blk-mq request interface. This is a new and more scalable queueing model that marries the best part of the request based interface we currently have (which is fully featured, but scales poorly) and the bio based "interface" which the new drivers for high IOPS devices end up using because it's much faster than the request based one. The bio interface has no block layer support, since it taps into the stack much earlier. This means that drivers end up having to implement a lot of functionality on their own, like tagging, timeout handling, requeue, etc. The blk-mq interface provides all these. Some drivers even provide a switch to select bio or rq and has code to handle both, since things like merging only works in the rq model and hence is faster for some workloads. This is a huge mess. Conversion of these drivers nets us a substantial code reduction. Initial results on converting SCSI to this model even shows an 8x improvement on single queue devices. So while the model was intended to work on the newer multiqueue devices, it has substantial improvements for "classic" hardware as well. This code has gone through extensive testing and development, it's now ready to go. A pull request is coming to convert virtio-blk to this model will be will be coming as well, with more drivers scheduled for 3.14 conversion. - Two blktrace fixes from Jan and Chen Gang. - A plug merge fix from Alireza Haghdoost. - Conversion of __get_cpu_var() from Christoph Lameter. - Fix for sector_div() with 64-bit divider from Geert Uytterhoeven. - A fix for a race between request completion and the timeout handling from Jeff Moyer. This is what caused the merge conflict with blk-mq/core, in case you are looking at that. - A dm stacking fix from Mike Snitzer. - A code consolidation fix and duplicated code removal from Kent Overstreet. - A handful of block bug fixes from Mikulas Patocka, fixing a loop crash and memory corruption on blk cg. - Elevator switch bug fix from Tomoki Sekiyama. A heads-up that I had to rebase this branch. Initially the immutable bio_vecs had been queued up for inclusion, but a week later, it became clear that it wasn't fully cooked yet. So the decision was made to pull this out and postpone it until 3.14. It was a straight forward rebase, just pruning out the immutable series and the later fixes of problems with it. The rest of the patches applied directly and no further changes were made" * 'for-3.13/core' of git://git.kernel.dk/linux-block: (31 commits) block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO block: Do not call sector_div() with a 64-bit divisor kernel: trace: blktrace: remove redundent memcpy() in compat_blk_trace_setup() block: Consolidate duplicated bio_trim() implementations block: Use rw_copy_check_uvector() block: Enable sysfs nomerge control for I/O requests in the plug list block: properly stack underlying max_segment_size to DM device elevator: acquire q->sysfs_lock in elevator_change() elevator: Fix a race in elevator switching and md device initialization block: Replace __get_cpu_var uses bdi: test bdi_init failure block: fix a probe argument to blk_register_region loop: fix crash if blk_alloc_queue fails blk-core: Fix memory corruption if blkcg_init_queue fails block: fix race between request completion and timeout handling blktrace: Send BLK_TN_PROCESS events to all running traces blk-mq: don't disallow request merges for req->special being set blk-mq: mq plug list breakage blk-mq: fix for flush deadlock ...	2013-11-14 12:08:14 +09:00
Jeff Layton	6d769f1e14	nfs: don't retry detect_trunking with RPC_AUTH_UNIX more than once Currently, when we try to mount and get back NFS4ERR_CLID_IN_USE or NFS4ERR_WRONGSEC, we create a new rpc_clnt and then try the call again. There is no guarantee that doing so will work however, so we can end up retrying the call in an infinite loop. Worse yet, we create the new client using rpc_clone_client_set_auth, which creates the new client as a child of the old one. Thus, we can end up with a very long lineage of rpc_clnts. When we go to put all of the references to them, we can end up with a long call chain that can smash the stack as each rpc_free_client() call can recurse back into itself. This patch fixes this by simply ensuring that the SETCLIENTID call will only be retried in this situation if the last attempt did not use RPC_AUTH_UNIX. Note too that with this change, we don't need the (i > 2) check in the -EACCES case since we now have a more reliable test as to whether we should reattempt. Cc: stable@vger.kernel.org # v3.10+ Cc: Chuck Lever <chuck.lever@oracle.com> Tested-by/Acked-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2013-11-13 19:21:05 -05:00
J. Bruce Fields	6ff40decff	nfsd4: improve write performance with better sendspace reservations Currently the rpc code conservatively refuses to accept rpc's from a client if the sum of its worst-case estimates of the replies it owes that client exceed the send buffer space. Unfortunately our estimate of the worst-case reply for an NFSv4 compound is always the maximum read size. This can unnecessarily limit the number of operations we handle concurrently, for example in the case most operations are writes (which have small replies). We can do a little better if we check which ops the compound contains. This is still a rough estimate, we'll need to improve on it some day. Reported-by: Shyam Kaushik <shyamnfs1@gmail.com> Tested-by: Shyam Kaushik <shyamnfs1@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2013-11-13 16:12:54 -05:00
Al Viro	ede4cebce1	prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts ... and equivalent is needed in 3.12; it's broken there as well Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:45:40 -05:00
Li Zhong	4ec6c2aeab	fix unpaired rcu lock in prepend_path() Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:43:10 -05:00
Dan Carpenter	4fdb793ffe	locks: missing unlock on error in generic_add_lease() We should unlock here before returning. Fixes: `df4e8d2c1d` ('locks: implement delegations') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:30:53 -05:00
Dan Carpenter	7f62656be8	aio: checking for NULL instead of IS_ERR alloc_anon_inode() returns an ERR_PTR(), it doesn't return NULL. Fixes: `71ad7490c1` ('rework aio migrate pages to use aio fs') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-11-13 07:30:53 -05:00
Linus Torvalds	5cbb3d216e	Merge branch 'akpm' (patches from Andrew Morton) Merge first patch-bomb from Andrew Morton: "Quite a lot of other stuff is banked up awaiting further next->mainline merging, but this batch contains: - Lots of random misc patches - OCFS2 - Most of MM - backlight updates - lib/ updates - printk updates - checkpatch updates - epoll tweaking - rtc updates - hfs - hfsplus - documentation - procfs - update gcov to gcc-4.7 format - IPC" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (269 commits) ipc, msg: fix message length check for negative values ipc/util.c: remove unnecessary work pending test devpts: plug the memory leak in kill_sb ./Makefile: export initial ramdisk compression config option init/Kconfig: add option to disable kernel compression drivers: w1: make w1_slave::flags long to avoid memory corruption drivers/w1/masters/ds1wm.cuse dev_get_platdata() drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page() drivers/memstick/core/mspro_block.c: fix attributes array allocation drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr kernel/panic.c: reduce 1 byte usage for print tainted buffer gcov: reuse kbasename helper kernel/gcov/fs.c: use pr_warn() kernel/module.c: use pr_foo() gcov: compile specific gcov implementation based on gcc version gcov: add support for gcc 4.7 gcov format gcov: move gcov structs definitions to a gcc version specific file kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener() kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end() kernel/sysctl_binary.c: use scnprintf() instead of snprintf() ...	2013-11-13 15:45:43 +09:00
Linus Torvalds	9bc9ccd7db	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "All kinds of stuff this time around; some more notable parts: - RCU'd vfsmounts handling - new primitives for coredump handling - files_lock is gone - Bruce's delegations handling series - exportfs fixes plus misc stuff all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits) ecryptfs: ->f_op is never NULL locks: break delegations on any attribute modification locks: break delegations on link locks: break delegations on rename locks: helper functions for delegation breaking locks: break delegations on unlink namei: minor vfs_unlink cleanup locks: implement delegations locks: introduce new FL_DELEG lock flag vfs: take i_mutex on renamed file vfs: rename I_MUTEX_QUOTA now that it's not used for quotas vfs: don't use PARENT/CHILD lock classes for non-directories vfs: pull ext4's double-i_mutex-locking into common code exportfs: fix quadratic behavior in filehandle lookup exportfs: better variable name exportfs: move most of reconnect_path to helper function exportfs: eliminate unused "noprogress" counter exportfs: stop retrying once we race with rename/remove exportfs: clear DISCONNECTED on all parents sooner exportfs: more detailed comment for path_reconnect ...	2013-11-13 15:34:18 +09:00
Linus Torvalds	f023029427	dlm for 3.13 This set includes a single fix to resolve to a race that could cause lockspace shutdown to incorrectly return -EBUSY. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSgUirAAoJEDgbc8f8gGmqbyoQAK4mO5dAXYdZwqjxJ3qaE2IS UyBJDNeRMyxfKaCGl8B4vM8p35mR9JaYMGFmwgV5a4nkhUia+/2uNqSkd8zk35nq xglt94bQ8dpYvUXpQvhgORvG+j+AQvOY5YbYPCzTwUBSKLALdOS6cSKF8rC0EM2q JCfCCvXXEshfCSD0jvQKoXZrSkE+vOwJH9oqB8dzQYa6JT1fQidXGftey75AAtEc Gkx24glMhA+ovRP+TMjD4K/gwRlKXuUuCnPYnSWfuT3XHlGB+hhf81g1P6731DxB gVMoaSybaV8SS94scymQNVHkU6+7yRTMeTj3s7V/o4ZgAT7t1VvPEeXQC5AMdrnI BTO/S1SUJ+ySHsPy6Lr8DfRMtJww8ZLCptj30zGkzBIVkcSzr1y6/0kFWU3V7w7G mH0NyPrrH9HNwXaOELGKiQ37g+UTVnGQtdFXsZZzQ3iu0/2mKaA9x9bCahtt27YX Y7T88AaXQD0na3ubk/IwN3mEXzGRUUdUVyVYLcCBYNMCtncHZmtZAnAaoI6aaArc wleNIyMJmqeCufXMHf9TeT9RSIC49wMdU5A/2SB/MZ6H6xcZ0jr5yVKykLzJxjNo VwieufyvqFQ1sGr3nOUPYyRzBvtFvrjqT349TW8lo15z/zDb4xhA+SGVF1Mesuz7 Y2X1sJ6CmVd53QkmT+LA =ta9e -----END PGP SIGNATURE----- Merge tag 'dlm-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm fix from David Teigland: "This set includes a single fix to resolve to a race that could cause lockspace shutdown to incorrectly return -EBUSY" * tag 'dlm-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: Avoid that dlm_release_lockspace() incorrectly returns -EBUSY	2013-11-13 15:31:13 +09:00
Linus Torvalds	fbe43ff003	Mostly fixes for the power cut emulation UBIFS mode, and only one functional change which fixes a return error code. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSgh3wAAoJECmIfjd9wqK0dxwQALF6rMiFldVa9uZnNgOawzA9 yB8OR2EmQilZzCcSCLKaPIwzc7izJO7H6hbaLZrNY9UHzjoCwLhTGkbdkuGwkKuF Q+M99MKGnYghNWsOWu50J7wewR9zJPD1UGy3ZU/CJXfQzK4uwHIEOiXss+b0kZCo sW5ogFOiFzEkz8/CtGwffPXNWuqwPXbDBVHXBjnmv9MCCzsSRdGEpNClCOI/qUHQ pFZhKVHSiqYDPvvwQ/pTGv6kw2tu2pVh6+O232g35JB9G3NHq7MDxg0RLr61C7qK 3ZK6K34r6D09NKhjrOOAsf/EgzyNXtcIL9ySYamaZzEsMGBb2LNMneu3gkrk7O49 5CXi5Em4cg8eBzhEEmHDxCZanwPhggnuLh+YPlaBBNU/z3NiYRNYoHRjzQWCFwa/ yee6AzU+w1i0NRtjpPIAmpaqkryJnAZ+c2Hthlm+DPtN3DzWX13QxHdWTxC3Xycc 8aZNqC128meDqrKRxKkhh1H5TLRdsVTkC9yf0Iv8U8nQg4aT/dPJedNOI4n20Oqy C8LnABH6wtc7MImzA5xRbk7CHI0Feb8h6darSKWSLydKNJmvyJ6nRdsi5tnfBEyn kbLdYed833tCll86G8iYMwt3jO0DURn+yNlQrTsf3kvNTADvJdzAoP3tudgDxdIA T7Uxs/2VFIFAlDK+U5Aw =ap2l -----END PGP SIGNATURE----- Merge tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs Pull ubifs changes from Artem Bityutskiy: "Mostly fixes for the power cut emulation UBIFS mode, and only one functional change which fixes a return error code" * tag 'upstream-3.13-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: correct data corruption range UBIFS: fix return code UBIFS: remove unnecessary code in ubifs_garbage_collect	2013-11-13 15:28:45 +09:00
Linus Torvalds	a7fa20a594	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "This adds a ->writepage() implementation to fuse, improving mmaped writeout and paving the way for buffered writeback. And there's a patch to add a fix minor number for /dev/cuse, similarly to /dev/fuse" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: writepages: protect secondary requests from fuse file release fuse: writepages: update bdi writeout when deleting secondary request fuse: writepages: crop secondary requests fuse: writepages: roll back changes if request not found cuse: add fix minor number to /dev/cuse fuse: writepage: skip already in flight fuse: writepages: handle same page rewrites fuse: writepages: fix aggregation fuse: fix race in fuse_writepages() fuse: Implement writepages callback fuse: don't BUG on no write file fuse: lock page in mkwrite fuse: Prepare to handle multiple pages in writeback fuse: Getting file for writeback helper	2013-11-13 15:27:00 +09:00
Linus Torvalds	a30124539b	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext[23], udf and quota fixes from Jan Kara: "Assorted fixes in quota, ext2, ext3 & udf. Probably the most important is a fix of fs corruption issue in ext2 XIP support (OTOH xip is rarely used)" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix fs corruption in ext2_get_xip_mem() quota: info leak in quota_getquota() jbd: Revert "jbd: remove dependency on __GFP_NOFAIL" udf: fix for pathetic mount times in case of invalid file system ext3: Count journal as bsddf overhead in ext3_statfs	2013-11-13 15:25:47 +09:00
Linus Torvalds	dd1d1399f2	f2fs updates for v3.13 This patch-set includes the following major enhancement patches. o add a sysfs to control reclaiming free segments o enhance the f2fs global lock procedures o enhance the victim selection flow o wait for selected node blocks during fsync o add some tracepoints o add a config to remove abundant BUG_ONs The other bug fixes are as follows. o fix deadlock on acl operations o fix some bugs with respect to orphan inodes And, there are a bunch of cleanups. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJSgMZ/AAoJEEAUqH6CSFDSsL4P/Ri6GZyy5F0DGjAJElX825gO gthRZ1uq1OAXUYDOEy150CsFgIiWeu2MxiOV15UnmX893a5cXrf32afoa/Cqx8GG FVEYc5+dDdogezQCW6XSatQ4s7cDQDymyT2Mky4MJyAxhpYtvbpcyVI/OWdVcTwh pqdJfsfuOikbOOL6VU2B5dDKwjc+6lgntdv/eICzNCH9NqHv8kxmm+h3NfaqUVrW pK1irqsXrktcwLIrOH0c5ZpPcKPghJuw37oFpEw8MxYbTnpdrbLq4BKE/fRh8Fhf R+sQgEoWZriE1SISHmYjWdt87hnFCk3wysl61Z/zkOxnYKebRBrjEiudzxAHDIGY +I71ovpVCWe0uljdiTBpLQ/iN4p2fRMLjn7j1IsMzoG9yfVFduMaY70m1AOZI/7z 03QRpkmiRi7F8GYTSlPefsUUAnMYVDO6DzsyfHdxa8v+4UvWhSE4380L9DttNbCr 2/+NGRZ4kga6GSsMhdn2Bnm6i3TkMDJosu4USkv4qGR1SH1+S5dodwxfQdonPUZg 380kPkV7/gBYaMBSdrQFds3lh7g431gfYEfGSWt3vA14fFIWP7nIFpVIPGMM6/Sd GFe6gqZ2JLatqJnQNwEjPsBPPsiCAt6exbg86fTCvrS+oyQTiv44FNOWbz7iTrxw 5nZQfQHSMhKtux7rpM/N =Grs1 -----END PGP SIGNATURE----- Merge tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - add a sysfs to control reclaiming free segments - enhance the f2fs global lock procedures - enhance the victim selection flow - wait for selected node blocks during fsync - add some tracepoints - add a config to remove abundant BUG_ONs The other bug fixes are as follows. - fix deadlock on acl operations - fix some bugs with respect to orphan inodes And, there are a bunch of cleanups" * tag 'for-f2fs-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits) f2fs: issue more large discard command f2fs: fix memory leak after kobject init failed in fill_super f2fs: cleanup waiting routine for writeback pages in cp f2fs: avoid to use a NULL point in destroy_segment_manager f2fs: remove unnecessary TestClearPageError when wait pages writeback f2fs: update f2fs document f2fs: avoid to wait all the node blocks during fsync f2fs: check all ones or zeros bitmap with bitops for better mount performance f2fs: change the method of calculating the number summary blocks f2fs: fix calculating incorrect free size when update xattr in __f2fs_setxattr f2fs: add an option to avoid unnecessary BUG_ONs f2fs: introduce CONFIG_F2FS_CHECK_FS for BUG_ON control f2fs: fix a deadlock during init_acl procedure f2fs: clean up acl flow for better readability f2fs: remove unnecessary segment bitmap updates f2fs: add tracepoint for vm_page_mkwrite f2fs: add tracepoint for set_page_dirty f2fs: remove redundant set_page_dirty from write_compacted_summaries f2fs: add reclaiming control by sysfs f2fs: introduce f2fs_balance_fs_bg for some background jobs ...	2013-11-13 15:24:40 +09:00
Ilija Hadzic	66da0e1f90	devpts: plug the memory leak in kill_sb When devpts is unmounted, there may be a no-longer-used IDR tree hanging off the superblock we are about to kill. This needs to be cleaned up before destroying the SB. The leak is usually not a big deal because unmounting devpts is typically done when shutting down the whole machine. However, shutting down an LXC container instead of a physical machine exposes the problem (the garbage is detectable with kmemleak). Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:36 +09:00
Kees Cook	d049f74f2d	exec/ptrace: fix get_dumpable() incorrect tests The get_dumpable() return value is not boolean. Most users of the function actually want to be testing for non-SUID_DUMP_USER(1) rather than SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a protected state. Almost all places did this correctly, excepting the two places fixed in this patch. Wrong logic: if (dumpable == SUID_DUMP_DISABLE) { /* be protective / } or if (dumpable == 0) { / be protective / } or if (!dumpable) { / be protective / } Correct logic: if (dumpable != SUID_DUMP_USER) { / be protective / } or if (dumpable != 1) { / be protective */ } Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a user was able to ptrace attach to processes that had dropped privileges to that user. (This may have been partially mitigated if Yama was enabled.) The macros have been moved into the file that declares get/set_dumpable(), which means things like the ia64 code can see them too. CVE-2013-2929 Reported-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:33 +09:00
Randy Dunlap	1c3fc3e5cc	kcore: add Kconfig help text Under Pseudo filesystems, /proc/kcore support has no help. Fixes a portion of kernel bugzilla #52671: https://bugzilla.kernel.org/show_bug.cgi?id=52671 Thanks for David Howells for the help text. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: <lailavrazda1979@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:33 +09:00
HATAYAMA Daisuke	5721cf84d9	procfs: clean up proc_reg_get_unmapped_area for 80-column limit Clean up proc_reg_get_unmapped_area due to its 80-column limit violation. Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:33 +09:00
Vyacheslav Dubeyko	95e0d7dbb9	hfsplus: implement attributes file creation functionality Implement functionality of creation AttributesFile metadata file on HFS+ volume in the case of absence of it. It makes trying to open AttributesFile's B-tree during mount of HFS+ volume. If HFS+ volume hasn't AttributesFile then a pointer on AttributesFile's B-tree keeps as NULL. Thereby, when it is discovered absence of AttributesFile on HFS+ volume in the begin of xattr creation operation then AttributesFile will be created. The creation of AttributesFile will have success in the case of availability (2 * clump) free blocks on HFS+ volume. Otherwise, creation operation is ended with error (-ENOSPC). Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:32 +09:00
Vyacheslav Dubeyko	099e9245e0	hfsplus: implement attributes file's header node initialization code Implement functionality of AttributesFile's header node initialization. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:32 +09:00
Vyacheslav Dubeyko	b3b5b0f03c	hfsplus: add metadata file's clump size calculation functionality There are situation when HFS+ volume had been created without AttributesFile. Such situation can take place because of using old mkfs.hfs utility or creation HFS+ volume without taking in mind necessity to use xattrs. For example, Mac OS X 10.4 (Tiger) doesn't create AttributesFile during mkfs phase. Also it is a very frequent situation for the case of users that created HFS+ volumes under Linux. As a result, xattrs and POSIX ACLs on HFS+ volume are unavailable for such users. This patchset implements functionality of AttributesFile creation on HFS+ volume in the case of this metadata file absence during operation of xattr creation. This patch: Add functionality of metadata file's clump size calculation. Operation of AttributesFile creation needs in clump size setting. This value will be used when AttributesFile will be extended. This code is adopted from code of newfs_hfs utility of diskdev_cmds packet http://opensource.apple.com/tarballs/diskdev_cmds/. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:32 +09:00
Michael Opdenacker	74a797d99a	fs/hfs/btree.h: remove duplicate defines This patch removes duplicate defines from fs/hfs/btree.h [akpm@linux-foundation.org: retain the comments] Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:32 +09:00
Jason Baron	67347fe4e6	epoll: do not take global 'epmutex' for simple topologies When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached directly to a wakeup source, we do not need to take the global 'epmutex', unless the epoll file descriptor is nested. The purpose of taking the 'epmutex' on add is to prevent complex topologies such as loops and deep wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD operations. However, for the simple case of an epoll file descriptor attached directly to a wakeup source (with no nesting), we do not need to hold the 'epmutex'. This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb performance: "On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In addition the benchmark when from scaling well on 10 sockets to scaling well on just over 40 sockets. ... Currently the benchmark stops scaling at around 40-44 sockets but it seems like I found a second unrelated bottleneck." [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc] Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:25 +09:00
Jason Baron	ae10b2b4eb	epoll: optimize EPOLL_CTL_DEL using rcu Nathan Zimmer found that once we get over 10+ cpus, the scalability of SPECjbb falls over due to the contention on the global 'epmutex', which is taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path by using rcu to guard against any concurrent traversals. Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for simple topologies. IE when adding a link from an epoll file descriptor to a wakeup source, where the epoll file descriptor is not nested. This patch (of 2): Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by converting the file->f_ep_links list into an rcu one. In this way, we can traverse the epoll network on the add path in parallel with deletes. Since deletes can't create loops or worse wakeup paths, this is safe. This patch in combination with the patch "epoll: Do not take global 'epmutex' for simple topologies", shows a dramatic performance improvement in scalability for SPECjbb. Signed-off-by: Jason Baron <jbaron@akamai.com> Tested-by: Nathan Zimmer <nzimmer@sgi.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Nelson Elhage <nelhage@nelhage.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> CC: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:25 +09:00
Oleg Nesterov	6bc080d8fd	debugfs: use list_next_entry() in debugfs_remove_recursive() Change debugfs_remove_recursive() to use list_next_entry(child), no changes in generated code. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:24 +09:00
Michael Opdenacker	54886a7153	cramfs: mark as obsolete Who needs cramfs when you have squashfs? At least, we should warn people that cramfs is obsolete. Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:12 +09:00
Jerome Marchand	00619bcc44	mm: factor commit limit calculation The same calculation is currently done in three differents places. Factor that code so future changes has to be made at only one place. [akpm@linux-foundation.org: uninline vm_commit_limit()] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:11 +09:00
Jan Kara	c4a391b53a	writeback: do not sync data dirtied after sync start When there are processes heavily creating small files while sync(2) is running, it can easily happen that quite some new files are created between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2). That can happen especially if there are several busy filesystems (remember that sync traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one fs before starting it on another fs). Because WB_SYNC_ALL pass is slow (e.g. causes a transaction commit and cache flush for each inode in ext3), resulting sync(2) times are rather large. The following script reproduces the problem: function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Fix the problem by disregarding inodes dirtied after sync(2) was called in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes a time stamp when sync has started which is used for setting up work for flusher threads. To give some numbers, when above script is run on two ext4 filesystems on simple SATA drive, the average sync time from 10 runs is 267.549 seconds with standard deviation 104.799426. With the patched kernel, the average sync time from 10 runs is 2.995 seconds with standard deviation 0.096. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:07 +09:00
Naoya Horiguchi	ec8e41aec1	/proc/pid/smaps: show VM_SOFTDIRTY flag in VmFlags line This flag shows that the VMA is "newly created" and thus represents "dirty" in the task's VM. You can clear it by "echo 4 > /proc/pid/clear_refs." Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:07 +09:00
David Rientjes	948927ee9e	mm, mempolicy: make mpol_to_str robust and always succeed mpol_to_str() should not fail. Currently, it either fails because the string buffer is too small or because a string hasn't been defined for a mempolicy mode. If a new mempolicy mode is introduced and no string is defined for it, just warn and return "unknown". If the buffer is too small, just truncate the string and return, the same behavior as snprintf(). This also fixes a bug where there was no NULL-byte termination when doing p++ = '=' and p++ ':' and maxlen has been reached. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Chen Gang <gang.chen@asianux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:05 +09:00
Xishi Qiu	83285c72e0	mm: use pgdat_end_pfn() to simplify the code in others Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn + pgdat->node_spanned_pages". Simplify the code, no functional change. Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:03 +09:00
Jan Kara	41ecc34598	ocfs2: simplify ocfs2_invalidatepage() and ocfs2_releasepage() Ocfs2 doesn't do data journalling. Thus its ->invalidatepage and ->releasepage functions never get called on buffers that have journal heads attached. So just use standard variants of functions from buffer.c. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:02 +09:00
Joe Perches	d00d2f8ab9	ocfs2: convert use of typedef ctl_table to struct ctl_table This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:02 +09:00
Xue jiufei	b1214e4757	ocfs2: fix possible double free in ocfs2_write_begin_nolock When ocfs2_write_cluster_by_desc() failed in ocfs2_write_begin_nolock() because of ENOSPC, it goes to out_quota, freeing data_ac(meta_ac). Then it calls ocfs2_try_to_free_truncate_log() to free space. If enough space freed, it will try to write again. Unfortunately, some error happenes before ocfs2_lock_allocators(), it goes to out and free data_ac(meta_ac) again. Signed-off-by: joyce <xuejiufei@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Acked-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:02 +09:00
Younger Liu	bfbca926d6	ocfs2: add missing errno in ocfs2_ioctl_move_extents() If the file is not regular or writeable, it should return errno(EPERM). This patch is based on `85a258b70d` ("ocfs2: fix error handling in ocfs2_ioctl_move_extents()"). Signed-off-by: Younger Liu <younger.liu@huawei.com> Signed-off-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:02 +09:00
Younger Liu	8abaae8d85	ocfs2: do not call brelse() if group_bh is not initialized in ocfs2_group_add() If group_bh is not initialized, there is no need to release. This problem does not cause anything wrong, but the patch would make the code more logical. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Younger Liu	eedd40e1ca	ocfs2: rollback transaction in ocfs2_group_add() If ocfs2_journal_access_di() fails, group->bg_next_group should rollback. Otherwise, there would be a inconsistency between group_bh and main_bm_bh. Signed-off-by: Younger Liu <younger.liu@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Junxiao Bi	728b98059a	ocfs2: break useless while loop Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Akinobu Mita	518df6bdf3	ocfs2: use find_last_bit() We already have find_last_bit(). So just use it as described in the comment. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Xue jiufei	fae477b6f0	ocfs2: delay migration when the lockres is in migration state We trigger a bug in __dlm_lockres_reserve_ast() when we parallel umount 4 nodes. The situation is as follows: 1) Node A migrate all lockres it owned(eg. lockres A) to other nodes say node B when it umounts. 2) Receiving MIG_LOCKRES message from A, Node B masters the lockres A with DLM_LOCK_RES_MIGRATING state set. 3) Then we umount ocfs2 on node B. It also should migrate lockres A to another node, say node C. But now, DLM_LOCK_RES_MIGRATING state of lockers A is not cleared. Node B triggered the BUG on lockres with state DLM_LOCK_RES_MIGRATING. Signed-off-by: Xuejiufei <xuejiufei@huawei.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Xue jiufei	750e3c6581	ocfs2: skip locks in the blocked list A parallel umount on 4 nodes triggered a bug in dlm_process_recovery_date(). Here's the situation: Receiving MIG_LOCKRES message, A node processes the locks in migratable lockres. It copys lvb from migratable lockres when processing the first valid lock. If there is a lock in the blocked list with the EX level, it triggers the BUG. Since valid lvbs are set when locks are granted with EX or PR levels, locks in the blocked list cannot have valid lvbs. Therefore I think we should skip the locks in the blocked list. Signed-off-by: Xuejiufei <xuejiufei@huawei.com> Signed-off-by: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Akinobu Mita	a8f70de37b	ocfs2: use bitmap_weight() Use bitmap_weight() instead of reinventing the wheel. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Joel Becker	910bffe086	ocfs2: don't spam on -EDQUOT -EDQUOT is a user-visible error, not a logic problem. Teach mlog_errno() to ignore it like it ignores -ENOSPC, etc. Signed-off-by: Joel Becker <jlbec@evilplan.org> Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Marek Królikowski <admin@wset.edu.pl> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:01 +09:00
Rui Xiang	58796207cf	ocfs2: add necessary check in case sb_getblk() fails sb_getblk() may return an err, so add a check for bh. [joseph.qi@huawei.com: also add a check after calling sb_getblk() in ocfs2_create_xattr_block()] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:00 +09:00
Rui Xiang	7391a294b8	ocfs2: return ENOMEM when sb_getblk() fails The only reason for sb_getblk() failing is if it can't allocate the buffer_head. So return ENOMEM instead when it fails. [joseph.qi@huawei.com: ocfs2_symlink_get_block() and ocfs2_read_blocks_sync() and ocfs2_read_blocks() need the same change] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Reviewed-by: Jie Liu <jeff.liu@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:00 +09:00
Junxiao Bi	f0cb0f0bca	fs/ocfs2/file.c: fix wrong comment Unwritten extent only exists for file systems which support holes. But the comment said was opposite meaning and also the comment is not very clear, so rephase it. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:00 +09:00
Goldwyn Rodrigues	06f9da6e82	fs/ocfs2: remove unnecessary variable bits_wanted from ocfs2_calc_extend_credits Code cleanup to remove unnecessary variable passed but never used to ocfs2_calc_extend_credits. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-11-13 12:09:00 +09:00
Linus Torvalds	10d0c9705e	DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8= =GCbY -----END PGP SIGNATURE----- Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...	2013-11-12 16:52:17 +09:00
Linus Torvalds	4b4d2b4634	H8/300 has been dead for several years, the kernel for it has not compiled for ages, and recent versions of gcc for it are broken. Remove support for it. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJSev31AAoJEMsfJm/On5mBzSAQAKRBYLqtf3nJGm9pXGDhZPGG 7KSQ8S11pg/wnXYW6P/XJhFRBrYkOOCeqVKQHtmxG8MmXQkkOz95rsIvBbUzU/FT yJAPKpOHdh1yLhBGgCj3WhGtjVwpbut1/y9n2M5SpGautUgxfLj9fJiswSJx0n7t VRWKwfIpBFPLPs9w6hdDf94tIXhSX8Me2gd3LDCPBEQ2SZYd8rtBasYtDeC2+FLa Xow4ZQrCU7hpYscSUFzJpok35hl7weGhJ9jjXwtic4byFHvdiyHUwCOaEWC0hqNi fOLWFbvBogqjyAktfZhfyL9R9/7lGlLshLQNmJWR3bO+nCJ21h9ATw0R4gLBdT4/ lzLRnJ/4GdtbvmdqRxNjxxR4zHkZ+tE8HmaCmUzvqGfQyA5sJNBRrBDcWLUOVlO9 0iIZsJBZjSQXKXSk9P5xH4G0tlbAFEUnEHKsrt/mgsD9Z3SgbPKAIWSBAJA0AMQk DXZaXrBRilXOPUCZASZfmK8AQFC1GYB0tz7nT4x1mjT2/JClgG2kHCAGhNmI+CbK l9VRIgBydppLFPOGhZLSNGQp29xBhw9JgOVns4a1k7kJQEw9ht38h8Q2ckRYxXhP /z53eZKMQk62quWlyLRgR9mWqZc2CIifLVdFjiOELMh7wKPwL6eGrrrGBDbPtctS PX5K26geb0oA3ZMjpBLr =V6n6 -----END PGP SIGNATURE----- Merge tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull h8300 platform removal from Guenter Roeck: "The patch series has been in -next for more than one relase cycle. I did get a number of Acks, and no objections. H8/300 has been dead for several years, the kernel for it has not compiled for ages, and recent versions of gcc for it are broken. Remove support for it" * tag 'h8300-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: CREDITS: Add Yoshinori Sato for h8300 fs/minix: Drop dependency on H8300 Drop remaining references to H8/300 architecture Drop MAINTAINERS entry for H8/300 watchdog: Drop references to H8300 architecture net/ethernet: Drop H8/300 Ethernet driver net/ethernet: smsc9194: Drop conditional code for H8/300 ide: Drop H8/300 driver Drop support for Renesas H8/300 (h8300) architecture	2013-11-12 14:13:14 +09:00
Andreas Dilger	3f61c0cc70	ext4: add prototypes for macro-generated functions It isn't very easy to find the declarations for the functions created by EXT4_INODE_BIT_FNS() because the names are generated by macros: ext4_test_inode_flag, ext4_set_inode_flag, ext4_clear_inode_flag ext4_test_inode_state, ext4_set_inode_state, ext4_clear_inode_state Add explicit declarations for these functions so that grep and tags can find them. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-11-11 22:40:40 -05:00
Andreas Dilger	9206c56155	ext4: return non-zero st_blocks for inline data Return a non-zero st_blocks to userspace for statfs() and friends. Some versions of tar will assume that files with st_blocks == 0 do not contain any data and will skip reading them entirely. Signed-off-by: Andreas Dilger <andreas.dilger@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2013-11-11 22:38:12 -05:00
Miao Xie	91aef86f3b	Btrfs: rename btrfs_start_all_delalloc_inodes rename the function -- btrfs_start_all_delalloc_inodes(), and make its name be compatible to btrfs_wait_ordered_roots(), since they are always used at the same place. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:58 -05:00
Miao Xie	b02441999e	Btrfs: don't wait for the completion of all the ordered extents It is very likely that there are lots of ordered extents in the filesytem, if we wait for the completion of all of them when we want to reclaim some space for the metadata space reservation, we would be blocked for a long time. The performance would drop down suddenly for a long time. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:44 -05:00
Miao Xie	9f3a074d10	Btrfs: don't wait for all the async delalloc when shrinking delalloc It was very likely that there were lots of async delalloc pages in the filesystem, if we waited until all the pages were flushed, we would be blocked for a long time, and the performance would also drop down. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:37 -05:00
Miao Xie	c61a16a701	Btrfs: fix the confusion between delalloc bytes and metadata bytes In shrink_delalloc(), what we need reclaim is the metadata space, so flushing pages by to_reclaim is not reasonable, it is very likely that the pages we flush are not enough. And then we had to invoke the flush function for several times, at the worst, we need call flush_space for several times. It wasted time. We improve this problem by converting the metadata space size we need reserve to the delalloc bytes, By this way, we can flush the pages by a reasonable number. (Now we use a fixed number to do conversion, it is not flexible, maybe we can find a good way to improve it in the future.) Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:30 -05:00
Miao Xie	18cd8ea6df	Btrfs: pick up the code for the item number calculation in flush_space() This patch picked up the code that was used to calculate the number of the items for which we need reserve space, and we will use it in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:23 -05:00
Miao Xie	38c135af8e	Btrfs: wait for the ordered extent only when we want Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:15 -05:00
Miao Xie	d3ee29e396	Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc() Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:13:07 -05:00
Wang Shilong	3b7a016f44	Btrfs: avoid unnecessary scrub workers allocation We only allocate scrub workers if we pass all the necessary checks, for example, there are no operation in progress. Besides, move mutex lock protection outside of scrub_workers_get() /scrub_workers_put(). Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:58 -05:00
Josef Bacik	007d31f755	Btrfs: check file extent type before anything else I hit this problem with my no holes patch and it made me realize what the problem was for bz 60834. If the first item in the leaf is an inline extent and we try to read anything starting from disk_bytenr onward we will read off the end of the leaf. So we need to check to see what it's type is, and if it's not REG we can just break out. This should fix this problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:49 -05:00
Rashika	f570e757b5	btrfs: Remove useless variable in write_ctree_super() The function write_ctree_super() in disk-io.c uses variable ret to return the result of function write_all_supers(). Since, this variable serves no purpose, hence the patch removes it and returns the call of the called function. Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:40 -05:00
Dulshani Gunawardhana	678712545b	btrfs: Fix checkpatch.pl warning of spacing issues Fix spacing issues detected via checkpatch.pl in accordance with the kernel style guidelines. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:31 -05:00
Dulshani Gunawardhana	d9b0d9ba04	btrfs: Replace kmalloc with kmalloc_array Replace kmalloc(size * nr, ) with kmalloc_array(nr, size), thus making it easier to check is that the calculation doesn't wrap or return a smaller allocation Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:22 -05:00
Dulshani Gunawardhana	e248e04e77	btrfs: Enclose macros with complex values within parenthesis Enclose macros with complex values within parenthesis in accordance to checkpatch.pl. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:12:06 -05:00
Dulshani Gunawardhana	fae7f21cec	btrfs: Use WARN_ON()'s return value in place of WARN_ON(1) Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source code that outputs a more descriptive warnings. Also fix the styling warning of redundant braces that came up as a result of this fix. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:53 -05:00
Dulshani Gunawardhana	b19e684393	btrfs: Remove redundant local zero structure Remove redundant local zero structure, replacing it by the kernel's global ZERO_PAGE. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:39 -05:00
Dulshani Gunawardhana	3c45bfc152	btrfs: Pack struct btrfs_device Pack the structure btrfs_device in volumes.h to eliminate holes detected by pahole, thus reducing binary memory footprint. Signed-off-by: Dulshani Gunawardhana <dulshani.gunawardhana89@gmail.com> Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:26 -05:00
Rashika	95e94d14b4	btrfs: Replace multiple atomic_inc() with atomic_add() This patch replaces multiple atomic_inc() with atomic_add() in delayed-inode.c to reduce source code and have few instructions for compilation. Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:19 -05:00
Rashika	2e9f595497	btrfs: Add helper function for free_root_pointers() The function free_root_pointers() in disk-io.h contains redundant code. Therefore, this patch adds a helper function free_root_extent_buffers() to free_root_pointers() to eliminate redundancy. Reviewed-by: Zach Brown <zab@redhat.com> Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:13 -05:00
Liu Bo	48ec47364b	Btrfs: fix a crash when running balance and defrag concurrently Running balance and defrag concurrently can end up with a crash: kernel BUG at fs/btrfs/relocation.c:4528! RIP: 0010:[<ffffffffa01ac33b>] [<ffffffffa01ac33b>] btrfs_reloc_cow_block+ 0x1eb/0x230 [btrfs] Call Trace: [<ffffffffa01398c1>] ? update_ref_for_cow+0x241/0x380 [btrfs] [<ffffffffa0180bad>] ? copy_extent_buffer+0xad/0x110 [btrfs] [<ffffffffa0139da1>] __btrfs_cow_block+0x3a1/0x520 [btrfs] [<ffffffffa013a0b6>] btrfs_cow_block+0x116/0x1b0 [btrfs] [<ffffffffa013ddad>] btrfs_search_slot+0x43d/0x970 [btrfs] [<ffffffffa0153c57>] btrfs_lookup_file_extent+0x37/0x40 [btrfs] [<ffffffffa0172a5e>] __btrfs_drop_extents+0x11e/0xae0 [btrfs] [<ffffffffa013b3fd>] ? generic_bin_search.constprop.39+0x8d/0x1a0 [btrfs] [<ffffffff8117d14a>] ? kmem_cache_alloc+0x1da/0x200 [<ffffffffa0138e7a>] ? btrfs_alloc_path+0x1a/0x20 [btrfs] [<ffffffffa0173ef0>] btrfs_drop_extents+0x60/0x90 [btrfs] [<ffffffffa016b24d>] relink_extent_backref+0x2ed/0x780 [btrfs] [<ffffffffa0162fe0>] ? btrfs_submit_bio_hook+0x1e0/0x1e0 [btrfs] [<ffffffffa01b8ed7>] ? iterate_inodes_from_logical+0x87/0xa0 [btrfs] [<ffffffffa016b909>] btrfs_finish_ordered_io+0x229/0xac0 [btrfs] [<ffffffffa016c3b5>] finish_ordered_fn+0x15/0x20 [btrfs] [<ffffffffa018cbe5>] worker_loop+0x125/0x4e0 [btrfs] [<ffffffffa018cac0>] ? btrfs_queue_worker+0x300/0x300 [btrfs] [<ffffffff81075ea0>] kthread+0xc0/0xd0 [<ffffffff81075de0>] ? insert_kthread_work+0x40/0x40 [<ffffffff8164796c>] ret_from_fork+0x7c/0xb0 [<ffffffff81075de0>] ? insert_kthread_work+0x40/0x40 ---------------------------------------------------------------------- It turns out to be that balance operation will bump root's @last_snapshot, which enables snapshot-aware defrag path, and backref walking stuff will find data reloc tree as refs' parent, and hit the BUG_ON() during COW. As data reloc tree's data is just for relocation purpose, and will be deleted right after relocation is done, it's unnecessary to walk those refs belonged to data reloc tree, it'd be better to skip them. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:07 -05:00
Liu Bo	6f519564d7	Btrfs: do not run snapshot-aware defragment on error If something wrong happens in write endio, running snapshot-aware defragment can end up with undefined results, maybe a crash, so we should avoid it. In order to share similar code, this also adds a helper to free the struct for snapshot-aware defrag. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:11:00 -05:00
Filipe David Borba Manana	269d040ff2	Btrfs: log recovery, don't unlink inode always on error If we get any error while doing a dir index/item lookup in the log tree, we were always unlinking the corresponding inode in the subvolume. It makes sense to unlink only if the lookup failed to find the dir index/item, which corresponds to NULL or -ENOENT, and not when other errors happen (like a transient -ENOMEM or -EIO). Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:48 -05:00
Filipe David Borba Manana	488111aa0e	Btrfs: fix csum search offset/length calculation in log tree We were setting the csums search offset and length to the right values if the extent is compressed, but later on right before doing the csums lookup we were overriding these two parameters regardless of compression being set or not for the extent. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:42 -05:00
Filipe David Borba Manana	e46f5388cd	Btrfs: fix verification of dir_item We were ignoring the name component of the dir_item. Both the name and data must fit within BTRFS_MAX_XATTR_SIZE(root). Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:36 -05:00
Wang Shilong	9b011adfe1	Btrfs: remove scrub_super_lock holding in btrfs_sync_log() Originally, we introduced scrub_super_lock to synchronize tree log code with scrubbing super. However we can replace scrub_super_lock with device_list_mutex, because writing super will hold this mutex, this will reduce an extra lock holding when writing supers in sync log code. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:13 -05:00
Wang Shilong	7fdf4b608d	Btrfs: use 'u64' rather than 'int' to get extent's generation We define a 'int' to get extent's generation by mistake,fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:10:06 -05:00
Miao Xie	9dced186f9	Btrfs: fix the free space write out failure when there is no data space After running space balance on a new fs, the fs check program outputed the following warning message: free space inode generation (0) did not match free space cache generation (20) Steps to reproduce: # mkfs.btrfs -f <dev> # mount <dev> <mnt> # btrfs balance start <mnt> # umount <mnt> # btrfs check <dev> It was because there was no data space after the space balance, and the free space write out task didn't try to allocate a new data chunk for the free space inode when doing the reservation. So the data space reservation failed, and in order to tell the free space loader that this free space inode could not be trusted, the generation of the free space inode wasn't updated. Then the check program found this problem and outputed the above message. But in fact, it is safe that we try to allocate a new data chunk when we find the data space is not enough. The patch fixes the above problem by this way. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:08:49 -05:00
Josef Bacik	9e6a0c52b7	Btrfs: stop committing the transaction so much during relocate I noticed with my horrible snapshot excercisor that we were taking forever to relocate the larger the file system got. This appeared to be because we were committing the transaction _constantly_. There were a few places where we do braindead things with metadata reservation, like start a transaction and then try to refill the block rsv, which not only keeps us from committing a transaction during the enospc stuff, but keeps us from doing some of the harder flushing work which will make us more likely to need to commit the transaction. We also were checking the block rsv and committing the transaction if the block rsv was below a certain threshold, but we were doing this in a place where we don't actually keep anything in the block rsv so this was always ending up false so we always committed the transaction in this case. I tested this to make sure it didn't break anything, but it takes about 10 hours to get the box to this state so I don't know how much of an impact it will really make. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:08:31 -05:00
Josef Bacik	9f23e289ed	Btrfs: make sure the delalloc workers actually flush compressed writes When using delalloc workers in a non-waiting way (like for enospc handling) we can end up not actually waiting for the dirty pages to be started if we have compression. We need to add an extra filemap flush to make sure any async extents that have started are actually moved along before returning. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:08:22 -05:00
Josef Bacik	9385876917	Btrfs: take ordered root lock when removing ordered operations inode A user reported a list corruption warning from btrfs_remove_ordered_extent, it is because we aren't taking the ordered_root_lock when we remove the inode from the ordered operations list. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:08:10 -05:00
Josef Bacik	d788a34929	Btrfs: don't abort transaction in run_delalloc_nocow This is just the write path, the only reason we start a transaction is so we can check cross references, we don't make any actual changes, so there is no reason to abort the transaction if we fail. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:58 -05:00
Josef Bacik	02ecd2c278	Btrfs: do not bug_on if we try to cow a free space cache inode We can just return an error and we'll bail out properly. We still want to catch this case to make sure we don't have a bug somewhere, so just warn if this pops up. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:49 -05:00
Josef Bacik	0ef8b72607	Btrfs: return an error from btrfs_wait_ordered_range I noticed that if the free space cache has an error writing out it's data it won't actually error out, it will just carry on. This is because it doesn't check the return value of btrfs_wait_ordered_range, which didn't actually return anything. So fix this in order to keep us from making free space cache look valid when it really isnt. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:35 -05:00
Josef Bacik	ed2590953b	Btrfs: stop using vfs_read in send Apparently we don't actually close the files until we return to userspace, so stop using vfs_read in send. This is actually better for us since we can avoid all the extra logic of holding the file we're sending open and making sure to clean it up. This will fix people who have been hitting too many files open errors when trying to send. Thanks, Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:07:11 -05:00
Stefan Behrens	301993a4a1	Btrfs: check_int, remove warning for mixed-mode In mixed-mode, when a data-block was later reused for metadata, a warning was printed. This condition is now filtered out and the warning is eliminated in this case. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:04:25 -05:00
Stefan Behrens	a5f519c91d	Btrfs: fix check_int 'leaf item out of bounce' regression Yet another cleanup patch broke code for which no xfstest exists. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:04:06 -05:00
Filipe David Borba Manana	5599488708	Btrfs: optimize extent item search in run_delayed_extent_op Instead of doing another extent tree search if the first search failed to find a metadata item, check if the previous item in the leaf is an extent item and use it if it is, otherwise do the second tree search for an extent item. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:03:53 -05:00
Jeff Mahoney	cab45e22da	btrfs: add tracing for failed reservations When debugging ENOSPC issues, it's nice to be able to see which reservations failed as well as the ones which succeeded. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:03:37 -05:00
Zach Brown	8b558c5f09	btrfs: remove fs/btrfs/compat.h fs/btrfs/compat.h only contained trivial macro wrappers of drop_nlink() and inc_nlink(). This doesn't belong in mainline. Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:03:19 -05:00
Zach Brown	1877e1a747	btrfs: remove move_pages() move_pages() has an inefficient backwards byte copy of regions of two different pages. They're different pages so the regions won't overlap and it could use memcpy(). At that point, though, move_pages() would be a slightly dimmer re-implementation of copy_pages() that lacked the test for overlapping page regions. So remove move_pages() and just call copy_pages(). Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:03:09 -05:00
Zach Brown	4546bcaeba	btrfs: use get_seconds() instead of btrfs wrapper Signed-off-by: Zach Brown <zab@redhat.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:03:00 -05:00
Filipe David Borba Manana	8185554d3e	Btrfs: fix incorrect inode acl reset When a directory has a default ACL and a subdirectory is created under that directory, btrfs_init_acl() is called when the subdirectory's inode is created to initialize the inode's ACL (inherited from the parent directory) but it was clearing the ACL from the inode after setting it if posix_acl_create() returned success, instead of clearing it only if it returned an error. To reproduce this issue: $ mkfs.btrfs -f /dev/loop0 $ mount /dev/loop0 /mnt $ mkdir /mnt/acl $ setfacl -d --set u::rwx,g::rwx,o::- /mnt/acl $ getfacl /mnt/acl user::rwx group::rwx other::r-x default:user::rwx default:group::rwx default:other::--- $ mkdir /mnt/acl/dir1 $ getfacl /mnt/acl/dir1 user::rwx group::rwx other::--- After unmounting and mounting again the filesystem, fgetacl returned the expected ACL: $ umount /mnt/acl $ mount /dev/loop0 /mnt $ getfacl /mnt/acl/dir1 user::rwx group::rwx other::--- default:user::rwx default:group::rwx default:other::--- Meaning that the underlying xattr was persisted. Reported-by: Giuseppe Fierro <giuseppe@fierro.org> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:02:51 -05:00
Stefan Behrens	ff76b05655	Btrfs: Don't allocate inode that is already in use Due to an off-by-one error, it is possible to reproduce a bug when the inode cache is used. The same inode number is assigned twice, the second time this leads to an EEXIST in btrfs_insert_empty_items(). The issue can happen when a file is removed right after a subvolume is created and then a new inode number is created before the inodes in free_inode_pinned are processed. unlink() calls btrfs_return_ino() which calls start_caching() in this case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by searching for the highest inode (which already cannot find the unlinked one anymore in btrfs_find_free_objectid()). So if this unlinked inode's number is equal to the highest_ino + 1 (or >= this value instead of > this value which was the off-by-one error), we mustn't add the inode number to free_ino_pinned (caching_thread() does it right). In this case we need to try directly to add the number to the inode_cache which will fail in this case. When this inode number is allocated while it is still in free_ino_pinned, it is allocated and still added to the free inode cache when the pinned inodes are processed, thus one of the following inode number allocations will get an inode that is already in use and fail with EEXIST in btrfs_insert_empty_items(). One example which was created with the reproducer below: Create a snapshot, work in the newly created snapshot for the rest. In unlink(inode 34284) call btrfs_return_ino() which calls start_caching(). start_caching() calls add_free_space [34284, 18446744073709517077]. In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong. mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284. btrfs_unpin_free_ino calls add_free_space [34284, 1]. mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284. EEXIST when the new inode is inserted. One possible reproducer is this one: #!/bin/sh # preparation TEST_DEV=/dev/sdc1 TEST_MNT=/mnt umount ${TEST_MNT} 2>/dev/null \|\| true mkfs.btrfs -f ${TEST_DEV} mount ${TEST_DEV} ${TEST_MNT} -o \ rw,relatime,compress=lzo,space_cache,inode_cache btrfs subv create ${TEST_MNT}/s1 for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2 FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 \| sed 's\|^./$[^/]$$\|\1\|'` rm ${TEST_MNT}/s2/$FILENAME touch ${TEST_MNT}/s2/$FILENAME # the following steps can be repeated to reproduce the issue again and again [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3 btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3 rm ${TEST_MNT}/s3/$FILENAME touch ${TEST_MNT}/s3/$FILENAME ls -alFi ${TEST_MNT}/s?/$FILENAME touch ${TEST_MNT}/s3/_1 \|\| logger FAILED ls -alFi ${TEST_MNT}/s?/_1 touch ${TEST_MNT}/s3/_2 \|\| logger FAILED ls -alFi ${TEST_MNT}/s?/_2 touch ${TEST_MNT}/s3/__1 \|\| logger FAILED ls -alFi ${TEST_MNT}/s?/__1 touch ${TEST_MNT}/s3/__2 \|\| logger FAILED ls -alFi ${TEST_MNT}/s?/__2 # if the above is not enough, add the following loop: for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} \|\| logger FAILED; done #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} \|\| logger FAILED; done # one of the touch(1) calls in s3 fail due to EEXIST because the inode is # already in use that btrfs_find_ino_for_alloc() returns. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:02:36 -05:00
Filipe David Borba Manana	e8b0d724d5	Btrfs: fix btrfs_prev_leaf() previous key computation If we decrement the key type, we must reset its offset to the largest possible offset (u64)-1. If we decrement the key's objectid, then we must reset the key's type and offset to their largest possible values, (u8)-1 and (u64)-1 respectively. Not doing so can make us miss an items in the tree. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:02:26 -05:00
Filipe David Borba Manana	e93ae26fe1	Btrfs: optimize tree-log.c:count_inode_refs() Avoid repeated tree searches by processing all inode ref items in a leaf at once instead of processing one at a time, followed by a path release and a tree search for a key with a decremented offset. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:02:19 -05:00
Geyslan G. Bem	229eed4348	btrfs: simplify kmalloc+copy_from_user to memdup_user Use memdup_user rather than duplicating its implementation This is a little bit restricted to reduce false positives The semantic patch that makes this report is available in scripts/coccinelle/api/memdup_user.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:01:51 -05:00
chandan	5ede859b00	Btrfs: btrfs_add_ordered_operation: Fix last modified transaction comparison. Comparison of an inode's last modified transaction with the last committed transaction is incorrect. Fix it. Signed-off-by: chandan <chandan@linux.vnet.ibm.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:01:37 -05:00
Filipe David Borba Manana	3c77bd94ec	Btrfs: don't leak delayed node on path allocation failure If the path allocation failed, we would return without decrementing the reference count in the delayed node we got before, resulting in a leak. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:01:27 -05:00
Stefan Behrens	361c093d7f	Btrfs: Wait for uuid-tree rebuild task on remount read-only If the user remounts the filesystem read-only while the uuid-tree scan and rebuild task is still running (this happens once after the filesystem was mounted with an old kernel, or when forced with the mount options), the remount should wait on the tasks completion before setting the filesystem read-only. Otherwise the background task continues to write to the filesystem which is apparently not what users expect. The reproducer: TEST_DEV=/dev/sdzzzzz1 TEST_MNT=/mnt mkfs.btrfs -f $TEST_DEV mount $TEST_DEV $TEST_MNT for i in `seq 50000`; do btrfs subvolume create ${TEST_MNT}/$i; done umount $TEST_MNT mount $TEST_DEV $TEST_MNT -o rescan_uuid_tree sleep 1 ps -elf \| fgrep '[btrfs-uuid]' \| grep -v grep mount $TEST_DEV $TEST_MNT -o ro,remount ps -elf \| fgrep '[btrfs-uuid]' \| grep -v grep sleep 1 umount $TEST_MNT Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Chris Mason <chris.mason@fusionio.com>	2013-11-11 22:01:18 -05:00

1 2 3 4 5 ...

34139 Commits