linux

Author	SHA1	Message	Date
Qu Wenruo	8465ecec96	btrfs: Check qgroup level in kernel qgroup assign. Although we have qgroup level check in btrfs-progs, it's not enough since other programe may still call ioctl directly not using btrfs-progs. For example, systemd. But it's btrfs-progs to be blame since we don't provide a full-function(like subvolume create things) btrfs library with enough check, and only rely on kernel ioctl. So Add level checks in kernel too. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:53 -07:00
Dongsheng Yang	f5a6b1c53b	btrfs: qgroup: allow to remove qgroup which has parent but no child. When a qgroup has parents but no child, it should be removable in Theory I think. But currently, we can not remove it when it has either parent or child. Example: # btrfs quota enable /mnt # btrfs qgroup create 1/0 /mnt # btrfs qgroup create 2/0 /mnt # btrfs qgroup assign 1/0 2/0 /mnt # btrfs qgroup show -pcre /mnt qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 16384 16384 0 0 --- --- 1/0 0 0 0 0 2/0 --- 2/0 0 0 0 0 --- 1/0 At this time, there is no subvol or qgroup depending on it. Just a qgroup 2/0 is its parent, but 2/0 can work well without 1/0. So I think 1/0 should be removalbe. But: # btrfs qgroup destroy 1/0 /mnt ERROR: unable to destroy quota group: Device or resource busy This patch remove the check of qgroup->parent in removing it, then we can remove a qgroup when it has a parent. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:52 -07:00
Dongsheng Yang	09870d2772	btrfs: qgroup: return EINVAL if level of parent is not higher than child's. When we create a subvol inheriting a qgroup, we need to check the level of them. Otherwise, there is a chance a qgroup can inherit another qgroup at the same level. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:51 -07:00
Dongsheng Yang	e2d1f92399	btrfs: qgroup: do a reservation in a higher level. There are two problems in qgroup: a). The PAGE_CACHE is 4K, even when we are writing a data of 1K, qgroup will reserve a 4K size. It will cause the last 3K in a qgroup is not available to user. b). When user is writing a inline data, qgroup will not reserve it, it means this is a window we can exceed the limit of a qgroup. The main idea of this patch is reserving the data size of write_bytes rather than the reserve_bytes. It means qgroup will not care about the data size btrfs will reserve for user, but only care about the data size user is going to write. Then reserve it when user want to write and release it in transaction committed. In this way, qgroup can be released from the complex procedure in btrfs and only do the reserve when user want to write and account when the data is written in commit_transaction(). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:50 -07:00
Dongsheng Yang	237c0e9f1f	Btrfs: qgroup, Account data space in more proper timings. Currenly, in data writing, ->reserved is accounted in fill_delalloc(), but ->may_use is released in clear_bit_hook() which is called by btrfs_finish_ordered_io(). That's too late, that said, between fill_delalloc() and btrfs_finish_ordered_io(), the data is doublely accounted by qgroup. It will cause some unexpected -EDQUOT. Example: # btrfs quota enable /root/btrfs-auto-test/ # btrfs subvolume create /root/btrfs-auto-test//sub Create subvolume '/root/btrfs-auto-test/sub' # btrfs qgroup limit 1G /root/btrfs-auto-test//sub dd if=/dev/zero of=/root/btrfs-auto-test//sub/file bs=1024 count=1500000 dd: error writing '/root/btrfs-auto-test//sub/file': Disk quota exceeded 681353+0 records in 681352+0 records out 697704448 bytes (698 MB) copied, 8.15563 s, 85.5 MB/s It's (698 MB) when we got an -EDQUOT, but we limit it by 1G. This patch move the btrfs_qgroup_reserve/free() for data from btrfs_delalloc_reserve/release_metadata() to btrfs_check_data_free_space() and btrfs_free_reserved_data_space(). Then the accounter in qgroup will be updated at the same time with the accounter in space_info updated. In this way, the unexpected -EDQUOT will be killed. Reported-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:48 -07:00
Dongsheng Yang	31193213f1	Btrfs: qgroup: Introduce a may_use to account space_info->bytes_may_use. Currently, for pre_alloc or delay_alloc, the bytes will be accounted in space_info by the three guys. space_info->bytes_may_use --- space_info->reserved --- space_info->used. But on the other hand, in qgroup, there are only two counters to account the bytes, qgroup->reserved and qgroup->excl. And qg->reserved accounts bytes in space_info->bytes_may_use and qg->excl accounts bytes in space_info->used. So the bytes in space_info->reserved is not accounted in qgroup. If so, there is a window we can exceed the quota limit when bytes is in space_info->reserved. Example: # btrfs quota enable /mnt # btrfs qgroup limit -e 10M /mnt # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done # sync # btrfs qgroup show -pcre /mnt qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 20987904 20987904 0 10485760 --- --- qg->excl is 20987904 larger than max_excl 10485760. This patch introduce a new counter named may_use to qgroup, then there are three counters in qgroup to account bytes in space_info as below. space_info->bytes_may_use --- space_info->reserved --- space_info->used. qgroup->may_use --- qgroup->reserved --- qgroup->excl With this patch applied: # btrfs quota enable /mnt # btrfs qgroup limit -e 10M /mnt # for((i=0;i<20;i++));do fallocate -l 1M /mnt/data$i; done fallocate: /mnt/data9: fallocate failed: Disk quota exceeded fallocate: /mnt/data10: fallocate failed: Disk quota exceeded fallocate: /mnt/data11: fallocate failed: Disk quota exceeded fallocate: /mnt/data12: fallocate failed: Disk quota exceeded fallocate: /mnt/data13: fallocate failed: Disk quota exceeded fallocate: /mnt/data14: fallocate failed: Disk quota exceeded fallocate: /mnt/data15: fallocate failed: Disk quota exceeded fallocate: /mnt/data16: fallocate failed: Disk quota exceeded fallocate: /mnt/data17: fallocate failed: Disk quota exceeded fallocate: /mnt/data18: fallocate failed: Disk quota exceeded fallocate: /mnt/data19: fallocate failed: Disk quota exceeded # sync # btrfs qgroup show -pcre /mnt qgroupid rfer excl max_rfer max_excl parent child -------- ---- ---- -------- -------- ------ ----- 0/5 9453568 9453568 0 10485760 --- --- Reported-by: Cyril SCETBON <cyril.scetbon@free.fr> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:47 -07:00
Dongsheng Yang	804ca127fb	Btrfs: qgroup: free reserved in exceeding quota. When we exceed quota limit in writing, we will free some reserved extent when we need to drop but not free account in qgroup. It means, each time we exceed quota in writing, there will be some remain space in qg->reserved we can not use any more. If things go on like this, the all space will be ate up. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:46 -07:00
Dongsheng Yang	4087cf24ae	Btrfs: qgroup: cleanup, remove an unsued parameter in btrfs_create_qgroup(). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:44 -07:00
Dongsheng Yang	03477d945f	btrfs: qgroup: fix limit args override whole limit struct btrfs_limit_group use arg limit to override the old qgroup_limit of corresponding qgroup. However, we should override part of old qgroup_limit according to the bit which has been set in arg limit. Signed-off-by: Fan Chengniang <fancn.fnst@cn.fujitsu.com> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:43 -07:00
Dongsheng Yang	d3001ed3a8	btrfs: qgroup: update limit info in function btrfs_run_qgroups(). When we commit_transaction(), qgroups in btree should be updated. But, limit info is not considered currently. It will cause a problem when a qgroup of a snapshot inherit the limit info from srcqgroup, then there is an inconsistency. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:42 -07:00
Dongsheng Yang	1510e71c62	btrfs: qgroup: consolidate the parameter of fucntion update_qgroup_limit_item(). Cleanup: Change the parameter of update_qgroup_limit_item() to the family of update_qgroup_xxx_item(). Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:41 -07:00
Dongsheng Yang	e8c8541ac3	btrfs: qgroup: update qgroup in memory at the same time when we update it in btree. When we call btrfs_qgroup_inherit() with BTRFS_QGROUP_INHERIT_SET_LIMITS, btrfs will update the limit info of qgroup in btree but forget to update the qgroup in rbtree at the same time. It obviousely will cause an inconsistency. This patch fix it by updating the rbtree at the same time. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:40 -07:00
Dongsheng Yang	3eeb4d597e	btrfs: qgroup: inherit limit info from srcgroup in creating snapshot. Currently, when we snapshot a subvol, snapshot will not copy the limits from srcqgroup. This patch make the qgroup in snapshot inherit the limit info when create a snapshot. Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:52:38 -07:00
Namhyung Kim	0d68bc92c4	perf kmem: Analyze page allocator events also The perf kmem command records and analyze kernel memory allocation only for SLAB objects. This patch implement a simple page allocator analyzer using kmem:mm_page_alloc and kmem:mm_page_free events. It adds two new options of --slab and --page. The --slab option is for analyzing SLAB allocator and that's what perf kmem currently does. The new --page option enables page allocator events and analyze kernel memory usage in page unit. Currently, 'stat --alloc' subcommand is implemented only. If none of these --slab nor --page is specified, --slab is implied. First run 'perf kmem record' to generate a suitable perf.data file: # perf kmem record --page sleep 5 Then run 'perf kmem stat' to postprocess the perf.data file: # perf kmem stat --page --alloc --line 10 ------------------------------------------------------------------------------- PFN \| Total alloc (KB) \| Hits \| Order \| Mig.type \| GFP flags ------------------------------------------------------------------------------- 4045014 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 4143980 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3938658 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 4045400 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3568708 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3729824 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3657210 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 `4120750` \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3678850 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 3693874 \| 16 \| 1 \| 2 \| RECLAIM \| 00285250 ... \| ... \| ... \| ... \| ... \| ... ------------------------------------------------------------------------------- SUMMARY (page allocator) ======================== Total allocation requests : 44,260 [ 177,256 KB ] Total free requests : 117 [ 468 KB ] Total alloc+freed requests : 49 [ 196 KB ] Total alloc-only requests : 44,211 [ 177,060 KB ] Total free-only requests : 68 [ 272 KB ] Total allocation failures : 0 [ 0 KB ] Order Unmovable Reclaimable Movable Reserved CMA/Isolated ----- ------------ ------------ ------------ ------------ ------------ 0 32 . 44,210 . . 1 . . . . . 2 . 18 . . . 3 . . . . . 4 . . . . . 5 . . . . . 6 . . . . . 7 . . . . . 8 . . . . . 9 . . . . . 10 . . . . . Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-4-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2015-04-13 11:44:52 -03:00
Namhyung Kim	9fdd8a875c	tracing, mm: Record pfn instead of pointer to struct page The struct page is opaque for userspace tools, so it'd be better to save pfn in order to identify page frames. The textual output of $debugfs/tracing/trace file remains unchanged and only raw (binary) data format is changed - but thanks to libtraceevent, userspace tools which deal with the raw data (like perf and trace-cmd) can parse the format easily. So impact on the userspace will also be minimal. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Based-on-patch-by: Joonsoo Kim <js1304@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1428298576-9785-3-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2015-04-13 11:44:52 -03:00
Zhao Lei	c99f1b0c6c	btrfs: Support busy loop of write and delete Reproduce: while true; do dd if=/dev/zero of=/mnt/btrfs/file count=[75% fs_size] rm /mnt/btrfs/file done Then we can see above loop failed on NO_SPACE. It it long-term problem since very beginning, because delayed-iput after rm are not run. We already have commit_transaction() in alloc_space code, but it is not triggered in above case. This patch trigger commit_transaction() to run delayed-iput and reflash pinned-space to to make write success. It is based on previous fix of delayed-iput in commit_transaction(), need to be applied on top of: btrfs: Fix NO_SPACE bug caused by delayed-iput Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:31:10 -07:00
Zhao Lei	d7c151717a	btrfs: Fix NO_SPACE bug caused by delayed-iput Steps to reproduce: while true; do dd if=/dev/zero of=/btrfs_dir/file count=[fs_size * 75%] rm /btrfs_dir/file sync done And we'll see dd failed because btrfs return NO_SPACE. Reason: Normally, btrfs_commit_transaction() call btrfs_run_delayed_iputs() in end to free fs space for next write, but sometimes it hadn't done work on time, because btrfs-cleaner thread get delayed-iputs from list before, but do iput() after next write. This is log: [ 2569.050776] comm=btrfs-cleaner func=btrfs_evict_inode() begin [ 2569.084280] comm=sync func=btrfs_commit_transaction() call btrfs_run_delayed_iputs() [ 2569.085418] comm=sync func=btrfs_commit_transaction() done btrfs_run_delayed_iputs() [ 2569.087554] comm=sync func=btrfs_commit_transaction() end [ 2569.191081] comm=dd begin [ 2569.790112] comm=dd func=__btrfs_buffered_write() ret=-28 [ 2569.847479] comm=btrfs-cleaner func=add_pinned_bytes() 0 + 32677888 = 32677888 [ 2569.849530] comm=btrfs-cleaner func=add_pinned_bytes() 32677888 + 23834624 = 56512512 ... [ 2569.903893] comm=btrfs-cleaner func=add_pinned_bytes() 943976448 + 21762048 = 965738496 [ 2569.908270] comm=btrfs-cleaner func=btrfs_evict_inode() end Fix: Make btrfs_commit_transaction() wait current running btrfs-cleaner's delayed-iputs() done in end. Test: Use script similar to above(more complex), before patch: 7 failed in 100 * 20 loop. after patch: 0 failed in 100 * 20 loop. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:27:41 -07:00
Zhao Lei	18d018ad2c	btrfs: add WARN_ON() to check is space_info op current space_info's value calculation is some complex and easy to cause bug, add WARN_ON() to help debug. Changelog v1->v2: Put WARN_ON()s under the ENOSPC_DEBUG mount option. Suggested by: David Sterba <dsterba@suse.cz> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:27:35 -07:00
Zhao Lei	c30666d466	btrfs: Set relative data on clear btrfs_block_group_cache->pinned Bug1: space_info->bytes_readonly was set to very large(negative) value in btrfs_remove_block_group(). Reason: Current code set block_group_cache->pinned = 0 in btrfs_delete_unused_bgs(), but above space was not counted to space_info->bytes_readonly. Then in btrfs_remove_block_group(): block_group->space_info->bytes_readonly -= block_group->key.offset; We can see following value in trace: btrfs_remove_block_group: pid=2677 comm=btrfs-cleaner WARNING: bytes_readonly=12582912, key.offset=134217728 Bug2: space_info->total_bytes_pinned grow to value larger than fs size. In a 1.2G fs, we can get following trace log: at first: ZL_DEBUG: add_pinned_bytes: pid=2710 comm=sync change total_bytes_pinned flags=1 869793792 + 95944704 = 965738496 after some op: ZL_DEBUG: add_pinned_bytes: pid=2770 comm=sync change total_bytes_pinned flags=1 1780178944 + 95944704 = 1876123648 after some op: ZL_DEBUG: add_pinned_bytes: pid=3193 comm=sync change total_bytes_pinned flags=1 2924568576 + 95551488 = 3020120064 ... Reason: Similar to bug1, we also need to adjust space_info->total_bytes_pinned in above code block. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:27:29 -07:00
Zhao Lei	264ca0f60b	btrfs: Adjust commit-transaction condition to avoid NO_SPACE more If we have any chance to make a successful write, we should not give up. This patch adjust commit-transaction condition from: pinned >= wanted to left + pinned >= wanted Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:27:24 -07:00
Zhao Lei	f2ab76188e	btrfs: Fix tail space processing in find_free_dev_extent() It is another reason for NO_SPACE case. When we found enough free space in loop and saved them to max_hole_start/size before, and tail space contains pending extent, origional innocent max_hole_start/size are reset in retry. As a result, find_free_dev_extent() returns less space than it can, and cause NO_SPACE in user program. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:27:18 -07:00
Zhao Lei	94b947b2f3	btrfs: fix condition of commit transaction Old code bypass commit transaction when we don't have enough pinned space, but another case is there exist freed bgs in current transction, it have possibility to make alloc_chunk success. This patch modify the condition to: if (have_free_bg \|\| have_pinned_space) commit_transaction() Confirmed above action by printk before and after patch. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:26:40 -07:00
Patrick McHardy	d07db9884a	netfilter: nf_tables: introduce nft_validate_register_load() Change nft_validate_input_register() to not only validate the input register number, but also the length of the load, and rename it to nft_validate_register_load() to reflect that change. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:50 +02:00
Patrick McHardy	27e6d2017a	netfilter: nf_tables: kill nft_validate_output_register() All users of nft_validate_register_store() first invoke nft_validate_output_register(). There is in fact no use for using it on its own, so simplify the code by folding the functionality into nft_validate_register_store() and kill it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:50 +02:00
Patrick McHardy	58f40ab6e2	netfilter: nft_lookup: use nft_validate_register_store() to validate types In preparation of validating the length of a register store, use nft_validate_register_store() in nft_lookup instead of open coding the validation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:49 +02:00
Patrick McHardy	1ec10212f9	netfilter: nf_tables: rename nft_validate_data_load() The existing name is ambiguous, data is loaded as well when we read from a register. Rename to nft_validate_register_store() for clarity and consistency with the upcoming patch to introduce its counterpart, nft_validate_register_load(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:49 +02:00
Patrick McHardy	45d9bcda21	netfilter: nf_tables: validate len in nft_validate_data_load() For values spanning multiple registers, we need to validate that enough space is available from the destination register onwards. Add a len argument to nft_validate_data_load() and consolidate the existing length validations in preparation of that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-04-13 16:25:49 +02:00
Chris Mason	de249e66a7	Btrfs: fix uninit variable in clone ioctl Commit 0d97a64e0 creates a new variable but doesn't always set it up. This puts it back to the original method (key.offset + 1) for the cases not covered by Filipe's new logic. Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:03:58 -07:00
Ralf Baechle	3e20a26b02	Merge branch '4.0-fixes' into mips-for-linux-next	2015-04-13 16:03:32 +02:00
Filipe Manana	ccccf3d672	Btrfs: fix inode eviction infinite loop after cloning into it If we attempt to clone a 0 length region into a file we can end up inserting a range in the inode's extent_io tree with a start offset that is greater then the end offset, which triggers immediately the following warning: [ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]() [ 3914.620886] BTRFS: end < start 4095 4096 (...) [ 3914.638093] Call Trace: [ 3914.638636] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [ 3914.639620] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [ 3914.640789] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs] [ 3914.642041] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48 [ 3914.643236] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs] [ 3914.644441] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs] [ 3914.645711] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs] [ 3914.646914] [<ffffffff8142b2fb>] ? _raw_spin_unlock+0x28/0x33 [ 3914.648058] [<ffffffffa03cbac4>] ? test_range_bit+0xcc/0xde [btrfs] [ 3914.650105] [<ffffffffa03cb3c3>] lock_extent+0x13/0x15 [btrfs] [ 3914.651361] [<ffffffffa03db39e>] lock_extent_range+0x3d/0xcd [btrfs] [ 3914.652761] [<ffffffffa03de1fe>] btrfs_ioctl_clone+0x278/0x388 [btrfs] [ 3914.654128] [<ffffffff811226dd>] ? might_fault+0x58/0xb5 [ 3914.655320] [<ffffffffa03e0909>] btrfs_ioctl+0xb51/0x2195 [btrfs] (...) [ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]--- This later makes the inode eviction handler enter an infinite loop that keeps dumping the following warning over and over: [ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]() [ 3915.119913] BTRFS: end < start 4095 4096 (...) [ 3915.137394] Call Trace: [ 3915.137913] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [ 3915.139154] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [ 3915.140316] [<ffffffffa03ca44f>] ? insert_state+0x4b/0x10b [btrfs] [ 3915.141505] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48 [ 3915.142709] [<ffffffffa03ca44f>] insert_state+0x4b/0x10b [btrfs] [ 3915.143849] [<ffffffffa03ca729>] __set_extent_bit+0x107/0x3f4 [btrfs] [ 3915.145120] [<ffffffffa038c1e3>] ? btrfs_kill_super+0x17/0x23 [btrfs] [ 3915.146352] [<ffffffff811548f6>] ? deactivate_locked_super+0x3b/0x50 [ 3915.147565] [<ffffffffa03cb256>] lock_extent_bits+0x65/0x1bf [btrfs] [ 3915.148785] [<ffffffff8142b7e2>] ? _raw_write_unlock+0x28/0x33 [ 3915.149931] [<ffffffffa03bc325>] btrfs_evict_inode+0x196/0x482 [btrfs] [ 3915.151154] [<ffffffff81168904>] evict+0xa0/0x148 [ 3915.152094] [<ffffffff811689e5>] dispose_list+0x39/0x43 [ 3915.153081] [<ffffffff81169564>] evict_inodes+0xdc/0xeb [ 3915.154062] [<ffffffff81154418>] generic_shutdown_super+0x49/0xef [ 3915.155193] [<ffffffff811546d1>] kill_anon_super+0x13/0x1e [ 3915.156274] [<ffffffffa038c1e3>] btrfs_kill_super+0x17/0x23 [btrfs] (...) [ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]--- So just bail out of the clone ioctl if the length of the region to clone is zero, without locking any extent range, in order to prevent this issue (same behaviour as a pwrite with a 0 length for example). This is trivial to reproduce. For example, the steps for the test I just made for fstests: mkfs.btrfs -f SCRATCH_DEV mount SCRATCH_DEV $SCRATCH_MNT touch $SCRATCH_MNT/foo touch $SCRATCH_MNT/bar $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar umount $SCRATCH_MNT A test case for fstests follows soon. CC: <stable@vger.kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:03:28 -07:00
Filipe Manana	113e828386	Btrfs: fix inode eviction infinite loop after extent_same ioctl If we pass a length of 0 to the extent_same ioctl, we end up locking an extent range with a start offset greater then its end offset (if the destination file's offset is greater than zero). This results in a warning from extent_io.c:insert_state through the following call chain: btrfs_extent_same() btrfs_double_lock() lock_extent_range() lock_extent(inode->io_tree, offset, offset + len - 1) lock_extent_bits() __set_extent_bit() insert_state() --> WARN_ON(end < start) This leads to an infinite loop when evicting the inode. This is the same problem that my previous patch titled "Btrfs: fix inode eviction infinite loop after cloning into it" addressed but for the extent_same ioctl instead of the clone ioctl. CC: <stable@vger.kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:03:27 -07:00
Filipe Manana	df858e7672	Btrfs: fix range cloning when same inode used as source and destination While searching for extents to clone we might find one where we only use a part of it coming from its tail. If our destination inode is the same the source inode, we end up removing the tail part of the extent item and insert after a new one that point to the same extent with an adjusted key file offset and data offset. After this we search for the next extent item in the fs/subvol tree with a key that has an offset incremented by one. But this second search leaves us at the new extent item we inserted previously, and since that extent item has a non-zero data offset, it it can make us call btrfs_drop_extents with an empty range (start == end) which causes the following warning: [23978.537119] WARNING: CPU: 6 PID: 16251 at fs/btrfs/file.c:550 btrfs_drop_extent_cache+0x43/0x385 [btrfs]() (...) [23978.557266] Call Trace: [23978.557978] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [23978.559191] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [23978.560699] [<ffffffffa047f0ea>] ? btrfs_drop_extent_cache+0x43/0x385 [btrfs] [23978.562389] [<ffffffff8104544d>] warn_slowpath_null+0x1a/0x1c [23978.563613] [<ffffffffa047f0ea>] btrfs_drop_extent_cache+0x43/0x385 [btrfs] [23978.565103] [<ffffffff810e3a18>] ? time_hardirqs_off+0x15/0x28 [23978.566294] [<ffffffff81079ff8>] ? trace_hardirqs_off+0xd/0xf [23978.567438] [<ffffffffa047f73d>] __btrfs_drop_extents+0x6b/0x9e1 [btrfs] [23978.568702] [<ffffffff8107c03f>] ? trace_hardirqs_on+0xd/0xf [23978.569763] [<ffffffff811441c0>] ? ____cache_alloc+0x69/0x2eb [23978.570817] [<ffffffff81142269>] ? virt_to_head_page+0x9/0x36 [23978.571872] [<ffffffff81143c15>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb [23978.573466] [<ffffffff811420d5>] ? kmemleak_alloc_recursive.constprop.52+0x16/0x18 [23978.574962] [<ffffffffa0480d07>] btrfs_drop_extents+0x66/0x7f [btrfs] [23978.576179] [<ffffffffa049aa35>] btrfs_clone+0x516/0xaf5 [btrfs] [23978.577311] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs] [23978.578520] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs] [23978.580282] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs] (...) [23978.591887] ---[ end trace 988ec2a653d03ed3 ]--- Then we attempt to insert a new extent item with a key that already exists, which makes btrfs_insert_empty_item return -EEXIST resulting in abortion of the current transaction: [23978.594355] WARNING: CPU: 6 PID: 16251 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]() (...) [23978.622589] Call Trace: [23978.623181] [<ffffffff81425fd9>] dump_stack+0x4c/0x65 [23978.624359] [<ffffffff81045390>] warn_slowpath_common+0xa1/0xbb [23978.625573] [<ffffffffa044ab6c>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs] [23978.626971] [<ffffffff810453f0>] warn_slowpath_fmt+0x46/0x48 [23978.628003] [<ffffffff8108a6c8>] ? vprintk_default+0x1d/0x1f [23978.629138] [<ffffffffa044ab6c>] __btrfs_abort_transaction+0x52/0x114 [btrfs] [23978.630528] [<ffffffffa049ad1b>] btrfs_clone+0x7fc/0xaf5 [btrfs] [23978.631635] [<ffffffffa04983dc>] ? lock_extent_range+0x7b/0xcd [btrfs] [23978.632886] [<ffffffffa049b2a2>] btrfs_ioctl_clone+0x28e/0x39f [btrfs] [23978.634119] [<ffffffffa049d9ae>] btrfs_ioctl+0xb51/0x219a [btrfs] (...) [23978.647714] ---[ end trace 988ec2a653d03ed4 ]--- This is wrong because we should not process the extent item that we just inserted previously, and instead process the extent item that follows it in the tree For example for the test case I wrote for fstests: bs=$((64 * 1024)) mkfs.btrfs -f -l $bs -O ^no-holes /dev/sdc mount /dev/sdc /mnt xfs_io -f -c "pwrite -S 0xaa $(($bs * 2)) $(($bs * 2))" /mnt/foo $CLONER_PROG -s $((3 * $bs)) -d $((267 * $bs)) -l 0 /mnt/foo /mnt/foo $CLONER_PROG -s $((217 * $bs)) -d $((95 * $bs)) -l 0 /mnt/foo /mnt/foo The second clone call fails with -EEXIST, because when we process the first extent item (offset 262144), we drop part of it (counting from the end) and then insert a new extent item with a key greater then the key we found. The next time we search the tree we search for a key with offset 262144 + 1, which leaves us at the new extent item we have just inserted but we think it refers to an extent that we need to clone. Fix this by ensuring the next search key uses an offset corresponding to the offset of the key we found previously plus the data length of the corresponding extent item. This ensures we skip new extent items that we inserted and works for the case of implicit holes too (NO_HOLES feature). A test case for fstests follows soon. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2015-04-13 07:03:26 -07:00
Ralf Baechle	98b0429b7a	Merge branch '4.1-fp' into mips-for-linux-next	2015-04-13 16:01:37 +02:00
Tomas Winkler	ea5505fabd	mei: trace: remove unused TRACE_SYSTEM_STRING fix warning: include/trace/ftrace.h:28:0: note: this is the location of the previous definition ^ In file included from include/trace/define_trace.h:90:0, from drivers/misc/mei/mei-trace.h:76, from drivers/misc/mei/mei-trace.c:21: include/trace/ftrace.h:28:0: warning: "TRACE_SYSTEM_STRING" redefined Cc: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2015-04-13 15:27:19 +02:00
Jani Nikula	9a436ee6c3	drm: make crtc/encoder/connector/plane helper_private a const pointer They're only used to store const pointers anyway. This helps to keep Ville and the compiler happy. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 15:00:13 +02:00
Jani Nikula	0d4d936f49	drm/armada: constify struct drm_encoder_helper_funcs pointer Not to be modified. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 15:00:13 +02:00
Jani Nikula	16bb079e45	drm/radeon: constify more struct drm_*_helper funcs pointers Some non-const pointers were added since the last constification, fix them. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 15:00:12 +02:00
Jani Nikula	1b54bdb8cc	drm/edid: add #defines for ELD versions Add ELD versions according to HDA Specification v1.0a. 2 indicates version 2, which supports CEA_Ver 861D or below. Maximum Baseline ELD size of 80 bytes (15 SAD count). 31 indicates an ELD that has been partially populated through implementation specific mean of default programming before an external graphics driver is loaded. Only the field that is called out as "canned" field will be populated, and audio driver should ignore the non "canned" field. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>	2015-04-13 14:59:58 +02:00
Ander Conselvan de Oliveira	08d9bc920d	drm/i915: Allocate connector state together with the connectors Connector states were being allocated in intel_setup_outputs() in loop over all connectors. That meant hot-added connectors would have a NULL state. Since the change to use a struct drm_atomic_state for the legacy modeset, connector states are necessary for the i915 driver to function properly, so that would lead to oopses. Broken by commit `944b0c7657` Author: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Date: Fri Mar 20 16:18:07 2015 +0200 drm/i915: Copy the staged connector config to the legacy atomic state v2: Fix test for intel_connector_init() success in lvds and sdvo (PRTS) Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reported-and-tested-by: Nicolas Kalkhof <nkalkhof@web.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2015-04-13 15:21:21 +03:00
Takashi Iwai	ce4524e5a7	ASoC: Updates for v4.1 More updates for v4.1, pretty much all drivers: - Lots of cleanups from Lars, mainly moving things from the CODEC level to the card level. - Continuing improvements to rcar from Morimoto-san, pcm512x from Howard and Peter, the Intel platforms from Vinod, Jie, Jin and Han, and to rt5670 from Bard. - Support for some non-DSP Qualcomm platforms, Google's Storm platform, Maxmim MAX98925 CODECs and the Ingenic JZ4780 SoC. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVK7E7AAoJECTWi3JdVIfQeRwH+wZHxh5yXBwSysga6ITc8GzZ Swy9LCdVr4uDXBrzn6qusfpgyiy6i09aB3tlr02j1gnhaA6tZ52Xi5S5RGds2nQL Qi+Nmt/7Expys09mJrE1Z8ZBRXnSbKw36odNxHiVVPVSfBGEXeQErDmLzsQ3ccqA 8HDC2TGRjjal9ZVW9kNsi5EkR9z8dRlkymAvzlpozs4aLwaOsH/xiF+4xI3zh0xZ +rG8HH9w3/yePVKiKZGjToNgzZ2ATLB5s+JZyFiDn0uXMo3UZWnQItPv8KJE0/FR A7D7XyMN66WSTnWHMIPetrJbgyNP9cM/Wk+prn/PKvObsqYiP7lJikzEAlklfJE= =LjN/ -----END PGP SIGNATURE----- Merge tag 'asoc-v4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v4.1 More updates for v4.1, pretty much all drivers: - Lots of cleanups from Lars, mainly moving things from the CODEC level to the card level. - Continuing improvements to rcar from Morimoto-san, pcm512x from Howard and Peter, the Intel platforms from Vinod, Jie, Jin and Han, and to rt5670 from Bard. - Support for some non-DSP Qualcomm platforms, Google's Storm platform, Maxmim MAX98925 CODECs and the Ingenic JZ4780 SoC.	2015-04-13 14:14:29 +02:00
Alexander Duyck	9e1a27ea42	virtio_ring: Update weak barriers to use dma_wmb/rmb This change makes it so that instead of using smp_wmb/rmb which varies depending on the kernel configuration we can can use dma_wmb/rmb which for most architectures should be equal to or slightly more strict than smp_wmb/rmb. The advantage to this is that these barriers are available to uniprocessor builds as well so the performance should improve under such a configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:04:16 +09:30
Quentin Casasnovas	e5d8f59a5c	modpost: document the use of struct section_check. struct section_check is used as a generic way of describing what relocations are authorized/forbidden when running modpost. This commit tries to describe how each field is used. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (Fixed "mist"ake)	2015-04-13 21:03:04 +09:30
Quentin Casasnovas	52dc0595d5	modpost: handle relocations mismatch in __ex_table. __ex_table is a simple table section where each entry is a pair of addresses - the first address is an address which can fault in kernel space, and the second address points to where the kernel should jump to when handling that fault. This is how copy_from_user() does not crash the kernel if userspace gives a borked pointer for example. If one of these addresses point to a non-executable section, something is seriously wrong since it either means the kernel will never fault from there or it will not be able to jump to there. As both cases are serious enough, we simply error out in these cases so the build fails and the developper has to fix the issue. In case the section is executable, but it isn't referenced in our list of authorized sections to point to from __ex_table, we just dump a warning giving more information about it. We do this in case the new section is executable but isn't supposed to be executed by the kernel. This happened with .altinstr_replacement, which is executable but is only used to copy instructions from - we should never have our instruction pointer pointing in .altinstr_replacement. Admitedly, a proper fix in that case would be to just set .altinstr_replacement NX, but we need to warn about future cases like this. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added long casts)	2015-04-13 21:03:03 +09:30
Quentin Casasnovas	c31e4b832f	scripts: add check_extable.sh script. This shell script can be used to sanity check the __ex_table section on an object file, making sure the relocations in there are pointing to valid executable sections. If it finds some suspicious relocations, it'll use addr2line to try and dump where this is coming from. This works best with CONFIG_DEBUG_INFO. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:03:02 +09:30
Quentin Casasnovas	c7a65e0645	modpost: mismatch_handler: retrieve tosym information only when needed. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:03:01 +09:30
Quentin Casasnovas	356ad53812	modpost: factorize symbol pretty print in get_pretty_name(). Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:03:00 +09:30
Quentin Casasnovas	644e8f14cb	modpost: add handler function pointer to sectioncheck. This will be useful when we want to have special handlers which need to go through more hops to print useful information to the user. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:02:59 +09:30
Quentin Casasnovas	157d1972d0	modpost: add .sched.text and .kprobes.text to the TEXT_SECTIONS list. sched.text and .kprobes.text should behave exactly like .text with regards to how we should warn about referencing sections which might get discarded at runtime. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:02:58 +09:30
Quentin Casasnovas	050e57fd59	modpost: add strict white-listing when referencing sections. Prints a warning when a section references a section outside a strict white-list. This will be useful to print a warning if __ex_table references a non-executable section. Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2015-04-13 21:02:57 +09:30
Jo-Philipp Wich	f2aa111041	ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T450 The Lenovo Thinkpad T450 requires the ALC292_FIXUP_TPT440_DOCK as well in order to get working sound output on the docking stations headphone jack. Patch tested on a Thinkpad T450 (20BVCTO1WW) using kernel 4.0-rc7 in conjunction with a ThinkPad Ultradock. Signed-off-by: Jo-Philipp Wich <jow@openwrt.org> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2015-04-13 13:23:40 +02:00

... 56 57 58 59 60 ...

520225 Commits