linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-16 08:02:17 +00:00

Author	SHA1	Message	Date
Steven Rostedt (VMware)	0e684b6578	tracing: Reset parser->buffer to allow multiple "puts" trace_parser_put() simply frees the allocated parser buffer. But it does not reset the pointer that was freed. This means that if trace_parser_put() is called on the same parser more than once, it will corrupt the allocation system. Setting parser->buffer to NULL after free allows it to be called more than once without any ill effect. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:31 -05:00
Steven Rostedt (VMware)	ae98d27afc	ftrace: Have set_graph_functions handle write with RDWR Since reading the set_graph_functions uses seq functions, which sets the file->private_data pointer to a seq_file descriptor. On writes the ftrace_graph_data descriptor is set to file->private_data. But if the file is opened for RDWR, the ftrace_graph_write() will incorrectly use the file->private_data descriptor instead of ((struct seq_file *)file->private_data)->private pointer, and this can crash the kernel. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:23 -05:00
Steven Rostedt (VMware)	d4ad9a1cca	ftrace: Reset fgd->hash in ftrace_graph_write() fgd->hash is saved and then freed, but is never reset to either ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are performed, then the freed hash could be accessed again. # cd /sys/kernel/debug/tracing # head -1000 available_filter_functions > /tmp/funcs # cat /tmp/funcs > set_graph_function Causes: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: [...] CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880113a12200 task.stack: ffffc90001940000 RIP: 0010:free_ftrace_hash+0x7c/0x160 RSP: 0018:ffffc90001943db0 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40 RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001 FS: 00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0 Call Trace: ? ftrace_graph_write+0x150/0x190 ? __vfs_write+0x1f6/0x210 ? __audit_syscall_entry+0x17f/0x200 ? rw_verify_area+0xdb/0x210 ? _cond_resched+0x2b/0x50 ? __sb_start_write+0xb4/0x130 ? vfs_write+0x1c8/0x330 ? SyS_write+0x62/0xf0 ? do_syscall_64+0xa3/0x1b0 ? entry_SYSCALL64_slow_path+0x25/0x25 Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48 RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0 ---[ end trace 999b48216bf4b393 ]--- Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:59:06 -05:00
Steven Rostedt (VMware)	555fc7813e	ftrace: Replace (void )1 with a meaningful macro name FTRACE_GRAPH_EMPTY When the set_graph_function or set_graph_notrace contains no records, a banner is displayed of either "#### all functions enabled ####" or "#### all functions disabled ####" respectively. To tell the seq operations to do this, (void )1 is passed as a return value. Instead of using a hardcoded meaningless variable, define it as a macro. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:48 -05:00
Steven Rostedt (VMware)	2b2c279c81	ftrace: Create a slight optimization on searching the ftrace_hash This is a micro-optimization, but as it has to deal with a fast path of the function tracer, these optimizations can be noticed. The ftrace_lookup_ip() returns true if the given ip is found in the hash. If it's not found or the hash is NULL, it returns false. But there's some cases that a NULL hash is a true, and the ftrace_hash_empty() is tested before calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests that first, that adds a few extra unneeded instructions in those cases. A new static "always_inlined" function is created that does not perform the hash empty test. This most only be used by callers that do the check first anyway, as an empty or NULL hash could cause a crash if a lookup is performed on it. Also add kernel doc for the ftrace_lookup_ip() main function. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:22 -05:00
Steven Rostedt (VMware)	2b0cce0e19	tracing: Add ftrace_hash_key() helper function Replace the couple of use cases that has small logic to produce the ftrace function key id with a helper function. No need for duplicate code. Acked-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-02-03 10:58:05 -05:00
Namhyung Kim	b9b0c831be	ftrace: Convert graph filter to use hash tables Use ftrace_hash instead of a static array of a fixed size. This is useful when a graph filter pattern matches to a large number of functions. Now hash lookup is done with preemption disabled to protect from the hash being changed/freed. Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 14:50:58 -05:00
Namhyung Kim	4046bf023b	ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip It will be used when checking graph filter hashes later. Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> [ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 14:50:21 -05:00
Namhyung Kim	3e278c0dc1	ftrace: Factor out __ftrace_hash_move() The __ftrace_hash_move() is to allocates properly-sized hash and move entries in the src ftrace_hash. It will be used to set function graph filters which has nothing to do with the dyn_ftrace records. Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-20 11:40:07 -05:00
Steven Rostedt (VMware)	068f530b3f	tracing: Add the constant count for branch tracer The unlikely/likely branch profiler now gets called even if the if statement is a constant (always goes in one direction without a compare). Add a value to denote this in the likely/unlikely tracer as well. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-19 08:57:41 -05:00
Steven Rostedt (VMware)	134e6a034c	tracing: Show number of constants profiled in likely profiler Now that constants are traced, it is useful to see the number of constants that are traced in the likely/unlikely profiler in order to know if they should be ignored or not. The likely/unlikely will display a number after the "correct" number if a "constant" count exists. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-19 08:57:14 -05:00
Steven Rostedt (VMware)	d45ae1f704	tracing: Process constants for (un)likely() profiler When running the likely/unlikely profiler, one of the results did not look accurate. It noted that the unlikely() in link_path_walk() was 100% incorrect. When I added a trace_printk() to see what was happening there, it became 80% correct! Looking deeper into what whas happening, I found that gcc split that if statement into two paths. One where the if statement became a constant, the other path a variable. The other path had the if statement always hit (making the unlikely there, always false), but since the #define unlikely() has: #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0)) Where constants are ignored by the branch profiler, the "constant" path made by the compiler was ignored, even though it was hit 80% of the time. By just passing the constant value to the __branch_check__() function and tracing it out of line (as always correct, as likely/unlikely isn't a factor for constants), then we get back the accurate readings of branches that were optimized by gcc causing part of the execution to become constant. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-17 15:13:05 -05:00
Kenny Yu	6496bb72bf	uprobe: Find last occurrence of ':' when parsing uprobe PATH:OFFSET Previously, `create_trace_uprobe` found the first occurence of the ':' character when parsing `PATH:OFFSET` for a uprobe. However, if the path contains a ':' character, then the function would parse the path incorrectly. Even worse, if the path does not exist, the subsequent call to `kern_path()` would set `ret` to `ENOENT`, leading to very cryptic errno values in user space. The fix is to find the last occurence of ':'. How to repro:: The write fails with "No such file or directory", suggesting incorrectly that the `uprobe_events` file does not exist. $ mkdir testing && cd testing $ cp /bin/bash . $ cp /bin/bash ./bash:with:colon $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this doesn't -bash: echo: write error: No such file or directory With the patch: $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this still works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this works now too! $ cat /sys/kernel/debug/tracing/uprobe_events p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006 p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006 Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.com Signed-off-by: Kenny Yu <kennyyu@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2017-01-17 12:57:47 -05:00
Linus Torvalds	0c744ea4f7	Linux 4.10-rc2	2017-01-01 14:31:53 -08:00
Linus Torvalds	4759d386d5	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull DAX updates from Dan Williams: "The completion of Jan's DAX work for 4.10. As I mentioned in the libnvdimm-for-4.10 pull request, these are some final fixes for the DAX dirty-cacheline-tracking invalidation work that was merged through the -mm, ext4, and xfs trees in -rc1. These patches were prepared prior to the merge window, but we waited for 4.10-rc1 to have a stable merge base after all the prerequisites were merged. Quoting Jan on the overall changes in these patches: "So I'd like all these 6 patches to go for rc2. The first three patches fix invalidation of exceptional DAX entries (a bug which is there for a long time) - without these patches data loss can occur on power failure even though user called fsync(2). The other three patches change locking of DAX faults so that ->iomap_begin() is called in a more relaxed locking context and we are safe to start a transaction there for ext4" These have received a build success notification from the kbuild robot, and pass the latest libnvdimm unit tests. There have not been any -next releases since -rc1, so they have not appeared there" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: ext4: Simplify DAX fault path dax: Call ->iomap_begin without entry lock during dax fault dax: Finish fault completely when loading holes dax: Avoid page invalidation races and unnecessary radix tree traversals mm: Invalidate DAX radix tree entries only if appropriate ext2: Return BH_New buffers for zeroed blocks	2017-01-01 12:27:05 -08:00
Linus Torvalds	238d1d0f79	Two small fixes: - A merge error on my part broke the DocBook build. I've requisitioned one of tglx's frozen sharks for appropriate disciplinary action and resolved to be more careful about testing the DocBook stuff as long as it's still around. - Fix an error in unaligned-memory-access.txt -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYZoL/AAoJEI3ONVYwIuV6zCQP/2q0yjKH0qVe+FQqjqBkAcfs m6B/8QXKLEiTfS4sthvIa2iQgN6ZKWTg8DyVndXoGhE5qgZZAuctxxNj2nwpt0XD jjTrfLVhyASXCCCfSDPcujtgrQElNOm814+ZsAo5e4KCJk1V7Ap1NlwJpd6beld5 ac6INH3r8X3Qgm2BNbiMJ3VApubcjAkjJRi75mc1JIZGtIAf8ePzjkKVpGyTsaaM qQQDbx5mXNrYPXFJYHgCOGcVGkWRAqIcDiYscS6XHpmj5DWfz/x6OdicdeB9l7b4 Jw4TWVNTXTWz0THRUD74N1SdHwwetnWnZ3abcaorFIA/yr8tw1OuMmEqGn5VNGX5 Bxk8YYMjjezH4y7+XC7IGhZ1dcrGrdc3AJTqZYIoTOI+HxGJ4RjZsqHghgHy3Qvi YgLumVmXLQQ6gSTE7sRixq/2hpLUlUrR0po+zQcwO/8Ka0JA860qhYcTjNzRdO1s csQIJtybytbtX28lz4eMxadFcwTtMyYMyVTslUsP/HJQefNcpPz97OQ0zV7UTmi8 zpHRBo5GnDJNL0u/R6tNDsDo5XaXzy+uR/HDUz/oBn7QofhtrUsgBX2DHeV6pqr7 guvfVVhQ4I3r7/BGyFaiXf28W9f3ADwE1jbibLjEA2IrLxaClIGLxLS/fGql5MAI FQhqlQhsKFeVTAxMtC9j =W4e5 -----END PGP SIGNATURE----- Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux Pull documentation fixes from Jonathan Corbet: "Two small fixes: - A merge error on my part broke the DocBook build. I've requisitioned one of tglx's frozen sharks for appropriate disciplinary action and resolved to be more careful about testing the DocBook stuff as long as it's still around. - Fix an error in unaligned-memory-access.txt" * tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux: Documentation/unaligned-memory-access.txt: fix incorrect comparison operator docs: Fix build failure	2016-12-30 09:32:26 -08:00
Linus Torvalds	f3de082c12	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a boot failure on some platforms when crypto self test is enabled along with the new acomp interface" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: testmgr - Use heap buffer for acomp test input	2016-12-30 09:29:50 -08:00
Olof Johansson	98473f9f3f	mm/filemap: fix parameters to test_bit() mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte': mm/filemap.c:933:9: error: too few arguments to function 'test_bit' return test_bit(PG_waiters); ^~~~~~~~ Fixes: `b91e1302ad` ('mm: optimize PageWaiters bit use for unlock_page()') Signed-off-by: Olof Johansson <olof@lixom.net> Brown-paper-bag-by: Linus Torvalds <dummy@duh.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-29 14:46:39 -08:00
Linus Torvalds	b91e1302ad	mm: optimize PageWaiters bit use for unlock_page() In commit `6290602709` ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-29 11:03:15 -08:00
Linus Torvalds	2d706e790f	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a hash corruption bug in the marvell driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell - Copy IVDIG before launching partial DMA ahash requests	2016-12-27 17:51:36 -08:00
Linus Torvalds	8f18e4d03e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar. The most important is to not assume the packet is RX just because the destination address matches that of the device. Such an assumption causes problems when an interface is put into loopback mode. 2) If we retry when creating a new tc entry (because we dropped the RTNL mutex in order to load a module, for example) we end up with -EAGAIN and then loop trying to replay the request. But we didn't reset some state when looping back to the top like this, and if another thread meanwhile inserted the same tc entry we were trying to, we re-link it creating an enless loop in the tc chain. Fix from Daniel Borkmann. 3) There are two different WRITE bits in the MDIO address register for the stmmac chip, depending upon the chip variant. Due to a bug we could set them both, fix from Hock Leong Kweh. 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: stmmac: fix incorrect bit set in gmac4 mdio addr register r8169: add support for RTL8168 series add-on card. net: xdp: remove unused bfp_warn_invalid_xdp_buffer() openvswitch: upcall: Fix vlan handling. ipv4: Namespaceify tcp_tw_reuse knob net: korina: Fix NAPI versus resources freeing net, sched: fix soft lockup in tc_classify net/mlx4_en: Fix user prio field in XDP forward tipc: don't send FIN message from connectionless socket ipvlan: fix multicast processing ipvlan: fix various issues in ipvlan_process_multicast()	2016-12-27 16:04:37 -08:00
Cihangir Akturk	36f671be1d	Documentation/unaligned-memory-access.txt: fix incorrect comparison operator In the actual implementation ether_addr_equal function tests for equality to 0 when returning. It seems in commit 0d74c4 it is somehow overlooked to change this operator to reflect the actual function. Signed-off-by: Cihangir Akturk <cakturk@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2016-12-27 13:08:42 -07:00
John Brooks	66115335fb	docs: Fix build failure The 80211.tmpl DocBook file was removed in commit `819bf59376` ("docs-rst: sphinxify 802.11 documentation"), but the 80211.xml target was re-added to the Makefile by commit `7ddedebb03` ("ALSA: doc: ReSTize writing-an-alsa-driver document"), leading to a failure when building the documentation: *** No rule to make target 'Documentation/DocBook/80211.xml', needed by 'Documentation/DocBook/80211.aux.xml'. cc: stable@vger.kernel.org Signed-off-by: John Brooks <john@fastquake.com> Mea-culpa-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2016-12-27 13:05:36 -07:00
Jonathan Corbet	54ab6db090	Linux 4.10-rc1 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJYYGCgAAoJEHm+PkMAQRiGjVMH/R1WKLSCVyU2QboSTZVyBGqU 6E42pMalPNaY72uxf29ZmUzds1uV5KyFn7OntsyD4qc+sQb2wxG5PvMSYAsL7HKN lTFiW738zC9Hfx8MzC/fHLGm/7HTHpPFndZJkDOJjIPnS0MeTHAmOFM+RwCRq+px 5uvRHV4Z8yibHtijET6GqCywV0gw/uyXCi6xJfJNAspnj3hsm3ZXKJ0JPvP2ja+V yhdnWYHDEQwRs6FyNtIWnfjH92XilVn4KcOtwnb1pFahALiTmmVqJVMiGartagqJ fPRw98B3YHwmZpEc2SDbXaZi36WLu4hcWvvDa22SN/srXwYIzzblEwuNq1+fiBw= =X7z+ -----END PGP SIGNATURE----- Merge tag 'v4.10-rc1' into docs-next Linux 4.10-rc1	2016-12-27 12:53:44 -07:00
Kweh, Hock Leong	5799fc9059	net: stmmac: fix incorrect bit set in gmac4 mdio addr register Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of OR together with MII_WRITE. Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-27 12:28:08 -05:00
Chun-Hao Lin	610c908773	r8169: add support for RTL8168 series add-on card. This chip is the same as RTL8168, but its device id is 0x8161. Signed-off-by: Chun-Hao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-27 12:28:07 -05:00
Jason Wang	be26727772	net: xdp: remove unused bfp_warn_invalid_xdp_buffer() After commit `73b62bd085` ("virtio-net: remove the warning before XDP linearizing"), there's no users for bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for commit `f23bc46c30`. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-27 12:28:07 -05:00
pravin shelar	df30f7408b	openvswitch: upcall: Fix vlan handling. Networking stack accelerate vlan tag handling by keeping topmost vlan header in skb. This works as long as packet remains in OVS datapath. But during OVS upcall vlan header is pushed on to the packet. When such packet is sent back to OVS datapath, core networking stack might not handle it correctly. Following patch avoids this issue by accelerating the vlan tag during flow key extract. This simplifies datapath by bringing uniform packet processing for packets from all code paths. Fixes: `5108bbaddc` ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <jarno@ovn.org> CC: Jiri Benc <jbenc@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-27 12:28:07 -05:00
Haishuang Yan	56ab6b9300	ipv4: Namespaceify tcp_tw_reuse knob Different namespaces might have different requirements to reuse TIME-WAIT sockets for new connections. This might be required in cases where different namespace applications are in place which require TIME_WAIT socket connections to be reduced independently of the host. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-27 12:28:07 -05:00
Laura Abbott	02608e02fb	crypto: testmgr - Use heap buffer for acomp test input Christopher Covington reported a crash on aarch64 on recent Fedora kernels: kernel BUG at ./include/linux/scatterlist.h:140! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162 Hardware name: linux,dummy-virt (DT) task: ffff80007c650080 task.stack: ffff800008910000 PC is at sg_init_one+0xa0/0xb8 LR is at sg_init_one+0x24/0xb8 ... [<ffff000008398db8>] sg_init_one+0xa0/0xb8 [<ffff000008350a44>] test_acomp+0x10c/0x438 [<ffff000008350e20>] alg_test_comp+0xb0/0x118 [<ffff00000834f28c>] alg_test+0x17c/0x2f0 [<ffff00000834c6a4>] cryptomgr_test+0x44/0x50 [<ffff0000080dac70>] kthread+0xf8/0x128 [<ffff000008082ec0>] ret_from_fork+0x10/0x50 The test vectors used for input are part of the kernel image. These inputs are passed as a buffer to sg_init_one which eventually blows up with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns false for the kernel image since virt_to_page will not return the correct page. Fix this by copying the input vectors to heap buffer before setting up the scatterlist. Reported-by: Christopher Covington <cov@codeaurora.org> Fixes: `d7db7a882d` ("crypto: acomp - update testmgr with support for acomp") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2016-12-27 17:32:11 +08:00
Jan Kara	1db175428e	ext4: Simplify DAX fault path Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	9f141d6ef6	dax: Call ->iomap_begin without entry lock during dax fault Currently ->iomap_begin() handler is called with entry lock held. If the filesystem held any locks between ->iomap_begin() and ->iomap_end() (such as ext4 which will want to hold transaction open), this would cause lock inversion with the iomap_apply() from standard IO path which first calls ->iomap_begin() and only then calls ->actor() callback which grabs entry locks for DAX (if it faults when copying from/to user provided buffers). Fix the problem by nesting grabbing of entry lock inside ->iomap_begin() - ->iomap_end() pair. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	f449b936f1	dax: Finish fault completely when loading holes The only case when we do not finish the page fault completely is when we are loading hole pages into a radix tree. Avoid this special case and finish the fault in that case as well inside the DAX fault handler. It will allow us for easier iomap handling. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:25 -08:00
Jan Kara	e3fce68cdb	dax: Avoid page invalidation races and unnecessary radix tree traversals Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Jan Kara	c6dcf52c23	mm: Invalidate DAX radix tree entries only if appropriate Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Jan Kara	e568df6b84	ext2: Return BH_New buffers for zeroed blocks So far we did not return BH_New buffers from ext2_get_blocks() when we allocated and zeroed-out a block for DAX inode to avoid racy zeroing in DAX code. This zeroing is gone these days so we can remove the workaround. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-26 20:29:24 -08:00
Thomas Gleixner	0dad3a3014	x86/mce/AMD: Make the init code more robust If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-26 17:30:24 -08:00
Thomas Gleixner	b9d9d6911b	smp/hotplug: Undo tglxs brainfart The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: `dc280d9362` ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-26 17:30:24 -08:00
Al Viro	b4b8664d29	arm64: don't pull uaccess.h into .S Split asm-only parts of arm64 uaccess.h into a new header and use that from .S. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-26 13:05:17 -05:00
Florian Fainelli	e6afb1ad88	net: korina: Fix NAPI versus resources freeing Commit `beb0babfb7` ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: `beb0babfb7` ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <alex@ozo.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-26 11:26:16 -05:00
Daniel Borkmann	628185cfdd	net, sched: fix soft lockup in tc_classify Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit `12186be7d2` ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: `12186be7d2` ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-26 11:24:10 -05:00
Linus Torvalds	7ce7d89f48	Linux 4.10-rc1	2016-12-25 16:13:08 -08:00
Larry Finger	8ae679c4bc	powerpc: Fix build warning on 32-bit PPC I am getting the following warning when I build kernel 4.9-git on my PowerBook G4 with a 32-bit PPC processor: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef] This problem is evident after commit `989cea5c14` ("kbuild: prevent lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an error that has been in the code since 2005 when this source file was created. That was with commit `9994a33865` ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S"). The offending line does not make a lot of sense. This error does not seem to cause any errors in the executable, thus I am not recommending that it be applied to any stable versions. Thanks to Nicholas Piggin for suggesting this solution. Fixes: `9994a33865` ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-25 16:12:20 -08:00
Linus Torvalds	d33d5a6c88	avoid spurious "may be used uninitialized" warning The timer type simplifications caused a new gcc warning: drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’: drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized] elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); despite the actual use of "time_start" not having changed in any way. It appears that simply changing the type of ktime_t from a union to a plain scalar type made gcc check the use. The variable wasn't actually used uninitialized, but gcc apparently failed to notice that the conditional around the use was exactly the same as the conditional around the initialization of that variable. Add an unnecessary initialization just to shut up the compiler. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-25 14:56:58 -08:00
Linus Torvalds	3ddc76dfc7	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t	2016-12-25 14:30:04 -08:00
Linus Torvalds	b272f732f8	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine	2016-12-25 14:05:56 -08:00
Linus Torvalds	10bbe7599e	Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: remove obsolete -M, -m, -C, -c options tools/power turbostat: Make extensible via the --add parameter tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz tools/power turbostat: line up headers when -M is used tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding tools/power turbostat: Support Knights Mill (KNM) tools/power turbostat: Display HWP OOB status tools/power turbostat: fix Denverton BCLK tools/power turbostat: use intel-family.h model strings tools/power/turbostat: Add Denverton RAPL support tools/power/turbostat: Add Denverton support tools/power/turbostat: split core MSR support into status + limit tools/power turbostat: fix error case overflow read of slm_freq_table[] tools/power turbostat: Allocate correct amount of fd and irq entries tools/power turbostat: switch to tab delimited output tools/power turbostat: Gracefully handle ACPI S3 tools/power turbostat: tidy up output on Joule counter overflow	2016-12-25 14:01:28 -08:00
Nicholas Piggin	6290602709	mm: add PageWaiters indicating tasks are waiting for a page bit Add a new page flag, PageWaiters, to indicate the page waitqueue has tasks waiting. This can be tested rather than testing waitqueue_active which requires another cacheline load. This bit is always set when the page has tasks on page_waitqueue(page), and is set and cleared under the waitqueue lock. It may be set when there are no tasks on the waitqueue, which will cause a harmless extra wakeup check that will clears the bit. The generic bit-waitqueue infrastructure is no longer used for pages. Instead, waitqueues are used directly with a custom key type. The generic code was not flexible enough to have PageWaiters manipulation under the waitqueue lock (which simplifies concurrency). This improves the performance of page lock intensive microbenchmarks by 2-3%. Putting two bits in the same word opens the opportunity to remove the memory barrier between clearing the lock bit and testing the waiters bit, after some work on the arch primitives (e.g., ensuring memory operand widths match and cover both bits). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-25 11:54:48 -08:00
Nicholas Piggin	6326fec112	mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked A page is not added to the swap cache without being swap backed, so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-25 11:54:48 -08:00
Thomas Gleixner	1f3a8e49d8	ktime: Get rid of ktime_equal() No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2016-12-25 17:21:23 +01:00

1 2 3 4 5 ...

647891 Commits