linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-25 20:32:22 +00:00

Author	SHA1	Message	Date
Srikar Dronamraju	a4739eca44	sched/numa: Stop multiple tasks from moving to the CPU at the same time Task migration under NUMA balancing can happen in parallel. More than one task might choose to migrate to the same CPU at the same time. This can result in: - During task swap, choosing a task that was not part of the evaluation. - During task swap, task which just got moved into its preferred node, moving to a completely different node. - During task swap, task failing to move to the preferred node, will have to wait an extra interval for the next migrate opportunity. - During task movement, multiple task movements can cause load imbalance. This problem is more likely if there are more cores per node or more nodes in the system. Use a per run-queue variable to check if NUMA-balance is active on the run-queue. Specjbb2005 results (8 warehouses) Higher bops are better 2 Socket - 2 Node Haswell - X86 JVMS Prev Current %Change 4 200194 203353 1.57797 1 311331 328205 5.41995 2 Socket - 4 Node Power8 - PowerNV JVMS Prev Current %Change 1 197654 214384 8.46429 2 Socket - 2 Node Power9 - PowerNV JVMS Prev Current %Change 4 192605 188553 -2.10379 1 213402 196273 -8.02664 4 Socket - 4 Node Power7 - PowerVM JVMS Prev Current %Change 8 52227.1 57581.2 10.2516 1 102529 103468 0.915838 There is a regression on power 9 box. If we look at the details, that box has a sudden jump in cache-misses with this patch. All other parameters seem to be pointing towards NUMA consolidation. perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86 Event Before After cs 13,345,784 13,941,377 migrations 1,127,820 1,157,323 faults 374,736 382,175 cache-misses 55,132,054,603 54,993,823,500 sched:sched_move_numa 1,923 2,005 sched:sched_stick_numa 52 14 sched:sched_swap_numa 595 529 migrate:mm_migrate_pages 1,932 1,573 vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Haswell - X86 Event Before After numa_hint_faults 60605 67099 numa_hint_faults_local 51804 58456 numa_hit 239945 240416 numa_huge_pte_updates 14 18 numa_interleave 60 65 numa_local 239865 240339 numa_other 80 77 numa_pages_migrated 1931 1574 numa_pte_updates 67823 77182 perf stats 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86 Event Before After cs 3,016,467 3,176,453 migrations 37,326 30,238 faults 115,342 87,869 cache-misses 11,692,155,554 12,544,479,391 sched:sched_move_numa 965 23 sched:sched_stick_numa 8 0 sched:sched_swap_numa 35 6 migrate:mm_migrate_pages 1,168 10 vmstat 8th warehouse Single JVM 2 Socket - 2 Node Haswell - X86 Event Before After numa_hint_faults 16286 236 numa_hint_faults_local 11863 201 numa_hit 112482 72293 numa_huge_pte_updates 33 0 numa_interleave 20 26 numa_local 112419 72233 numa_other 63 60 numa_pages_migrated 1144 8 numa_pte_updates 32859 0 perf stats 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV Event Before After cs 8,629,724 8,478,820 migrations 221,052 171,323 faults 308,661 307,499 cache-misses 135,574,913 240,353,599 sched:sched_move_numa 147 214 sched:sched_stick_numa 0 0 sched:sched_swap_numa 2 4 migrate:mm_migrate_pages 64 89 vmstat 8th warehouse Multi JVM 2 Socket - 2 Node Power9 - PowerNV Event Before After numa_hint_faults 11481 5301 numa_hint_faults_local 10968 4745 numa_hit 89773 92943 numa_huge_pte_updates 0 0 numa_interleave 1116 899 numa_local 89220 92345 numa_other 553 598 numa_pages_migrated 62 88 numa_pte_updates 11694 5505 perf stats 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV Event Before After cs 2,272,887 2,066,172 migrations 12,206 11,076 faults 163,704 149,544 cache-misses 4,801,186 10,398,067 sched:sched_move_numa 44 43 sched:sched_stick_numa 0 0 sched:sched_swap_numa 0 0 migrate:mm_migrate_pages 17 6 vmstat 8th warehouse Single JVM 2 Socket - 2 Node Power9 - PowerNV Event Before After numa_hint_faults 2261 3552 numa_hint_faults_local 1993 3347 numa_hit 25726 25611 numa_huge_pte_updates 0 0 numa_interleave 239 213 numa_local 25498 25583 numa_other 228 28 numa_pages_migrated 17 6 numa_pte_updates 2266 3535 perf stats 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM Event Before After cs 117,980,962 99,358,136 migrations 3,950,220 4,041,607 faults 736,979 749,653 cache-misses 224,976,072,879 225,562,543,251 sched:sched_move_numa 504 771 sched:sched_stick_numa 50 14 sched:sched_swap_numa 239 204 migrate:mm_migrate_pages 1,260 1,180 vmstat 8th warehouse Multi JVM 4 Socket - 4 Node Power7 - PowerVM Event Before After numa_hint_faults 18293 27409 numa_hint_faults_local 11969 20677 numa_hit 240854 239988 numa_huge_pte_updates 0 0 numa_interleave 0 0 numa_local 240851 239983 numa_other 3 5 numa_pages_migrated 1190 1016 numa_pte_updates 18106 27916 perf stats 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM Event Before After cs 61,053,158 60,899,307 migrations 551,586 544,668 faults 244,174 270,834 cache-misses 74,326,766,973 74,543,455,635 sched:sched_move_numa 344 735 sched:sched_stick_numa 24 25 sched:sched_swap_numa 140 174 migrate:mm_migrate_pages 568 816 vmstat 8th warehouse Single JVM 4 Socket - 4 Node Power7 - PowerVM Event Before After numa_hint_faults 6461 11059 numa_hint_faults_local 2283 4733 numa_hit 35661 41384 numa_huge_pte_updates 0 0 numa_interleave 0 0 numa_local 35661 41383 numa_other 0 1 numa_pages_migrated 568 815 numa_pte_updates 6518 11323 Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Jirka Hladky <jhladky@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1537552141-27815-2-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:42:20 +02:00
Natarajan, Janakarajan	d7cbbe49a9	perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events In Family 17h, some L3 Cache Performance events require the ThreadMask and SliceMask to be set. For other events, these fields do not affect the count either way. Set ThreadMask and SliceMask to 0xFF and 0xF respectively. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/Message-ID: Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:38:04 +02:00
Kan Liang	9d92cfeaf5	perf/x86/intel/uncore: Fix PCI BDF address of M3UPI on SKX The counters on M3UPI Link 0 and Link 3 don't count properly, and writing 0 to these counters may causes system crash on some machines. The PCI BDF addresses of the M3UPI in the current code are incorrect. The correct addresses should be: D18:F1 0x204D D18:F2 0x204E D18:F5 0x204D Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: `cd34cd97b7` ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1537538826-55489-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:38:02 +02:00
Jiri Olsa	cd6fb677ce	perf/ring_buffer: Prevent concurent ring buffer access Some of the scheduling tracepoints allow the perf_tp_event code to write to ring buffer under different cpu than the code is running on. This results in corrupted ring buffer data demonstrated in following perf commands: # perf record -e 'sched:sched_switch,sched:sched_wakeup' perf bench sched messaging # Running 'sched/messaging' benchmark: # 20 sender and receiver processes per group # 10 groups == 400 processes run Total time: 0.383 [sec] [ perf record: Woken up 8 times to write data ] 0x42b890 [0]: failed to process type: -1765585640 [ perf record: Captured and wrote 4.825 MB perf.data (29669 samples) ] # perf report --stdio 0x42b890 [0]: failed to process type: -1765585640 The reason for the corruption are some of the scheduling tracepoints, that have __perf_task dfined and thus allow to store data to another cpu ring buffer: sched_waking sched_wakeup sched_wakeup_new sched_stat_wait sched_stat_sleep sched_stat_iowait sched_stat_blocked The perf_tp_event function first store samples for current cpu related events defined for tracepoint: hlist_for_each_entry_rcu(event, head, hlist_entry) perf_swevent_event(event, count, &data, regs); And then iterates events of the 'task' and store the sample for any task's event that passes tracepoint checks: ctx = rcu_dereference(task->perf_event_ctxp[perf_sw_context]); list_for_each_entry_rcu(event, &ctx->event_list, event_entry) { if (event->attr.type != PERF_TYPE_TRACEPOINT) continue; if (event->attr.config != entry->type) continue; perf_swevent_event(event, count, &data, regs); } Above code can race with same code running on another cpu, ending up with 2 cpus trying to store under the same ring buffer, which is specifically not allowed. This patch prevents the problem, by allowing only events with the same current cpu to receive the event. NOTE: this requires the use of (per-task-)per-cpu buffers for this feature to work; perf-record does this. Signed-off-by: Jiri Olsa <jolsa@kernel.org> [peterz: small edits to Changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Vagin <avagin@openvz.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: `e6dab5ffab` ("perf/trace: Add ability to set a target task for events") Link: http://lkml.kernel.org/r/20180923161343.GB15054@krava Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:37:59 +02:00
Masayoshi Mizuma	6265adb972	perf/x86/intel/uncore: Use boot_cpu_data.phys_proc_id instead of hardcorded physical package ID 0 Physical package id 0 doesn't always exist, we should use boot_cpu_data.phys_proc_id here. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masayoshi Mizuma <msys.mizuma@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180910144750.6782-1-msys.mizuma@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:37:58 +02:00
Peter Zijlstra	a9f9772114	perf/core: Fix perf_pmu_unregister() locking When we unregister a PMU, we fail to serialize the @pmu_idr properly. Fix that by doing the entire thing under pmu_lock. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: `2e80a82a49` ("perf: Dynamic pmu types") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-10-02 09:37:56 +02:00
Andy Lutomirski	7c03e7035a	selftests/x86: Add clock_gettime() tests to test_vdso Now that the vDSO implementation of clock_gettime() is getting reworked, add a selftest for it. This tests that its output is consistent with the syscall version. This is marked for stable to serve as a test for commit `715bd9d12f` ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/082399674de2619b2befd8c0dde49b260605b126.1538422295.git.luto@kernel.org	2018-10-02 08:28:32 +02:00
Heiner Kallweit	ad5f97faff	r8169: fix network stalls due to missing bit TXCFG_AUTO_FIFO Some of the chip-specific hw_start functions set bit TXCFG_AUTO_FIFO in register TxConfig. The original patch changed the order of some calls resulting in these changes being overwritten by rtl_set_tx_config_registers() in rtl_hw_start(). This eventually resulted in network stalls especially under high load. Analyzing the chip-specific hw_start functions all chip version from 34, with the exception of version 39, need this bit set. This patch moves setting this bit to rtl_set_tx_config_registers(). Fixes: `4fd48c4ac0` ("r8169: move common initializations to tp->hw_start") Reported-by: Ortwin Glück <odi@odi.ch> Reported-by: David Arendt <admin@prnet.org> Root-caused-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Tested-by: Tony Atkinson <tatkinson@linux.com> Tested-by: David Arendt <admin@prnet.org> Tested-by: Ortwin Glück <odi@odi.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:28:21 -07:00
Andy Lutomirski	715bd9d12f	x86/vdso: Fix asm constraints on vDSO syscall fallbacks The syscall fallbacks in the vDSO have incorrect asm constraints. They are not marked as writing to their outputs -- instead, they are marked as clobbering "memory", which is useless. In particular, gcc is smart enough to know that the timespec parameter hasn't escaped, so a memory clobber doesn't clobber it. And passing a pointer as an asm input does not tell gcc that the pointed-to value is changed. Add in the fact that the asm instructions weren't volatile, and gcc was free to omit them entirely unless their sole output (the return value) is used. Which it is (phew!), but that stops happening with some upcoming patches. As a trivial example, the following code: void test_fallback(struct timespec *ts) { vdso_fallback_gettime(CLOCK_MONOTONIC, ts); } compiles to: 00000000000000c0 <test_fallback>: c0: c3 retq To add insult to injury, the RCX and R11 clobbers on 64-bit builds were missing. The "memory" clobber is also unnecessary -- no ordering with respect to other memory operations is needed, but that's going to be fixed in a separate not-for-stable patch. Fixes: `2aae950b21` ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu") Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org	2018-10-02 08:28:15 +02:00
David S. Miller	2547496eb1	Merge branch 'tun-races' Eric Dumazet says: ==================== tun: address two syzbot reports Small changes addressing races discovered by syzbot. First patch is a cleanup. Second patch moves a mutex init sooner. Third patch makes sure each tfile gets its own napi enable flags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:27:28 -07:00
Eric Dumazet	af3fb24eec	tun: napi flags belong to tfile Since tun->flags might be shared by multiple tfile structures, it is better to make sure tun_get_user() is using the flags for the current tfile. Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint of what could happen, but we need something stronger to please syzbot. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427 Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4 RSP: 0018:ffff8801c400f410 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325 RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0 RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000 R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358 R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004 FS: 00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715 tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967 call_write_iter include/linux/fs.h:1808 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457579 Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579 RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4 R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff Modules linked in: RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427 Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4 RSP: 0018:ffff8801c400f410 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325 RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0 RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000 R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358 R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004 FS: 00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: `90e33d4594` ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:27:28 -07:00
Eric Dumazet	c7256f579f	tun: initialize napi_mutex unconditionally This is the first part to fix following syzbot report : console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000 kernel config: https://syzkaller.appspot.com/x/.config?x=443816db871edd66 dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80 Following patch is fixing the race condition, but it seems safer to initialize this mutex at tfile creation anyway. Fixes: `90e33d4594` ("tun: enable napi_gro_frags() for TUN/TAP driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:27:28 -07:00
Eric Dumazet	06e55addd3	tun: remove unused parameters tun_napi_disable() and tun_napi_del() do not need a pointer to the tun_struct Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:27:28 -07:00
Dave Jones	6fe9487892	bond: take rcu lock in netpoll_send_skb_on_dev The bonding driver lacks the rcu lock when it calls down into netdev_lower_get_next_private_rcu from bond_poll_controller, which results in a trace like: WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40 CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1 Workqueue: bond0 bond_mii_monitor RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40 Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8> RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0 RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000 R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980 R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff FS: 0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0 Call Trace: bond_poll_controller+0x52/0x170 netpoll_poll_dev+0x79/0x290 netpoll_send_skb_on_dev+0x158/0x2c0 netpoll_send_udp+0x2d5/0x430 write_ext_msg+0x1e0/0x210 console_unlock+0x3c4/0x630 vprintk_emit+0xfa/0x2f0 printk+0x52/0x6e ? __netdev_printk+0x12b/0x220 netdev_info+0x64/0x80 ? bond_3ad_set_carrier+0xe9/0x180 bond_select_active_slave+0x1fc/0x310 bond_mii_monitor+0x709/0x9b0 process_one_work+0x221/0x5e0 worker_thread+0x4f/0x3b0 kthread+0x100/0x140 ? process_one_work+0x5e0/0x5e0 ? kthread_delayed_work_timer_fn+0x90/0x90 ret_from_fork+0x24/0x30 We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev before we call down into netpoll_poll_dev, so just take the lock there. Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:25:25 -07:00
David Ahern	893626d6a3	rtnetlink: Fail dump if target netnsid is invalid Link dumps can return results from a target namespace. If the namespace id is invalid, then the dump request should fail if get_target_net fails rather than continuing with a dump of the current namespace. Fixes: `79e1ad148c` ("rtnetlink: use netnsid to query interface") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:22:04 -07:00
Flavio Leitner	7f6d6558ae	Revert "openvswitch: Fix template leak in error cases." This reverts commit `90c7afc96c`. When the commit was merged, the code used nf_ct_put() to free the entry, but later on commit `76644232e6` ("openvswitch: Free tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which is a more appropriate. Now the original problem is removed. Then `44d6e2f273` ("net: Replace NF_CT_ASSERT() with WARN_ON().") replaced a debug assert with a WARN_ON() which is trigged now. Signed-off-by: Flavio Leitner <fbl@redhat.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 23:20:13 -07:00
David S. Miller	92d7c74b6f	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2018-09-27 Here's one more Bluetooth fix for 4.19, fixing the handling of an attempt to unpair a device while pairing is in progress. Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:40:39 -07:00
LUU Duc Canh	d949cfedbc	tipc: ignore STATE_MSG on wrong link session The initial session number when a link is created is based on a random value, taken from struct tipc_net->random. It is then incremented for each link reset to avoid mixing protocol messages from different link sessions. However, when a bearer is reset all its links are deleted, and will later be re-created using the same random value as the first time. This means that if the link never went down between creation and deletion we will still sometimes have two subsequent sessions with the same session number. In virtual environments with potentially long transmission times this has turned out to be a real problem. We now fix this by randomizing the session number each time a link is created. With a session number size of 16 bits this gives a risk of session collision of 1/64k. To reduce this further, we also introduce a sanity check on the very first STATE message arriving at a link. If this has an acknowledge value differing from 0, which is logically impossible, we ignore the message. The final risk for session collision is hence reduced to 1/4G, which should be sufficient. Signed-off-by: LUU Duc Canh <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:35:30 -07:00
Dan Carpenter	aeadd93f2b	net: sched: act_ipt: check for underflow in __tcf_ipt_init() If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we return -EINVAL. But we don't check whether it's smaller than sizeof(struct xt_entry_target) and that could lead to an out of bounds read. Fixes: `7ba699c604` ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:34:14 -07:00
David S. Miller	ee0b6f4834	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-10-01 1) Validate address prefix lengths in the xfrm selector, otherwise we may hit undefined behaviour in the address matching functions if the prefix is too big for the given address family. 2) Fix skb leak on local message size errors. From Thadeu Lima de Souza Cascardo. 3) We currently reset the transport header back to the network header after a transport mode transformation is applied. This leads to an incorrect transport header when multiple transport mode transformations are applied. Reset the transport header only after all transformations are already applied to fix this. From Sowmini Varadhan. 4) We only support one offloaded xfrm, so reset crypto_done after the first transformation in xfrm_input(). Otherwise we may call the wrong input method for subsequent transformations. From Sowmini Varadhan. 5) Fix NULL pointer dereference when skb_dst_force clears the dst_entry. skb_dst_force does not really force a dst refcount anymore, it might clear it instead. xfrm code did not expect this, add a check to not dereference skb_dst() if it was cleared by skb_dst_force. 6) Validate xfrm template mode, otherwise we can get a stack-out-of-bounds read in xfrm_state_find. From Sean Tranchetti. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 22:29:25 -07:00
Chunfeng Yun	555df5820e	usb: xhci-mtk: resume USB3 roothub first Give USB3 devices a better chance to enumerate at USB3 speeds if they are connected to a suspended host. Porting from "671ffdff5b13 xhci: resume USB 3 roothub first" Cc: <stable@vger.kernel.org> Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-10-01 17:59:02 -07:00
Mathias Nyman	ffe84e01bb	xhci: Add missing CAS workaround for Intel Sunrise Point xHCI The workaround for missing CAS bit is also needed for xHC on Intel sunrisepoint PCH. For more details see: Intel 100/c230 series PCH specification update Doc #332692-006 Errata #8 Cc: <stable@vger.kernel.org> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-10-01 17:59:02 -07:00
Romain Izard	f2924d4b16	usb: cdc_acm: Do not leak URB buffers When the ACM TTY port is disconnected, the URBs it uses must be killed, and then the buffers must be freed. Unfortunately a previous refactor removed the code freeing the buffers because it looked extremely similar to the code killing the URBs. As a result, there were many new leaks for each plug/unplug cycle of a CDC-ACM device, that were detected by kmemleak. Restore the missing code, and the memory leak is removed. Fixes: `ba8c931ded` ("cdc-acm: refactor killing urbs") Signed-off-by: Romain Izard <romain.izard.pro@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-10-01 17:59:02 -07:00
Greg Kroah-Hartman	dcb44ac060	USB-serial fixes for v4.19-rc7 Here are some device-id patches for 4.19-rc7. Some Quectel modems have a vendor command which can be used to disable certain interfaces in their configurations, but unlike some other modems this also causes the interface numbers to change. These patches allow us to support all such interface permutations at least for the Quectel EP06. All have been in linux-next with no reported issues. Signed-off-by: Johan Hovold <johan@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEHszNKQClByu0A+9RQQ3kT97htJUFAluw954RHGpvaGFuQGtl cm5lbC5vcmcACgkQQQ3kT97htJVNxRAAgkWqHZ1KyAaIiG1Y5O5aTDOYq+y4qWoH 1zBdJXw5c3ziRsIOGg6eXuWIc7NrP7D+B2sggMY3oya5BwHJWL4z+7HsDgnHGtH5 t4lKgnHqxvQLNmo2b4iqksQB8yj1M5EOdBnHhJpVp6SIgZWJobbrQ/rjUNNGCwSQ PTB+Nx3dK1MrW1SJRdyEdzxFw0XiPxvNMbN/EKqVZRsj/mmL01SaBtXkuwqPApsa zt9GC7jLs9F5J3iY44xvZIZGfcLIa0Bf4AvgtVGpdhivlT0rdMvUr69q2xG4YOx5 Hiazv+kxF5YXX73dzj8uD+yQdHhJ6VkPKNk+JedVS3k0IrlsoIAVbf4WXTjzQkWa I65bOLy/tQJaDjqK3WNfAMa6ncRo+ZY/+v4aiPIIl+/xHCWuLeX8Auc9MGYoXCPI Ks9ZoHPjzD0u5+UaqpNa34Ip5YWaLC19/KffepkfjTStiMQ0lESpWDY3BkavmEcN 0UvyZdzcZImOXvJO6IfWv0Zkdg5z80XFYi+rw85lSnVdAP0YOmYEEpty+6uw7EUf 4PZl4y6lAXOv0fQasFLMYznqk19cHCyJN9dc43uwIR36RVKSTPYCka8Bgek5Jlrm e4jPLltMgBe7JGUzz836BF2IB++XHA5z7w34PrxT23H6QPW3EKCsIefjAPbu7L8M 2g42BZkCTH4= =82HD -----END PGP SIGNATURE----- Merge tag 'usb-serial-4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial fixes for v4.19-rc7 Here are some device-id patches for 4.19-rc7. Some Quectel modems have a vendor command which can be used to disable certain interfaces in their configurations, but unlike some other modems this also causes the interface numbers to change. These patches allow us to support all such interface permutations at least for the Quectel EP06. All have been in linux-next with no reported issues. Signed-off-by: Johan Hovold <johan@kernel.org> * tag 'usb-serial-4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: simple: add Motorola Tetra MTP6550 id USB: serial: option: add two-endpoints device-id flag USB: serial: option: improve Quectel EP06 detection	2018-10-01 17:53:29 -07:00
Greg Kroah-Hartman	385afbf8c3	Late arm64 fixes - Fix handling of young contiguous ptes for hugetlb mappings - Fix livelock when taking access faults on contiguous hugetlb mappings - Tighten up register accesses via KVM SET_ONE_REG ioctl()s -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJbslhqAAoJELescNyEwWM0FIoH/2fQYrzEZk+zjcJxIxwZOVn8 L1lpSb4+xa0OPLvHU/TEvPCo2B7J3R9jisqQKcqe0MeOvqRThfIsYOWfcFf5NoX8 K4ysmaVk6treS1IJ9ZK+2g5pSuKpvFNQ0euBdoolCe4wV/ZDTH2dNlovdIvnucV2 ybpwUptTK33tpUAlkadGsFo/O8Qdsu3MhQD4ymDZXNj8N7L9lrIwCX42wDZpvcFd XR2O0/tAOtbz1n7PBmtCehenS0BzU5877MAmQsb9c93qyyZ37cMhS1L1RCPqhXV9 TfX/+nyjkRpt+gaMJTV39JjMTBcbtVVHNe32cC470H5OvgK6SNELcJsIlEeUFbo= =Subb -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Will writes: "Late arm64 fixes - Fix handling of young contiguous ptes for hugetlb mappings - Fix livelock when taking access faults on contiguous hugetlb mappings - Tighten up register accesses via KVM SET_ONE_REG ioctl()s" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: KVM: Sanitize PSTATE.M when being set from userspace arm64: KVM: Tighten guest core register access from userspace arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags arm64: hugetlb: Fix handling of young ptes	2018-10-01 17:24:20 -07:00
Greg Kroah-Hartman	b62e425593	ARM: SoC fixes A handful of fixes that have been coming in the last couple of weeks: - Freescale fixes for on-chip accellerators - A DT fix for stm32 to avoid fallback to non-DMA SPI mode - Fixes for badly specified interrupts on BCM63xx SoCs - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40 - Drive strength fix for SAMA5D2 NAND pins on one board -----BEGIN PGP SIGNATURE----- iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAluxIecPHG9sb2ZAbGl4 b20ubmV0AAoJEIwa5zzehBx3KswP/iJT6PRSv2OiZq5UyUPhAOx9dW+9uQP5qCYO 43hRkEhUQEbHAibjd4jKq7r2jNfOEeoZARyhE89tQc+RxwU7oOxH5Aohbmk1o4TQ bQ8AQHoofdNerwr8LKWAWvXe6Ff74d6NIJEQZ1ampndt7pul6LDJbLGg503tDPKZ fomG/W50id7xA8xexEfZZRXZu9HSRqNk6/wZYycUhsreZZ30nSQwJTJvLiSiTTAh qWleTc0dD3BazQBEf8VJwLSu3UfigXF+dP7p/joElgULhk00fHYrhWdAa8d0F3ib tS0foD/alLVslnjIDh8baEkErfqDvtZlpRCinNob1R56yzmkSxjBqCb6kSt4jCN8 o+rlNnmnJPRH/qj0wdjd9phw5AWyZw1V1lSRvZGPacG6i7ZYb02Sj13u05k8826m hIpnryhrwuO8lKrDUCV4GT/oDpKS7ujskJZFWEUgjXHZA/XDodNXN5Rkuw8LeJmh HJx1Ef5v/RLbdoIl3Ybs1zDdbg9rmxdaqfDs3Ukka9doZGB1wtZh+GbF1v6u6GZi zmrcu3jzhDVek7Lw1ZWUCUBCxmYLbcg2txd6ZtkCV09M/fuSnQuxF/mLqiq03YAL ASy7ejKc5tf8DPnHKlZ7KIR4eMXEhxUFOpKblAQktHvREel2zC5xjOQjQvCTm1hD w5rDtaPt =+/9J -----END PGP SIGNATURE----- Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Olof writes: "ARM: SoC fixes A handful of fixes that have been coming in the last couple of weeks: - Freescale fixes for on-chip accellerators - A DT fix for stm32 to avoid fallback to non-DMA SPI mode - Fixes for badly specified interrupts on BCM63xx SoCs - Allwinner A64 HDMI was incorrectly specified as fully compatble with R40 - Drive strength fix for SAMA5D2 NAND pins on one board" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: stm32: update SPI6 dmas property on stm32mp157c soc: fsl: qe: Fix copy/paste bug in ucc_get_tdm_sync_shift() soc: fsl: qbman: qman: avoid allocating from non existing gen_pool ARM: dts: BCM63xx: Fix incorrect interrupt specifiers MAINTAINERS: update the Annapurna Labs maintainer email ARM: dts: sun8i: drop A64 HDMI PHY fallback compatible from R40 DT ARM: dts: at91: sama5d2_ptc_ek: fix nand pinctrl	2018-10-01 17:23:27 -07:00
Greg Kroah-Hartman	ef0f2584c2	Fixes for v4.19-rc7 - Fix failure-path memory leak in ramoops_init (nixiaoming) -----BEGIN PGP SIGNATURE----- Comment: Kees Cook <kees@outflux.net> iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAluxB3cWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJidvEACZlqemGsSVQ4elXWCW9EPqyVSn lbPg5ONurG/51J6313Ankgn7PrOI7WRd3iAXUWYByoc0DXn/jgz0i/B1+MtSnH4g fMeZ3DMnwmuC4h8/50/xmxbjKj2+vW7tX+978wWbYnvYNC+UXf8CN4J4MwBFmNR3 DGH+oVCx2MITbIYQ3u5FIUgRJl0sD15GtxHg/l5Ff78dtUJVvlnD6ZAa9/rDBCPu DD0IqjJ5BqTmGw98L2tG0I2SDSrC8TGAYdQlZK/k7vHUqCWP6QspCpQQy3x6bh+W QRL6gCNEPZIGB+uAVbueCo80zRrx1NltbkbO4n9zn9ItYxpvOXJwGT4peYYXwabO nn+cWwRAJGITp9BFKIk/V8p05rpRhw8oeOBHIzwylb3C6bqK3ijnkk9mNSmUnLPG FtzRs/cYEMHi7n7+aygm1lNHn98PAWGLmomXLyxIUtsSH1jvFWG+jxLiRih/Oah5 qjSGw2r681vLKsj2tQV4hpFvWaV83t2cO1QOldWSSkVHKJO9+pglEUGz9gfCvBRz bCvA0aPNGmMGTO0faQMB/TG2pMtOK94UU6kBIALxlEeHrtdpoOaQN6g++R7sY3/y SWo8BeGlNMSAtkWtd6Y7xxIF5N4PDDc6iMyjHGM/KCptANVu03BWlfOn32tel9jR 17wiJi6SjqQjHVp7jw== =bzfz -----END PGP SIGNATURE----- Merge tag 'pstore-v4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Kees writes: "Pstore fixes for v4.19-rc7 - Fix failure-path memory leak in ramoops_init (nixiaoming)" * tag 'pstore-v4.19-rc7' of https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Fix failure-path memory leak in ramoops_init	2018-10-01 17:22:36 -07:00
Joel Stanley	242cdad873	lib/xz: Put CRC32_POLY_LE in xz_private.h This fixes a regression introduced by `faa16bc404` ("lib: Use existing define with polynomial"). The cleanup added a dependency on include/linux, which broke the PowerPC boot wrapper/decompresser when KERNEL_XZ is enabled: BOOTCC arch/powerpc/boot/decompress.o In file included from arch/powerpc/boot/../../../lib/decompress_unxz.c:233, from arch/powerpc/boot/decompress.c:42: arch/powerpc/boot/../../../lib/xz/xz_crc32.c:18:10: fatal error: linux/crc32poly.h: No such file or directory #include <linux/crc32poly.h> ^~~~~~~~~~~~~~~~~~~ The powerpc decompresser is a hairy corner of the kernel. Even while building a 64-bit kernel it needs to build a 32-bit binary and therefore avoid including files from include/linux. This allows users of the xz library to avoid including headers from 'include/linux/' while still achieving the cleanup of the magic number. Fixes: `faa16bc404` ("lib: Use existing define with polynomial") Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: kbuild test robot <lkp@intel.com> Suggested-by: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: Joel Stanley <joel@jms.id.au> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-10-02 08:44:59 +10:00
Eric Dumazet	1ad98e9d1b	tcp/dccp: fix lockdep issue when SYN is backlogged In normal SYN processing, packets are handled without listener lock and in RCU protected ingress path. But syzkaller is known to be able to trick us and SYN packets might be processed in process context, after being queued into socket backlog. In commit `06f877d613` ("tcp/dccp: fix other lockdep splats accessing ireq_opt") I made a very stupid fix, that happened to work mostly because of the regular path being RCU protected. Really the thing protecting ireq->ireq_opt is RCU read lock, and the pseudo request refcnt is not relevant. This patch extends what I did in commit `449809a66c` ("tcp/dccp: block BH for SYN processing") by adding an extra rcu_read_{lock\|unlock} pair in the paths that might be taken when processing SYN from socket backlog (thus possibly in process context) Fixes: `06f877d613` ("tcp/dccp: fix other lockdep splats accessing ireq_opt") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 15:42:13 -07:00
David S. Miller	c8424ddd97	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree: 1) Skip ip_sabotage_in() for packet making into the VRF driver, otherwise packets are dropped, from David Ahern. 2) Clang compilation warning uncovering typo in the nft_validate_register_store() call from nft_osf, from Stefan Agner. 3) Double sizeof netlink message length calculations in ctnetlink, from zhong jiang. 4) Missing rb_erase() on batch full in rbtree garbage collector, from Taehee Yoo. 5) Calm down compilation warning in nf_hook(), from Florian Westphal. 6) Missing check for non-null sk in xt_socket before validating netns procedence, from Flavio Leitner. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-10-01 15:41:01 -07:00
Thomas Petazzoni	95375f2ab2	PCI: mvebu: Fix PCI I/O mapping creation sequence Commit `ee1604381a` ("PCI: mvebu: Only remap I/O space if configured") had the side effect that the PCI I/O mapping was created much earlier than before, at a point where the probe() of the driver could still fail. This is for example a problem if one gets an -EPROBE_DEFER at some point during probe(), after pci_ioremap_io() has been called. Indeed, there is currently no function to undo what pci_ioremap_io() did, and switching to pci_remap_iospace() is not an option in pci-mvebu due to the need for special memory attributes on Armada 38x. Reverting `ee1604381a` ("PCI: mvebu: Only remap I/O space if configured") would be a possibility, but it would require also reverting `42342073e3` ("PCI: mvebu: Convert to use pci_host_bridge directly"). So instead, we use an open-coded version of pci_host_probe() that creates the PCI I/O mapping at a point where we are guaranteed not to fail anymore. Fixes: `ee1604381a` ("PCI: mvebu: Only remap I/O space if configured") Reported-by: Jan Kundrát <jan.kundrat@cesnet.cz> Tested-by: Jan Kundrát <jan.kundrat@cesnet.cz> Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>	2018-10-01 15:42:09 -05:00
Jianbo Liu	cee2648762	net/mlx5e: Set vlan masks for all offloaded TC rules In flow steering, if asked to, the hardware matches on the first ethertype which is not vlan. It's possible to set a rule as follows, which is meant to match on untagged packet, but will match on a vlan packet: tc filter add dev eth0 parent ffff: protocol ip flower ... To avoid this for packets with single tag, we set vlan masks to tell hardware to check the tags for every matched packet. Fixes: `095b6cfd69` ('net/mlx5e: Add TC vlan match parsing') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-01 10:58:00 -07:00
Eran Ben Elisha	11aa5800ed	net/mlx5: E-Switch, Fix out of bound access when setting vport rate The code that deals with eswitch vport bw guarantee was going beyond the eswitch vport array limit, fix that. This was pointed out by the kernel address sanitizer (KASAN). The error from KASAN log: [2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core] Fixes: `c9497c9890` ("net/mlx5: Add support for setting VF min rate") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-01 10:58:00 -07:00
Alaa Hleihel	4d8fcf216c	net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules If the peer device was already unbound, then do not attempt to modify it's resources, otherwise we will crash on dereferencing non-existing device. Fixes: `5c65c564c9` ("net/mlx5e: Support offloading TC NIC hairpin flows") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-10-01 10:58:00 -07:00
Chris Wilson	4ca8ca9fe7	drm/i915: Avoid compiler warning for maybe unused gu_misc_iir /kisskb/src/drivers/gpu/drm/i915/i915_irq.c: warning: 'gu_misc_iir' may be used uninitialized in this function [-Wuninitialized]: => 3120:10 Silence the compiler warning by ensuring that the local variable is initialised and removing the guard that is confusing the older gcc. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: `df0d28c185` ("drm/i915/icl: GSE interrupt moves from DE_MISC to GU_MISC") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180926104718.17462-1-chris@chris-wilson.co.uk (cherry picked from commit `7a90938332`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2018-10-01 10:19:05 -07:00
Anusha Srivatsa	bda6b1c957	drm/i915: Do not redefine the has_csr parameter. Let us reuse the already defined has_csr check and not redefine it. The main difference is that in effect this will flip .has_csr to 1 (via GEN9_FEATURES which GEN11_FEATURES pulls in). Suggested-by: Imre Deak <imre.deak@intel.com> Cc: Imre Deak <imre.deak@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107382 Reviewed-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1534527210-16841-1-git-send-email-anusha.srivatsa@intel.com (cherry picked from commit `da4468a1aa`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2018-10-01 10:18:41 -07:00
Sean Christopherson	daa07cbc9a	KVM: x86: fix L1TF's MMIO GFN calculation One defense against L1TF in KVM is to always set the upper five bits of the legal physical address in the SPTEs for non-present and reserved SPTEs, e.g. MMIO SPTEs. In the MMIO case, the GFN of the MMIO SPTE may overlap with the upper five bits that are being usurped to defend against L1TF. To preserve the GFN, the bits of the GFN that overlap with the repurposed bits are shifted left into the reserved bits, i.e. the GFN in the SPTE will be split into high and low parts. When retrieving the GFN from the MMIO SPTE, e.g. to check for an MMIO access, get_mmio_spte_gfn() unshifts the affected bits and restores the original GFN for comparison. Unfortunately, get_mmio_spte_gfn() neglects to mask off the reserved bits in the SPTE that were used to store the upper chunk of the GFN. As a result, KVM fails to detect MMIO accesses whose GPA overlaps the repurprosed bits, which in turn causes guest panics and hangs. Fix the bug by generating a mask that covers the lower chunk of the GFN, i.e. the bits that aren't shifted by the L1TF mitigation. The alternative approach would be to explicitly zero the five reserved bits that are used to store the upper chunk of the GFN, but that requires additional run-time computation and makes an already-ugly bit of code even more inscrutable. I considered adding a WARN_ON_ONCE(low_phys_bits-1 <= PAGE_SHIFT) to warn if GENMASK_ULL() generated a nonsensical value, but that seemed silly since that would mean a system that supports VMX has less than 18 bits of physical address space... Reported-by: Sakari Ailus <sakari.ailus@iki.fi> Fixes: d9b47449c1a1 ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs") Cc: Junaid Shahid <junaids@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Reviewed-by: Junaid Shahid <junaids@google.com> Tested-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:41:00 +02:00
Stefan Raspl	fe804cd677	tools/kvm_stat: cut down decimal places in update interval dialog We currently display the default number of decimal places for floats in _show_set_update_interval(), which is quite pointless. Cutting down to a single decimal place. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:59 +02:00
Liran Alon	62cf9bd811	KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS L2 IA32_BNDCFGS should be updated with vmcs12->guest_bndcfgs only when VM_ENTRY_LOAD_BNDCFGS is specified in vmcs12->vm_entry_controls. Otherwise, L2 IA32_BNDCFGS should be set to vmcs01->guest_bndcfgs which is L1 IA32_BNDCFGS. Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:59 +02:00
Liran Alon	503234b3fd	KVM: x86: Do not use kvm_x86_ops->mpx_supported() directly Commit `a87036add0` ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") introduced kvm_mpx_supported() to return true iff MPX is enabled in the host. However, that commit seems to have missed replacing some calls to kvm_x86_ops->mpx_supported() to kvm_mpx_supported(). Complete original commit by replacing remaining calls to kvm_mpx_supported(). Fixes: `a87036add0` ("KVM: x86: disable MPX if host did not enable MPX XSAVE features") Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:57 +02:00
Liran Alon	5f76f6f5ff	KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled Before this commit, KVM exposes MPX VMX controls to L1 guest only based on if KVM and host processor supports MPX virtualization. However, these controls should be exposed to guest only in case guest vCPU supports MPX. Without this change, a L1 guest running with kernel which don't have commit `691bd4340b` ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS") asserts in QEMU on the following: qemu-kvm: error: failed to set MSR 0xd90 to 0x0 qemu-kvm: .../qemu-2.10.0/target/i386/kvm.c:1801 kvm_put_msrs: Assertion 'ret == cpu->kvm_msr_buf->nmsrs failed' This is because L1 KVM kvm_init_msr_list() will see that vmx_mpx_supported() (As it only checks MPX VMX controls support) and therefore KVM_GET_MSR_INDEX_LIST IOCTL will include MSR_IA32_BNDCFGS. However, later when L1 will attempt to set this MSR via KVM_SET_MSRS IOCTL, it will fail because !guest_cpuid_has_mpx(vcpu). Therefore, fix the issue by exposing MPX VMX controls to L1 guest only when vCPU supports MPX. Fixes: `36be0b9deb` ("KVM: x86: Add nested virtualization support for MPX") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-10-01 15:40:57 +02:00
Marc Zyngier	2a3f93459d	arm64: KVM: Sanitize PSTATE.M when being set from userspace Not all execution modes are valid for a guest, and some of them depend on what the HW actually supports. Let's verify that what userspace provides is compatible with both the VM settings and the HW capabilities. Cc: <stable@vger.kernel.org> Fixes: `0d854a60b1` ("arm64: KVM: enable initialization of a 32bit vcpu") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-10-01 14:38:26 +01:00
Dave Martin	d26c25a9d1	arm64: KVM: Tighten guest core register access from userspace We currently allow userspace to access the core register file in about any possible way, including straddling multiple registers and doing unaligned accesses. This is not the expected use of the ABI, and nobody is actually using it that way. Let's tighten it by explicitly checking the size and alignment for each field of the register file. Cc: <stable@vger.kernel.org> Fixes: `2f4a07c5f9` ("arm64: KVM: guest one-reg interface") Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Dave Martin <Dave.Martin@arm.com> [maz: rewrote Dave's initial patch to be more easily backported] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-10-01 14:38:05 +01:00
Yu Zhao	1db5852945	cfg80211: fix use-after-free in reg_process_hint() reg_process_hint_country_ie() can free regulatory_request and return REG_REQ_ALREADY_SET. We shouldn't use regulatory_request after it's called. KASAN error was observed when this happens. BUG: KASAN: use-after-free in reg_process_hint+0x839/0x8aa [cfg80211] Read of size 4 at addr ffff8800c430d434 by task kworker/1:3/89 <snipped> Workqueue: events reg_todo [cfg80211] Call Trace: dump_stack+0xc1/0x10c ? _atomic_dec_and_lock+0x1ad/0x1ad ? _raw_spin_lock_irqsave+0xa0/0xd2 print_address_description+0x86/0x26f ? reg_process_hint+0x839/0x8aa [cfg80211] kasan_report+0x241/0x29b reg_process_hint+0x839/0x8aa [cfg80211] reg_todo+0x204/0x5b9 [cfg80211] process_one_work+0x55f/0x8d0 ? worker_detach_from_pool+0x1b5/0x1b5 ? _raw_spin_unlock_irq+0x65/0xdd ? _raw_spin_unlock_irqrestore+0xf3/0xf3 worker_thread+0x5dd/0x841 ? kthread_parkme+0x1d/0x1d kthread+0x270/0x285 ? pr_cont_work+0xe3/0xe3 ? rcu_read_unlock_sched_notrace+0xca/0xca ret_from_fork+0x22/0x40 Allocated by task 2718: set_track+0x63/0xfa __kmalloc+0x119/0x1ac regulatory_hint_country_ie+0x38/0x329 [cfg80211] __cfg80211_connect_result+0x854/0xadd [cfg80211] cfg80211_rx_assoc_resp+0x3bc/0x4f0 [cfg80211] smsc95xx v1.0.6 ieee80211_sta_rx_queued_mgmt+0x1803/0x7ed5 [mac80211] ieee80211_iface_work+0x411/0x696 [mac80211] process_one_work+0x55f/0x8d0 worker_thread+0x5dd/0x841 kthread+0x270/0x285 ret_from_fork+0x22/0x40 Freed by task 89: set_track+0x63/0xfa kasan_slab_free+0x6a/0x87 kfree+0xdc/0x470 reg_process_hint+0x31e/0x8aa [cfg80211] reg_todo+0x204/0x5b9 [cfg80211] process_one_work+0x55f/0x8d0 worker_thread+0x5dd/0x841 kthread+0x270/0x285 ret_from_fork+0x22/0x40 <snipped> Signed-off-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-10-01 09:14:03 +02:00
Felix Fietkau	211710ca74	mac80211: fix setting IEEE80211_KEY_FLAG_RX_MGMT for AP mode keys key->sta is only valid after ieee80211_key_link, which is called later in this function. Because of that, the IEEE80211_KEY_FLAG_RX_MGMT is never set when management frame protection is enabled. Fixes: `e548c49e6d` ("mac80211: add key flag for management keys") Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-10-01 09:13:48 +02:00
Stefan Seyfried	848e616e66	cfg80211: fix wext-compat memory leak cfg80211_wext_giwrate and sinfo.pertid might allocate sinfo.pertid via rdev_get_station(), but never release it. Fix that. Fixes: `8689c051a2` ("cfg80211: dynamically allocate per-tid stats for station info") Signed-off-by: Stefan Seyfried <seife+kernel@b1-systems.com> [johannes: fix error path, use cfg80211_sinfo_release_content(), add Fixes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2018-10-01 09:11:36 +02:00
Marek Szyprowski	1feda5eb77	drm/exynos: Use selected dma_dev default iommu domain instead of a fake one Instead of allocating a fake IOMMU domain for all Exynos DRM components, simply reuse the default IOMMU domain of the already selected DMA device. This allows some design changes in IOMMU framework without breaking IOMMU support in Exynos DRM. Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>	2018-10-01 09:25:31 +09:00
Edgar Cherkasov	08d9db00fe	i2c: i2c-scmi: fix for i2c_smbus_write_block_data The i2c-scmi driver crashes when the SMBus Write Block transaction is executed: WARNING: CPU: 9 PID: 2194 at mm/page_alloc.c:3931 __alloc_pages_slowpath+0x9db/0xec0 Call Trace: ? get_page_from_freelist+0x49d/0x11f0 ? alloc_pages_current+0x6a/0xe0 ? new_slab+0x499/0x690 __alloc_pages_nodemask+0x265/0x280 alloc_pages_current+0x6a/0xe0 kmalloc_order+0x18/0x40 kmalloc_order_trace+0x24/0xb0 ? acpi_ut_allocate_object_desc_dbg+0x62/0x10c __kmalloc+0x203/0x220 acpi_os_allocate_zeroed+0x34/0x36 acpi_ut_copy_eobject_to_iobject+0x266/0x31e acpi_evaluate_object+0x166/0x3b2 acpi_smbus_cmi_access+0x144/0x530 [i2c_scmi] i2c_smbus_xfer+0xda/0x370 i2cdev_ioctl_smbus+0x1bd/0x270 i2cdev_ioctl+0xaa/0x250 do_vfs_ioctl+0xa4/0x600 SyS_ioctl+0x79/0x90 do_syscall_64+0x73/0x130 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 ACPI Error: Evaluating _SBW: 4 (20170831/smbus_cmi-185) This problem occurs because the length of ACPI Buffer object is not defined/initialized in the code before a corresponding ACPI method is called. The obvious patch below fixes this issue. Signed-off-by: Edgar Cherkasov <echerkasov@dev.rtsoft.ru> Acked-by: Viktor Krasnov <vkrasnov@dev.rtsoft.ru> Acked-by: Michael Brunner <Michael.Brunner@kontron.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>	2018-10-01 01:04:07 +02:00
Dave Chinner	e55ec4ddbe	xfs: fix error handling in xfs_bmap_extents_to_btree Commit `01239d77b9` ("xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree") attempted to fix a null pointer dreference when a fuzzing corruption of some kind was found. This fix was flawed, resulting in assert failures like: XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715 ..... Call Trace: xfs_bmap_extents_to_btree+0x6b9/0x7b0 __xfs_bunmapi+0xae7/0xf00 ? xfs_log_reserve+0x1c8/0x290 xfs_reflink_remap_extent+0x20b/0x620 xfs_reflink_remap_blocks+0x7e/0x290 xfs_reflink_remap_range+0x311/0x530 vfs_dedupe_file_range_one+0xd7/0xe0 vfs_dedupe_file_range+0x15b/0x1a0 do_vfs_ioctl+0x267/0x6c0 The problem is that the error handling code now asserts that the inode fork is not in btree format before the error handling code undoes the modifications that put the fork back in extent format. Fix this by moving the assert back to after the xfs_iroot_realloc() call that returns the fork to extent format, and clean up the jump labels to be meaningful. Also, returning ENOSPC when xfs_btree_get_bufl() fails to instantiate the buffer that was allocated (the actual fix in the commit mentioned above) is incorrect. This is a fatal error - only an invalid block address or a filesystem shutdown can result in failing to get a buffer here. Hence change this to EFSCORRUPTED so that the higher layer knows this was a corruption related failure and should not treat it as an ENOSPC error. This should result in a shutdown (via cancelling a dirty transaction) which is necessary as we do not attempt to clean up the (invalid) block that we have already allocated. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2018-10-01 08:11:07 +10:00
Kees Cook	bac6f6cda2	pstore/ram: Fix failure-path memory leak in ramoops_init As reported by nixiaoming, with some minor clarifications: 1) memory leak in ramoops_register_dummy(): dummy_data = kzalloc(sizeof(*dummy_data), GFP_KERNEL); but no kfree() if platform_device_register_data() fails. 2) memory leak in ramoops_init(): Missing platform_device_unregister(dummy) and kfree(dummy_data) if platform_driver_register(&ramoops_driver) fails. I've clarified the purpose of ramoops_register_dummy(), and added a common cleanup routine for all three failure paths to call. Reported-by: nixiaoming <nixiaoming@huawei.com> Cc: stable@vger.kernel.org Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2018-09-30 10:15:41 -07:00

... 2 3 4 5 6 ...

783401 Commits