Commit Graph

589714 Commits

Author SHA1 Message Date
Jason Gunthorpe
e6bd18f57a IB/security: Restrict use of the write() interface
The drivers/infiniband stack uses write() as a replacement for
bi-directional ioctl().  This is not safe. There are ways to
trigger write calls that result in the return structure that
is normally written to user space being shunted off to user
specified kernel memory instead.

For the immediate repair, detect and deny suspicious accesses to
the write API.

For long term, update the user space libraries and the kernel API
to something that doesn't present the same security vulnerabilities
(likely a structured ioctl() interface).

The impacted uAPI interfaces are generally only available if
hardware from drivers/infiniband is installed in the system.

Reported-by: Jann Horn <jann@thejh.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
[ Expanded check to all known write() entry points ]
Cc: stable@vger.kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:03:16 -04:00
Dean Luick
7723d8c244 IB/hfi1: Use kernel default llseek for ui device
The ui device llseek had a mistake with SEEK_END and did
not fully follow seek semantics.  Correct all this by
using a kernel supplied function for fixed size devices.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mitko Haralanov
94158442eb IB/hfi1: Don't attempt to free resources if initialization failed
Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    PGD 852a43067 PUD 85d4a6067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 2831 Comm: osu_bw Tainted: G          IO 3.12.18-wfr+ #1
    task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000
    RIP: 0010:[<ffffffffa09658fe>]  [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    RSP: 0018:ffff8808588ffde0  EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000
    RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0
    RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000
    R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8
    FS:  00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0
    Stack:
     ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
     ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
     ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
    Call Trace:
     [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
     [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
     [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
     [<ffffffff8116c189>] __fput+0xe9/0x270
     [<ffffffff8116c35e>] ____fput+0xe/0x10
     [<ffffffff81065707>] task_work_run+0xa7/0xe0
     [<ffffffff81002969>] do_notify_resume+0x59/0x80
     [<ffffffff814ffc1a>] int_signal+0x12/0x17

This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.

In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.

Fixes: 3abb33ac65 ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com.
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mike Marciniszyn
b9b06cb6fe IB/hfi1: Fix missing lock/unlock in verbs drain callback
The iowait_sdma_drained() callback lacked locking to
protect the qp s_flags field.

This causes the s_flags to be out of sync
on multiple CPUs, potentially corrupting the s_flags.

Fixes: a545f5308b ("staging/rdma/hfi: fix CQ completion order issue")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Jubin John
e6d2e0176e IB/rdmavt: Fix send scheduling
call_send is used to determine whether to send immediately or schedule
a send for later. The current logic in rdmavt is inverted and has a
negative impact on the latency of the hfi1 and qib drivers. Fix this
regression by correctly calling send immediately when call_send is set.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:39 -04:00
Mitko Haralanov
849e3e9398 IB/hfi1: Prevent unpinning of wrong pages
The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.

In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.

There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.

This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Mitko Haralanov
de82bdff62 IB/hfi1: Fix deadlock caused by locking with wrong scope
The locking around the interval RB tree is designed to prevent
access to the tree while it's being modified. The locking in its
current form is too overzealous, which is causing a deadlock in
certain cases with the following backtrace:

    Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0
    CPU: 0 PID: 5836 Comm: IMB-MPI1 Tainted: G           O 3.12.18-wfr+ #1
     0000000000000000 ffff88087f206c50 ffffffff814f1caa ffffffff817b53f0
     ffff88087f206cc8 ffffffff814ecd56 0000000000000010 ffff88087f206cd8
     ffff88087f206c78 0000000000000000 0000000000000000 0000000000001662
    Call Trace:
     <NMI>  [<ffffffff814f1caa>] dump_stack+0x45/0x56
     [<ffffffff814ecd56>] panic+0xc2/0x1cb
     [<ffffffff810d4370>] ? restart_watchdog_hrtimer+0x50/0x50
     [<ffffffff810d4432>] watchdog_overflow_callback+0xc2/0xd0
     [<ffffffff81109b4e>] __perf_event_overflow+0x8e/0x2b0
     [<ffffffff8110a714>] perf_event_overflow+0x14/0x20
     [<ffffffff8101c906>] intel_pmu_handle_irq+0x1b6/0x390
     [<ffffffff814f927b>] perf_event_nmi_handler+0x2b/0x50
     [<ffffffff814f8ad8>] nmi_handle.isra.3+0x88/0x180
     [<ffffffff814f8d39>] do_nmi+0x169/0x310
     [<ffffffff814f8177>] end_repeat_nmi+0x1e/0x2e
     [<ffffffff81272600>] ? unmap_single+0x30/0x30
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     [<ffffffff814f780d>] ? _raw_spin_lock_irqsave+0x2d/0x40
     <<EOE>>  <IRQ>  [<ffffffffa056c4a8>] hfi1_mmu_rb_search+0x38/0x70 [hfi1]
     [<ffffffffa05919cb>] user_sdma_free_request+0xcb/0x120 [hfi1]
     [<ffffffffa0593393>] user_sdma_txreq_cb+0x263/0x350 [hfi1]
     [<ffffffffa057fad7>] ? sdma_txclean+0x27/0x1c0 [hfi1]
     [<ffffffffa0593130>] ? user_sdma_send_pkts+0x1710/0x1710 [hfi1]
     [<ffffffffa057fdd6>] sdma_make_progress+0x166/0x480 [hfi1]
     [<ffffffff810762c9>] ? ttwu_do_wakeup+0x19/0xd0
     [<ffffffffa0581c7e>] sdma_engine_interrupt+0x8e/0x100 [hfi1]
     [<ffffffffa0546bdd>] sdma_interrupt+0x5d/0xa0 [hfi1]
     [<ffffffff81097e57>] handle_irq_event_percpu+0x47/0x1d0
     [<ffffffff81098017>] handle_irq_event+0x37/0x60
     [<ffffffff8109aa5f>] handle_edge_irq+0x6f/0x120
     [<ffffffff810044af>] handle_irq+0xbf/0x150
     [<ffffffff8104c9b7>] ? irq_enter+0x17/0x80
     [<ffffffff8150168d>] do_IRQ+0x4d/0xc0
     [<ffffffff814f7c6a>] common_interrupt+0x6a/0x6a
     <EOI>  [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff814f56c6>] __schedule+0x3b6/0x7e0
     [<ffffffff810763a6>] __cond_resched+0x26/0x30
     [<ffffffff814f5eda>] _cond_resched+0x3a/0x50
     [<ffffffff814f4f82>] down_write+0x12/0x30
     [<ffffffffa0591619>] hfi1_release_user_pages+0x69/0x90 [hfi1]
     [<ffffffffa059173a>] sdma_rb_remove+0x9a/0xc0 [hfi1]
     [<ffffffffa056c00d>] __mmu_rb_remove.isra.5+0x5d/0x70 [hfi1]
     [<ffffffffa056c536>] hfi1_mmu_rb_remove+0x56/0x70 [hfi1]
     [<ffffffffa059427b>] hfi1_user_sdma_process_request+0x74b/0x1160 [hfi1]
     [<ffffffffa055c763>] hfi1_aio_write+0xc3/0x100 [hfi1]
     [<ffffffff8116a14c>] do_sync_readv_writev+0x4c/0x80
     [<ffffffff8116b58b>] do_readv_writev+0xbb/0x230
     [<ffffffff811a9da1>] ? fsnotify+0x241/0x320
     [<ffffffff81073524>] ? finish_task_switch+0x54/0xe0
     [<ffffffff8116b795>] vfs_writev+0x35/0x60
     [<ffffffff8116b8c9>] SyS_writev+0x49/0xc0
     [<ffffffff810cd876>] ? __audit_syscall_exit+0x1f6/0x2a0
     [<ffffffff814ff992>] system_call_fastpath+0x16/0x1b

As evident from the backtrace above, the process was being put to sleep
while holding the lock.

Limiting the scope of the lock only to the RB tree operation fixes the
above error allowing for proper locking and the process being put to
sleep when needed.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Mitko Haralanov
f19bd643db IB/hfi1: Prevent NULL pointer deferences in caching code
There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 12:00:38 -04:00
Sagi Grimberg
e7d2c25d94 MAINTAINERS: Update iser/isert maintainer contact info
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 11:32:50 -04:00
Sagi Grimberg
986ef95ecd IB/mlx5: Expose correct max_sge_rd limit
mlx5 devices (Connect-IB, ConnectX-4, ConnectX-4-LX) has a limitation
where rdma read work queue entries cannot exceed 512 bytes.
A rdma_read wqe needs to fit in 512 bytes:
- wqe control segment (16 bytes)
- rdma segment (16 bytes)
- scatter elements (16 bytes each)

So max_sge_rd should be: (512 - 16 - 16) / 16 = 30.

Cc: linux-stable@vger.kernel.org
Reported-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagig@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-04-28 10:49:17 -04:00
Chen-Yu Tsai
2963070a0f mmc: sunxi: Disable eMMC HS-DDR (MMC_CAP_1_8V_DDR) for Allwinner A80
eMMC HS-DDR no longer works on the A80, despite it working when support
for this developed.

Disable it for now.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2016-04-28 11:43:54 +02:00
Kan Liang
cf3beb7c90 perf/x86/intel: Fix incorrect lbr_sel_mask value
This patch fixes a bug which was introduced by:

 b16a5b52eb ("perf/x86: Add option to disable reading branch flags/cycles")

In this patch, lbr_sel_mask is used to mask the lbr_select. But LBR_SEL_MASK
doesn't include the bit for LBR_CALL_STACK. So LBR call stack will never be
set in lbr_select.

This patch corrects the LBR_SEL_MASK by including all valid bits in
LBR_SELECT. Also, the LBR_CALL_STACK bit is different as other bit in
LBR_SELECT. It does not operate in suppress mode, so it needs to be
specially handled in intel_pmu_setup_hw_lbr_filter.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461231010-4399-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:43 +02:00
Alexander Shishkin
1c5ac21a0e perf/x86/intel/pt: Don't die on VMXON
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:

  $ perf record -e intel_pt// kvm -nographic

on such a machine will kill it.

To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:42 +02:00
Peter Zijlstra
79c9ce57eb perf/core: Fix perf_event_open() vs. execve() race
Jann reported that the ptrace_may_access() check in
find_lively_task_by_vpid() is racy against exec().

Specifically:

  perf_event_open()		execve()

  ptrace_may_access()
				commit_creds()
  ...				if (get_dumpable() != SUID_DUMP_USER)
				  perf_event_exit_task();
  perf_install_in_context()

would result in installing a counter across the creds boundary.

Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
This should be fine as perf_event_exit_task() is already called with
cred_guard_mutex held, so all perf locks already nest inside it.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:41 +02:00
Adam Borowski
0a25556f84 perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
The entry for PERF_COUNT_HW_REF_CPU_CYCLES is not used on AMD, but is
referenced by filter_events() which expects undefined events to have a
value of 0.

Found via KASAN:

  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:30
  index 9 is out of range for type 'u64 [9]'
  UBSAN: Undefined behaviour in arch/x86/events/amd/core.c:132:9
  load of address ffffffff81c021c8 with insufficient space for an object of type 'const u64'

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1461749731-30979-1-git-send-email-kilobyte@angband.pl
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:20:25 +02:00
Ilya Dryomov
d3767f0fae rbd: report unsupported features to syslog
... instead of just returning an error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:43 +02:00
Ilya Dryomov
811c668877 rbd: fix rbd map vs notify races
A while ago, commit 9875201e10 ("rbd: fix use-after free of
rbd_dev->disk") fixed rbd unmap vs notify race by introducing
an exported wrapper for flushing notifies and sticking it into
do_rbd_remove().

A similar problem exists on the rbd map path, though: the watch is
registered in rbd_dev_image_probe(), while the disk is set up quite
a few steps later, in rbd_dev_device_setup().  Nothing prevents
a notify from coming in and crashing on a NULL rbd_dev->disk:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
    Call Trace:
     [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd]
     [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph]
     [<ffffffff8109d5db>] process_one_work+0x17b/0x470
     [<ffffffff8109e3ab>] worker_thread+0x11b/0x400
     [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
     [<ffffffff810a5acf>] kthread+0xcf/0xe0
     [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
     [<ffffffff81645dd8>] ret_from_fork+0x58/0x90
     [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
    RIP  [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd]

If an error occurs during rbd map, we have to error out, potentially
tearing down a watch.  Just like on rbd unmap, notifies have to be
flushed, otherwise rbd_watch_cb() may end up trying to read in the
image header after rbd_dev_image_release() has run:

    Assertion failure in rbd_dev_header_info() at line 4722:

     rbd_assert(rbd_image_format_valid(rbd_dev->image_format));

    Call Trace:
     [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150
     [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390
     [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290
     [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0
     [<ffffffff81107799>] process_one_work+0x689/0x1a80
     [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80
     [<ffffffff81132065>] ? finish_task_switch+0x225/0x640
     [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0
     [<ffffffff81108c69>] worker_thread+0xd9/0x1320
     [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80
     [<ffffffff8111b02d>] kthread+0x21d/0x2e0
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
     [<ffffffff82022802>] ret_from_fork+0x22/0x40
     [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550
    RIP  [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30

To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling
revalidate_disk(), b) move ceph_osdc_flush_notifies() call into
rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn
header read-in into a critical section.  The latter also happens to
take care of rbd map foo@bar vs rbd snap rm foo@bar race.

Fixes: http://tracker.ceph.com/issues/15490

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
2016-04-28 10:07:22 +02:00
Keith Busch
1bdb897039 x86/apic: Handle zero vector gracefully in clear_vector_irq()
If x86_vector_alloc_irq() fails x86_vector_free_irqs() is invoked to cleanup
the already allocated vectors. This subsequently calls clear_vector_irq().

The failed irq has no vector assigned, which triggers the BUG_ON(!vector) in
clear_vector_irq().

We cannot suppress the call to x86_vector_free_irqs() for the failed
interrupt, because the other data related to this irq must be cleaned up as
well. So calling clear_vector_irq() with vector == 0 is legitimate.

Remove the BUG_ON and return if vector is zero,

[ tglx: Massaged changelog ]

Fixes: b5dc8e6c21 "x86/irq: Use hierarchical irqdomain to manage CPU interrupt vectors"
Signed-off-by: Keith Busch <keith.busch@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-04-28 09:53:06 +02:00
Leo Yan
15333e3af1 thermal: use %d to print S32 parameters
Power allocator's parameters are S32 type, so use %d to print them.

Acked-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:51 -07:00
Leo Yan
5fdfc48bb0 thermal: hisilicon: increase temperature resolution
When calculate temperature, old code firstly do division and then
convert to "millicelsius" unit. This will lose resolution and only can
read back temperature with "Celsius" unit.

So firstly scale step value to "millicelsius" and then do division, so
finally we can increase resolution for temperature value. Also refine
the calculation from temperature value to step value.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2016-04-27 15:54:01 -07:00
Linus Torvalds
b75a2bf899 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will soon be at least one execution of the work
  item - where "after" means that the execution instance would be able
  to see all the changes that the queuer has made prior to the queueing
  attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v3.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
2016-04-27 12:03:59 -07:00
Linus Torvalds
763cfc86ee Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two patches to fix a deadlock which can be easily triggered if memcg
  charge moving is used.

  This bug was introduced while converting threadgroup locking to a
  global percpu_rwsem and is caused by cgroup controller task migration
  path depending on the ability to create new kthreads.  cpuset had a
  similar issue which was fixed by performing heavy-lifting operations
  asynchronous to task migration.  The two patches fix the same issue in
  memcg in a similar way.  The first patch makes the mechanism generic
  and the second relocates memcg charge moving outside the migration
  path.

  Given that we don't want to perform heavy operations while
  writelocking threadgroup lock anyway, moving them out of the way is a
  desirable solution.  One thing to note is that the problem was
  difficult to debug because lockdep couldn't figure out the deadlock
  condition.  Looking into how to improve that"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  memcg: relocate charge moving from ->attach to ->post_attach
  cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
2016-04-27 11:41:14 -07:00
Linus Torvalds
3118e5f966 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "I2C has one buildfix, one ABBA deadlock fix, and three simple 'add ID'
  patches"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
  i2c: cpm: Fix build break due to incompatible pointer types
  i2c: ismt: Add Intel DNV PCI ID
  i2c: xlp9xx: add support for Broadcom Vulcan
  i2c: rk3x: add support for rk3228
2016-04-27 11:34:45 -07:00
Linus Torvalds
24131a61ec ARC fixes for 4.6-rc6
- LOCKDEP now words for ARCv2 builds
  - Enabling DT reserved-memory binding to work (for forthcoming HDMI driver)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIMkcAAoJEGnX8d3iisJeHdkP/1Eb+V6asWOPVGinbdKs0nhs
 xdnFiBVesgdfXKBxY/K1GN6bVNUqaPNVRk7Y1fWFzS6O2tgVMMDFz+4FebU6sDMv
 wiQuxqzEYhyaiqNvw2JUH60Y5GiCjWFLMfgR2FKCkUM99NZm/2DFos2hPE81K2yc
 4JfEQoPcsuYYN6nheMKa0sjomGHg6qIL4wi3uvB3RCrbqs1MbauKpsbdbNF3thqZ
 xpeHavHLRLUTnN/lIf7Z1SwSh6S0Ey7YFLePxsC48vZCL0a8L1HfFSfvjcxuHBMU
 cGsRXWQmwVjPUeEC9JIPcDkbEfQ1nlezU8lZcg7PJHOt4DByxN3sCngAPKt6Skls
 I2Ql5tP12IUmuWv4zpM7VP/ZZvC5EOh4RmG2xQwuV+rtDilkYHZrUPx7PdilRt3S
 a+A+FoYMgczdTTSCJnI0kJYADmPtz/6e1N9rSyzzmVnDmSPR9hClO9dENtpSwiXD
 Jtpo4tBsMLyw6+Oj68e70c58t8ek9PObR1lIqZ3zJ97hvgACjjpeDytVjv7oPpYH
 scTUfx69s6qF96Wn+y44Iw1gRV896UFlLmHtX9Hk0BtuVdjZuwOIF1Kqfsk6SYsJ
 0WFdnxoNJHPJVWkp+Pmrz7g8BaUxfAtc7Ly6C+8yUUa1nQswYQmrK+84uKLVjiWl
 E2sBDIJl2Xu2E7amTJss
 =rlDa
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - lockdep now works for ARCv2 builds

 - enable DT reserved-memory binding (for forthcoming HDMI driver)

* tag 'arc-4.6-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: add support for reserved memory defined by device tree
  ARC: support generic per-device coherent dma mem
  Documentation: dt: arc: fix spelling mistakes
  ARCv2: Enable LOCKDEP
2016-04-27 09:46:21 -07:00
Linus Torvalds
508fea71c6 nios2 fix for v4.6
nios2: memset: use the right constraint modifier for the %4 output operand
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXIH+FAAoJEFWoEK+e3syCfRoQAMPXIiWR/V/dLn3OX8f8CeA6
 I9duVqMrKrVh/a+bwxzmVJkumm0xzqYnOyhOpX5fZd3Nx44Q4NynJakwgWpMDVAI
 +xXxNtHZUhjcRC4EuqJW677plR0Uq8bWY2UibpARPHfB9d0arJOCuL11vdGCAjkg
 lWeVzUFg7iB9n0tRFwvsN29EcRZDo7+WbJh3cGIfTYNbcihJfiAAlmNyXS7XFiBY
 DqSYyTXIc8scH1q66gArTnyDryvM7cEZ+zyYoX9v8/E/+xPLLLhogthtqf3u4opn
 J/70k1LBgzgHCYrlEG8vvd1kCr114PLo7RlgkwJqdAtVhtMAtcGZFfkVQkSo7R3h
 gHQHf8f0exg3JKp0VesB443FyaIvpCNkth3eGdNMWunhCPB4bXE5W+hg5J5gZBNi
 1Ft9VB9Ug/8sh9Es4muinNX1kR5Fc8IWQqIa2U/OCt4O2wR1aFanJvaRqeCozbES
 SpbRAOoXtzOZ0xRPZGPQpqP6ggfizq9Zil2ZTeXbNPBRybFvmEpgewdJYqvrNwZj
 pbgB1+7zVcsfGTiMhJ+d1rLUX/oeMuUWT+eHY1jM1k+gTQVUo6k8KNVoaiNKa9z5
 cZh+XqgWKE6Qv4DVRnj+ouvDuglhwqIAyI3oXElghrYmCWxjvVKCXowQhnSq727M
 th2iyocnFMjIVWd5u3b7
 =PRJ4
 -----END PGP SIGNATURE-----

Merge tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull arch/nios2 fix from Ley Foon Tan:
 "memset: use the right constraint modifier for the %4 output operand"

* tag 'nios2-v4.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: memset: use the right constraint modifier for the %4 output operand
2016-04-27 09:33:24 -07:00
Flora Cui
afc4542105 drm/amdgpu: disable vm interrupts with vm_fault_stop=2
V2: disable all vm interrupts in late_init()

Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27 12:27:10 -04:00
Alex Deucher
c8791a13d2 drm/amdgpu: print a message if ATPX dGPU power control is missing
It will help identify problematic boards.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27 12:27:09 -04:00
Alex Deucher
e9bef455af Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
This reverts commit bedf2a65c1.

See the radeon revert for an extended description.

Cc: stable@vger.kernel.org
2016-04-27 12:27:09 -04:00
Vitaly Prosyak
5d5b7803c4 drm/radeon: fix vertical bars appear on monitor (v2)
When crtc/timing is disabled on boot the dig block
should be stopped in order ignore timing from crtc,
reset the steering fifo otherwise we get display
corruption or hung in dp sst mode.

v2: agd: fix coding style

Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2016-04-27 12:27:08 -04:00
Flora Cui
56fc350224 drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
Fixes the following scenario:

1. Page table bo allocated in vram and linked to man->lru.
   tbo->list_kref.refcount=2
2. Page table bo is swapped out and removed from man->lru.
   tbo->list_kref.refcount=1
3. Command submission from userspace.  Page table bo is moved
   to vram.  ttm_bo_move_to_lru_tail() link it to man->lru and
   don't increase the kref count.

Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2016-04-27 12:26:50 -04:00
Linus Torvalds
9453203bf8 platform-drivers-x86 for 4.6-3
toshiba_acpi:
  - Fix regression caused by hotkey enabling value
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXIEZSAAoJEKbMaAwKp364mLwH/3j01EDn0JF1FIIP+kxVgeeL
 g8xI+0tlFzxmdcBqW3n4q0apzVuCmHr0pbOik289l3dv7hQ5PEvdmK/VhVPYmJDL
 2u/4EWmW7cvYMUAVhGB499pKac38fMUN5y97dkmoikiTQO6VaWsvdczvXuhuz/dP
 OcQzRR/UttCLMe/ERxz3xh4R9kbY5Hzh4slW8Ay/sGDRrgOUFRLT8Zg3Uo7MY27i
 Kq++SrH96edL1dW6XkWFIqO7NzWGlbBxTMlTlh+xmGUkOtVxUyzAID3NEDIaw6zC
 7QU61eyfIJToa2SxHZ/mT9bEFNHNbJR4KoLREG6K2LbRyMhsQfMxaTym8MNzT/Q=
 =+IXa
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Fix regression caused by hotkey enabling value in toshiba_acpi"

* tag 'platform-drivers-x86-v4.6-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  toshiba_acpi: Fix regression caused by hotkey enabling value
2016-04-27 08:57:11 -07:00
Takashi Iwai
af9cc93c0d ASoC: Fixes for v4.6
This is a fairly large collection of fixes but almost all driver
 specific ones, especially to the new Intel drivers which have had a lot
 of recent development.  The one core fix is a change to the debugfs code
 to avoid crashes in some relatively unusual configurations.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXINZtAAoJECTWi3JdVIfQ+BwH/1eLqMfCSZM9nsDr1QMvOCDP
 SO4ZoWqvYplBcS8pYKbJmqtuo8jMxT3VIQF+b5hPAVhgpLwMmy9qeFtatqCQ2WDC
 GfCqW8LSKtrzwUwmoRrtHx7vfBLP1/z78F8ORQzwhrplTCBhvPLbUOrV51EFj6tf
 Dfo2tW0uxww9iCZduYu4LadOhFOfuw+5shUrJk5A5f975Zbdgyke4CbRnlbDPXLq
 d4i7bNfiISkSJiKMpdZFeiOQCd0+uXHh2WkMtVYSGVTA2Kf7d7HtX+JpEFFmaJgJ
 8CndjgNJ1ZXtMHl1pMYmNqKJ5mEgmVtbGGJWY4QmQBva0EfQ+vLZt78BG3qvJwk=
 =SXH2
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v4.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v4.6

This is a fairly large collection of fixes but almost all driver
specific ones, especially to the new Intel drivers which have had a lot
of recent development.  The one core fix is a change to the debugfs code
to avoid crashes in some relatively unusual configurations.
2016-04-27 17:30:49 +02:00
Alexey Brodkin
1b10cb21d8 ARC: add support for reserved memory defined by device tree
Enable reserved memory initialization from device tree.

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27 17:06:56 +05:30
Alexey Brodkin
32ed9a0e0d ARC: support generic per-device coherent dma mem
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-04-27 17:06:55 +05:30
Romain Perier
a8950e49bd nios2: memset: use the right constraint modifier for the %4 output operand
Depending on the size of the area to be memset'ed, the nios2 memset implementation
either uses a naive loop (for buffers smaller or equal than 8 bytes) or a more optimized
implementation (for buffers larger than 8 bytes). This implementation does 4-byte stores
rather than 1-byte stores to speed up memset.

However, we discovered that on our nios2 platform, memset() was not properly setting the
buffer to the expected value. A memset of 0xff would not set the entire buffer to 0xff, but to:

0xff 0x00 0xff 0x00 0xff 0x00 0xff 0x00 ...

Which is obviously incorrect. Our investigation has revealed that the problem lies in the
incorrect constraints used in the inline assembly.

The following piece of assembly, from the nios2 memset implementation, is supposed to
create a 4-byte value that repeats 4 times the 1-byte pattern passed as memset argument:

/* fill8 %3, %5 (c & 0xff) */
"       slli    %4, %5, 8\n"
"       or      %4, %4, %5\n"
"       slli    %3, %4, 16\n"
"       or      %3, %3, %4\n"

However, depending on the compiler and optimization level, this code might be compiled as:

34:	280a923a 	slli	r5,r5,8
38:	294ab03a 	or	r5,r5,r5
3c:	2808943a 	slli	r4,r5,16
40:	2148b03a 	or	r4,r4,r5

This is wrong because r5 gets used both for %5 and %4, which leads to the final pattern
stored in r4 to be 0xff00ff00 rather than the expected 0xffffffff.

%4 is defined with the "=r" constraint, i.e as an output operand. However, as explained in
http://www.ethernut.de/en/documents/arm-inline-asm.html, this does not prevent gcc from
using the same register for an output operand (%4) and input operand (%5). By using the
constraint modifier '&', we indicate that the register should be used for output only. With this
change, we get the following assembly output:

34:	2810923a 	slli	r8,r5,8
38:	4150b03a 	or	r8,r8,r5
3c:	400e943a 	slli	r7,r8,16
40:	3a0eb03a 	or	r7,r7,r8

Which correctly produces the 0xffffffff pattern when 0xff is passed as the memset() pattern.

It is worth mentioning the observed consequence of this bug: we were hitting the kernel
BUG() in mm/bootmem.c:__free() that verifies when marking a page as free that it was
previously marked as occupied (i.e that the bit was set to 1). The entire bootmem bitmap is
set to 0xff bit via a memset() during the bootmem initialization. The bootmem_free() call right
after the initialization was finding some bits to be set to 0, which didn't make sense since the
bitmap has just been memset'ed to 0xff. Except that due to the bug explained above, the
bitmap was in fact initialized to 0xff00ff00.

Thanks to Marek Vasut for his help and feedback.

Signed-off-by: Romain Perier <romain.perier@free-electrons.com>
Acked-by: Marek Vasut <marex@denx.de>
Acked-by: Ley Foon Tan <lftan@altera.com>
2016-04-27 16:35:55 +08:00
Martin Schwidefsky
532c34b5fb s390/sclp_ctl: fix potential information leak with /dev/sclp
The sclp_ctl_ioctl_sccb function uses two copy_from_user calls to
retrieve the sclp request from user space. The first copy_from_user
fetches the length of the request which is stored in the first two
bytes of the request. The second copy_from_user gets the complete
sclp request, but this copies the length field a second time.
A malicious user may have changed the length in the meantime.

Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-04-27 09:33:39 +02:00
Rui Salvaterra
d701cca674 powerpc: wire up preadv2 and pwritev2 syscalls
Wire up preadv2/pwritev2 in the same way as preadv/pwritev. Fixes two
build warnings on ppc64.

mpe: Lightly tested with fio (slightly hacked to add the syscall
wrappers):

  fio-4217  [009] ....  1304.635300: sys_preadv2(fd: 3, vec:
  10025821de0, vlen: 1, pos_l: 6253000, pos_h: 0, flags: 1)
  fio-4217  [009] ....  1304.635474: sys_preadv2 -> 0x1000

Signed-off-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27 16:47:55 +10:00
Michael Neuling
2bc79ffcbb cxl: Poll for outstanding IRQs when detaching a context
When detaching contexts, we may still have interrupts in the system
which are yet to be delivered to any CPU and be acked in the PSL.
This can result in a subsequent unrelated process getting an spurious
IRQ or an interrupt for a non-existent context.

This polls the PSL to ensure that the PSL is clear of IRQs for the
detached context, before removing the context from the idr.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27 12:04:48 +10:00
Michael Neuling
d6776bba44 cxl: Keep IRQ mappings on context teardown
Keep IRQ mappings on context teardown.  This won't leak IRQs as if we
allocate the mapping again, the generic code will give the same
mapping used last time.

Doing this works around a race in the generic code. Masking the
interrupt introduces a race which can crash the kernel or result in
IRQ that is never EOIed. The lost of EOI results in all subsequent
mappings to the same HW IRQ never receiving an interrupt.

We've seen this race with cxl test cases which are doing heavy context
startup and teardown at the same time as heavy interrupt load.

A fix to the generic code is being investigated also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org # 3.8
Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-04-27 12:04:31 +10:00
Gustavo Padovan
9a11d2e7e6 drm/virtio: send vblank event after crtc updates
virtio_gpu was failing to send vblank events when using the atomic IOCTL
with the DRM_MODE_PAGE_FLIP_EVENT flag set. This patch fixes each and
enables atomic pageflips updates.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:32:04 +10:00
Lyude
9dc0487d96 drm/dp/mst: Restore primary hub guid on resume
Some hubs are forgetful, and end up forgetting whatever GUID we set
previously after we do a suspend/resume cycle. This can lead to
hotplugging breaking (along with probably other things) since the hub
will start sending connection notifications with the wrong GUID. As
such, we need to check on resume whether or not the GUID the hub is
giving us is valid.

Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460580618-7421-1-git-send-email-cpaul@redhat.com
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:29:40 +10:00
cpaul@redhat.com
263efde31f drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
We can thank KASAN for finding this, otherwise I probably would have spent
hours on it. This fixes a somewhat harder to trigger kernel panic, occuring
while enabling MST where the port we were currently updating the payload on
would have all of it's refs dropped before we finished what we were doing:

==================================================================
BUG: KASAN: use-after-free in drm_dp_update_payload_part1+0xb3f/0xdb0 [drm_kms_helper] at addr ffff8800d29de018
Read of size 4 by task Xorg/973
=============================================================================
BUG kmalloc-2048 (Tainted: G    B   W      ): kasan: bad access detected
-----------------------------------------------------------------------------

INFO: Allocated in drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper] age=16477 cpu=0 pid=2175
	___slab_alloc+0x472/0x490
	__slab_alloc+0x20/0x40
	kmem_cache_alloc_trace+0x151/0x190
	drm_dp_add_port+0x1aa/0x1ed0 [drm_kms_helper]
	drm_dp_send_link_address+0x526/0x960 [drm_kms_helper]
	drm_dp_check_and_send_link_address+0x1ac/0x210 [drm_kms_helper]
	drm_dp_mst_link_probe_work+0x77/0xd0 [drm_kms_helper]
	process_one_work+0x562/0x1350
	worker_thread+0xd9/0x1390
	kthread+0x1c5/0x260
	ret_from_fork+0x22/0x40
INFO: Freed in drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper] age=7521 cpu=0 pid=2175
	__slab_free+0x17f/0x2d0
	kfree+0x169/0x180
	drm_dp_free_mst_port+0x50/0x60 [drm_kms_helper]
	drm_dp_destroy_connector_work+0x2b8/0x490 [drm_kms_helper]
	process_one_work+0x562/0x1350
	worker_thread+0xd9/0x1390
	kthread+0x1c5/0x260
	ret_from_fork+0x22/0x40

which on this T460s, would eventually lead to kernel panics in somewhat
random places later in intel_mst_enable_dp() if we got lucky enough.

Signed-off-by: Lyude <cpaul@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-04-27 09:26:12 +10:00
Linus Torvalds
f28f20da70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Handle v4/v6 mixed sockets properly in soreuseport, from Craig
    Gallak.

 2) Bug fixes for the new macsec facility (missing kmalloc NULL checks,
    missing locking around netdev list traversal, etc.) from Sabrina
    Dubroca.

 3) Fix handling of host routes on ifdown in ipv6, from David Ahern.

 4) Fix double-fdput in bpf verifier.  From Jann Horn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
  bpf: fix double-fdput in replace_map_fd_with_map_ptr()
  net: ipv6: Delete host routes on an ifdown
  Revert "ipv6: Revert optional address flusing on ifdown."
  net/mlx4_en: fix spurious timestamping callbacks
  net: dummy: remove note about being Y by default
  cxgbi: fix uninitialized flowi6
  ipv6: Revert optional address flusing on ifdown.
  ipv4/fib: don't warn when primary address is missing if in_dev is dead
  net/mlx5: Add pci shutdown callback
  net/mlx5_core: Remove static from local variable
  net/mlx5e: Use vport MTU rather than physical port MTU
  net/mlx5e: Fix minimum MTU
  net/mlx5e: Device's mtu field is u16 and not int
  net/mlx5_core: Add ConnectX-5 to list of supported devices
  net/mlx5e: Fix MLX5E_100BASE_T define
  net/mlx5_core: Fix soft lockup in steering error flow
  qlcnic: Update version to 5.3.64
  net: stmmac: socfpga: Remove re-registration of reset controller
  macsec: fix netlink attribute validation
  macsec: add missing macsec prefix in uapi
  ...
2016-04-26 16:25:51 -07:00
Dave Airlie
bd0b560a75 Merge branch 'drm-etnaviv-fixes' of git://git.pengutronix.de:/git/lst/linux into drm-fixes
just a single fix to not move the GPU linear window on cores where it
might lead to inconsistent views of the memory by different engines in
the core, thus breaking relocs and possibly causing other fun.

* 'drm-etnaviv-fixes' of git://git.pengutronix.de:/git/lst/linux:
  drm/etnaviv: don't move linear memory window on 3D cores without MC2.0
2016-04-27 09:19:06 +10:00
Linus Torvalds
91ea692f87 Here are the latest bug fixes for ARM SoCs, mostly addressing
recent regressions. Changes are across several platforms, so
 I'm listing every change separately here.
 
 Regressions since 4.5:
 
  - A correction of the psci firmware DT binding, to prevent
    users from relying on unintended semantics
 
  - Actually getting the newly merged clock driver for some OMAP
    platforms to work
 
  - A revert of patches for the Qualcomm BAM, these need to be
    reworked for 4.7 to avoid breaking boards other than the one
    they were intended for
 
  - A correction for the I2C device nodes on the Socionext Uniphier
    platform
 
  - i.MX SDHCI was broken for non-DT platforms due to a change
    with the setting of the DMA mask
 
  - A revert of a patch that accidentally added a nonexisting
    clock on the Rensas "Porter" board
 
  - A couple of OMAP fixes that are all related to suspend after
    the power domain changes for dra7
 
  - On Mediatek, revert part of the power domain initialization
    changes that broke mt8173-evb
 
 Fixes for older bugs:
 
  - Workaround for an "external abort" in the omap34xx
    suspend/resume code.
 
  - The USB1/eSATA should not be listed as an excon device on
    am57xx-beagle-x15 (broken since v4.0)
 
  - A v4.5 regression in the TI AM33xx and AM43XX DT specifying
    incorrect DMA request lines for the GPMC
 
  - The jiffies calibration on Renesas platforms was incorrect
    for some modern CPU cores.
 
  - A hardware errata woraround for clockdomains on TI DRA7
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVx+5v2CrR//JCVInAQJ/ZBAArI3ZiR+Jj2dZCm9c7+PjlDWngJpBME3V
 o4aF9CeyuA/eyx+QtKAq1ScG2eRIbfab03XGBMEHXpKmmiTXYFIcLFHewwSGBYsy
 XUsNO+ZKsw92ImSdcX9p45BjkAADJvUwX5BzDlfOQ5mNX+o0Godb/8Mi2Y6RIqTK
 5C0xQ0YE8ZN7xtyNzFylaI+CL6wsVLy6PUKig7UIrOOXQK3Tzt4mEz2ksrSBJzON
 RiG7kPLf+Zd013WyF/ZUdC3VErDOP7C1Z+YRcK+2rxjlL+4oJUznsoaBYJgLUV+T
 GmcD0TZNwt6x6FWF6cSiUa+gl+6oWRZwTGfUooS1zEcuLHBsONdMtVat4Z01RYos
 rdMvFgZ6bxG7n4tajI2jg1gokGfyMfYuKwnHuA8Ynzn4N/VcnnbfxPRyV/RMLN0W
 ad/e12SlLMX1XahrD9uo/oH/X73gHPnbHlLLzWfDfnyvNGvWiW3SNklFT03q/Yn+
 fgfB0OnzG8+a3c/LHZbtAo/yYYLdqIuOg8I40AizN3CKHamUWPAjgFfdHdQADVV8
 yC5ugVB6x7RYID/49IPT1C3n/SjoypYyRbo30ipqyz2dTf6kz35SY/YjYNSaIYvY
 QfnGFuywsKsTprGAzI+x/fGo61Ve0/XkK9RPt0opU1+WdYr3sE+ufGVLVn4g4Cw3
 wfd20UTVwGs=
 =YgL2
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here are the latest bug fixes for ARM SoCs, mostly addressing recent
  regressions.  Changes are across several platforms, so I'm listing
  every change separately here.

  Regressions since 4.5:

   - A correction of the psci firmware DT binding, to prevent users from
     relying on unintended semantics

   - Actually getting the newly merged clock driver for some OMAP
     platforms to work

   - A revert of patches for the Qualcomm BAM, these need to be reworked
     for 4.7 to avoid breaking boards other than the one they were
     intended for

   - A correction for the I2C device nodes on the Socionext Uniphier
     platform

   - i.MX SDHCI was broken for non-DT platforms due to a change with the
     setting of the DMA mask

   - A revert of a patch that accidentally added a nonexisting clock on
     the Rensas "Porter" board

   - A couple of OMAP fixes that are all related to suspend after the
     power domain changes for dra7

   - On Mediatek, revert part of the power domain initialization changes
     that broke mt8173-evb

  Fixes for older bugs:

   - Workaround for an "external abort" in the omap34xx suspend/resume
     code.

   - The USB1/eSATA should not be listed as an excon device on
     am57xx-beagle-x15 (broken since v4.0)

   - A v4.5 regression in the TI AM33xx and AM43XX DT specifying
     incorrect DMA request lines for the GPMC

   - The jiffies calibration on Renesas platforms was incorrect for some
     modern CPU cores.

   - A hardware errata woraround for clockdomains on TI DRA7"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  drivers: firmware: psci: unify enable-method binding on ARM {64,32}-bit systems
  arm64: dts: uniphier: fix I2C nodes of PH1-LD20
  ARM: shmobile: timer: Fix preset_lpj leading to too short delays
  Revert "ARM: dts: porter: Enable SCIF_CLK frequency and pins"
  ARM: dts: r8a7791: Don't disable referenced optional clocks
  Revert "ARM: OMAP: Catch callers of revision information prior to it being populated"
  ARM: OMAP3: Fix external abort on 36xx waking from off mode idle
  ARM: dts: am57xx-beagle-x15: remove extcon_usb1
  ARM: dts: am437x: Fix GPMC dma properties
  ARM: dts: am33xx: Fix GPMC dma properties
  Revert "soc: mediatek: SCPSYS: Fix double enabling of regulators"
  ARM: mach-imx: sdhci-esdhc-imx: initialize DMA mask
  ARM: DRA7: clockdomain: Implement timer workaround for errata i874
  ARM: OMAP: Catch callers of revision information prior to it being populated
  ARM: dts: dra7: Correct clock tree for sys_32k_ck
  ARM: OMAP: DRA7: Provide proper class to omap2_set_globals_tap
  ARM: OMAP: DRA7: wakeupgen: Skip SAR save for wakeupgen
  Revert "dts: msm8974: Add dma channels for blsp2_i2c1 node"
  Revert "dts: msm8974: Add blsp2_bam dma node"
  ARM: dts: Add clocks for dm814x ADPLL
2016-04-26 16:17:01 -07:00
Linus Torvalds
8ead9dd547 devpts: more pty driver interface cleanups
This is more prep-work for the upcoming pty changes.  Still just code
cleanup with no actual semantic changes.

This removes a bunch pointless complexity by just having the slave pty
side remember the dentry associated with the devpts slave rather than
the inode.  That allows us to remove all the "look up the dentry" code
for when we want to remove it again.

Together with moving the tty pointer from "inode->i_private" to
"dentry->d_fsdata" and getting rid of pointless inode locking, this
removes about 30 lines of code.  Not only is the end result smaller,
it's simpler and easier to understand.

The old code, for example, depended on the d_find_alias() to not just
find the dentry, but also to check that it is still hashed, which in
turn validated the tty pointer in the inode.

That is a _very_ roundabout way to say "invalidate the cached tty
pointer when the dentry is removed".

The new code just does

	dentry->d_fsdata = NULL;

in devpts_pty_kill() instead, invalidating the tty pointer rather more
directly and obviously.  Don't do something complex and subtle when the
obvious straightforward approach will do.

The rest of the patch (ie apart from code deletion and the above tty
pointer clearing) is just switching the calling convention to pass the
dentry or file pointer around instead of the inode.

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Cc: Jann Horn <jann@thejh.net>
Cc: Greg KH <greg@kroah.com>
Cc: Jiri Slaby <jslaby@suse.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-26 15:47:32 -07:00
Jann Horn
8358b02bf6 bpf: fix double-fdput in replace_map_fd_with_map_ptr()
When bpf(BPF_PROG_LOAD, ...) was invoked with a BPF program whose bytecode
references a non-map file descriptor as a map file descriptor, the error
handling code called fdput() twice instead of once (in __bpf_map_get() and
in replace_map_fd_with_map_ptr()). If the file descriptor table of the
current task is shared, this causes f_count to be decremented too much,
allowing the struct file to be freed while it is still in use
(use-after-free). This can be exploited to gain root privileges by an
unprivileged user.

This bug was introduced in
commit 0246e64d9a ("bpf: handle pseudo BPF_LD_IMM64 insn"), but is only
exploitable since
commit 1be7f75d16 ("bpf: enable non-root eBPF programs") because
previously, CAP_SYS_ADMIN was required to reach the vulnerable code.

(posted publicly according to request by maintainer)

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-26 17:37:21 -04:00
Mark Brown
8c0f551004 Merge remote-tracking branches 'asoc/fix/rt5640' and 'asoc/fix/wm8962' into asoc-linus 2016-04-26 19:25:18 +01:00
Mark Brown
78cfca32ca Merge remote-tracking branches 'asoc/fix/arizona', 'asoc/fix/cs35l32', 'asoc/fix/hdac', 'asoc/fix/nau8825' and 'asoc/fix/rt5616' into asoc-linus 2016-04-26 19:25:15 +01:00
Mark Brown
e408057767 Merge remote-tracking branch 'asoc/fix/intel' into asoc-linus 2016-04-26 19:25:14 +01:00