Commit Graph

634330 Commits

Author SHA1 Message Date
Boris Brezillon
40b6e61ac7 ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
Commit e96a8a3bb6 ("UBI: Fastmap: Do not add vol if it already
exists") introduced a bug by changing the possible error codes returned
by add_vol():
- this function no longer returns NULL in case of allocation failure
  but return ERR_PTR(-ENOMEM)
- when a duplicate entry in the volume RB tree is found it returns
  ERR_PTR(-EEXIST) instead of ERR_PTR(-EINVAL)

Fix the tests done on add_vol() return val to match this new behavior.

Fixes: e96a8a3bb6 ("UBI: Fastmap: Do not add vol if it already exists")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Acked-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-28 14:48:18 +02:00
Gabriel Krisman Bertazi
a7d5afe82d MAINTAINERS: Add entry for genwqe driver
Frank and I maintain this

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: haver@linux.vnet.ibm.com
Acked-by: Frank Haverkamp <haver@linux.vnet.ibm.com>=
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:28:24 -04:00
Jorgen Hansen
eb94cd68ab VMCI: Doorbell create and destroy fixes
This change consists of two changes:

1) If vmci_doorbell_create is called when neither guest nor
   host personality as been initialized, vmci_get_context_id
   will return VMCI_INVALID_ID. In that case, we should fail
   the create call.
2) In doorbell destroy, we assume that vmci_guest_code_active()
   has the same return value on create and destroy. That may not
   be the case, so we may end up with the wrong refcount.
   Instead, destroy should check explicitly whether the doorbell
   is in the index table as an indicator of whether the guest
   code was active at create time.

Reviewed-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:26:21 -04:00
Gerald Schaefer
a7a7aeefbc GenWQE: Fix bad page access during abort of resource allocation
When interrupting an application which was allocating DMAable
memory, it was possible, that the DMA memory was deallocated
twice, leading to the error symptoms below.

Thanks to Gerald, who analyzed the problem and provided this
patch.

I agree with his analysis of the problem: ddcb_cmd_fixups() ->
genwqe_alloc_sync_sgl() (fails in f/lpage, but sgl->sgl != NULL
and f/lpage maybe also != NULL) -> ddcb_cmd_cleanup() ->
genwqe_free_sync_sgl() (double free, because sgl->sgl != NULL and
f/lpage maybe also != NULL)

In this scenario we would have exactly the kind of double free that
would explain the WARNING / Bad page state, and as expected it is
caused by broken error handling (cleanup).

Using the Ubuntu git source, tag Ubuntu-4.4.0-33.52, he was able to reproduce
the "Bad page state" issue, and with the patch on top he could not reproduce
it any more.

------------[ cut here ]------------
WARNING: at /build/linux-o03cxz/linux-4.4.0/arch/s390/include/asm/pci_dma.h:141
Modules linked in: qeth_l2 ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common genwqe_card qeth crc_itu_t qdio ccwgroup vmur dm_multipath dasd_eckd_mod dasd_mod
CPU: 2 PID: 3293 Comm: genwqe_gunzip Not tainted 4.4.0-33-generic #52-Ubuntu
task: 0000000032c7e270 ti: 00000000324e4000 task.ti: 00000000324e4000
Krnl PSW : 0404c00180000000 0000000000156346 (dma_update_cpu_trans+0x9e/0xa8)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS: 00000000324e7bcd 0000000000c3c34a 0000000027628298 000000003215b400
           0000000000000400 0000000000001fff 0000000000000400 0000000116853000
           07000000324e7b1e 0000000000000001 0000000000000001 0000000000000001
           0000000000001000 0000000116854000 0000000000156402 00000000324e7a38
Krnl Code: 000000000015633a: 95001000           cli     0(%r1),0
           000000000015633e: a774ffc3           brc     7,1562c4
          #0000000000156342: a7f40001           brc     15,156344
          >0000000000156346: 92011000           mvi     0(%r1),1
           000000000015634a: a7f4ffbd           brc     15,1562c4
           000000000015634e: 0707               bcr     0,%r7
           0000000000156350: c00400000000       brcl    0,156350
           0000000000156356: eb7ff0500024       stmg    %r7,%r15,80(%r15)
Call Trace:
([<00000000001563e0>] dma_update_trans+0x90/0x228)
 [<00000000001565dc>] s390_dma_unmap_pages+0x64/0x160
 [<00000000001567c2>] s390_dma_free+0x62/0x98
 [<000003ff801310ce>] __genwqe_free_consistent+0x56/0x70 [genwqe_card]
 [<000003ff801316d0>] genwqe_free_sync_sgl+0xf8/0x160 [genwqe_card]
 [<000003ff8012bd6e>] ddcb_cmd_cleanup+0x86/0xa8 [genwqe_card]
 [<000003ff8012c1c0>] do_execute_ddcb+0x110/0x348 [genwqe_card]
 [<000003ff8012c914>] genwqe_ioctl+0x51c/0xc20 [genwqe_card]
 [<000000000032513a>] do_vfs_ioctl+0x3b2/0x518
 [<0000000000325344>] SyS_ioctl+0xa4/0xb8
 [<00000000007b86c6>] system_call+0xd6/0x264
 [<000003ff9e8e520a>] 0x3ff9e8e520a
Last Breaking-Event-Address:
 [<0000000000156342>] dma_update_cpu_trans+0x9a/0xa8
---[ end trace 35996336235145c8 ]---
BUG: Bad page state in process jbd2/dasdb1-8  pfn:3215b
page:000003d100c856c0 count:-1 mapcount:0 mapping:          (null) index:0x0
flags: 0x3fffc0000000000()
page dumped because: nonzero _count

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Frank Haverkamp <haver@linux.vnet.ibm.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:25:18 -04:00
Martyn Welch
6ad37567b6 vme: vme_get_size potentially returning incorrect value on failure
The function vme_get_size returns the size of the window to the caller,
however it doesn't check the return value of the call to vme_master_get.

Return 0 on failure rather than anything else.

Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martyn Welch <martyn.welch@collabora.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:25:18 -04:00
Ville Syrjälä
c89d5454d4 drm/i915: Fix SKL+ 90/270 degree rotated plane coordinate computation
Pass the framebuffer size in .16 fixed point coordinates to
drm_rect_rotate() since that's what the source coordinates are as well
at this stage. We used to do this part of the computation in integer
coordinates, but that got changed when moving the computation to
happen in the check phase of the operation. Unfortunately I forgot
to shift up the fb width and height appropriately.

With the bogus size we ended up with some negative fb offset, which when
added to the vma offset caused out scanout to start at an offset earlier
than we inteded. Eg. when testing on my SKL I saw a row of incorrect
tiles at the top of my screen.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: b63a16f6cd ("drm/i915: Compute display surface offset in the plane check hook for SKL+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477325584-23679-1-git-send-email-ville.syrjala@linux.intel.com
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit da064b47c0)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:24:30 +03:00
Tvrtko Ursulin
7e9b3f95d6 drm/i915: Remove two invalid warns
Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.

v2: Commit message update. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c433 ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com
(cherry picked from commit 3299e7e434)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:24:15 +03:00
Tvrtko Ursulin
6d9deb910c drm/i915: Rotated view does not need a fence
We do not need to set up a fence for the rotated view.

Display does not need it and no one can access it.

v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098d ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 07ee2bce6a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:24:04 +03:00
Paulo Zanoni
e3b9e6e3a9 drm/i915/fbc: fix CFB size calculation for gen8+
Broadwell and newer actually compress up to 2560 lines instead of 2048
(as documented in the FBC_CTL page). If we don't take this into
consideration we end up reserving too little stolen memory for the
CFB, so we may allocate something else (such as a ring) right after
what we reserved, and the hardware will overwrite it with the contents
of the CFB when FBC is active, causing GPU hangs. Another possibility
is that the CFB may be allocated at the very end of the available
space, so the CFB will overlap the reserved stolen area, leading to
FIFO underruns.

This bug has always been a problem on BDW (the only affected platform
where FBC is enabled by default), but it's much easier to reproduce
since the following commit:
    commit c58b735fc7
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Thu Aug 18 17:16:57 2016 +0100
        drm/i915: Allocate rings from stolen

Of course, you can only reproduce the bug if your screen is taller
than 2048 lines.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98213
Fixes: a98ee79317 ("drm/i915/fbc: enable FBC by default on HSW and BDW")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477065346-13736-1-git-send-email-paulo.r.zanoni@intel.com
(cherry picked from commit 79f2624b1b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:23:52 +03:00
Daniel Stone
1fb3672eaf drm: i915: Wait for fences on new fb, not old
The previous code would wait for fences on the framebuffer from the old
plane state to complete, rather than the new, so you would see tearing
everywhere. Fix this to wait on the new state before we make it active.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Fixes: 94f050246b ("drm/i915: nonblocking commit")
Cc: stable@vger.kernel.org
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161021144454.6288-1-daniels@collabora.com
(cherry picked from commit 2d2c5ad83f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:23:35 +03:00
Ville Syrjälä
0ce140d45a drm/i915: Clean up DDI DDC/AUX CH sanitation
Now that we use the AUX and GMBUS assignment from VBT for all ports,
let's clean up the sanitization of the port information a bit.
Previosuly we only did this for port E, and only complained about a
non-standard assignment for the other ports. But as we know that
non-standard assignments are a fact of life, let's expand the
sanitization to all the ports.

v2: Include a commit message, fix up the comments a bit
v3: Don't clobber other ports if the current port has no alternate aux ch/ddc pin

Cc: stable@vger.kernel.org
Cc: Maarten Maathuis <madman2003@gmail.com>
Tested-by: Maarten Maathuis <madman2003@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=97877
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476208368-5710-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jim Bride <jim.bride@linux.intel.com> (v2)
(cherry picked from commit 9454fa871e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:23:21 +03:00
Ville Syrjälä
198c5ee3c6 drm/i915: Respect alternate_aux_channel for all DDI ports
The VBT provides the platform a way to mix and match the DDI ports vs.
AUX channels. Currently we only trust the VBT for DDI E, which has no
corresponding AUX channel of its own. However it is possible that some
board might use some non-standard DDI vs. AUX port routing even for
the other ports. Perhaps for signal routing reasons or something,
So let's generalize this and trust the VBT for all ports.

For now we'll limit this to DDI platforms, as we trust the VBT a bit
more there anyway when it comes to the DDI ports. I've structured
the code in a way that would allow us to easily expand this to
other platforms as well, by simply filling in the ddi_port_info.

v2: Drop whitespace changes, keep MISSING_CASE() for unknown
    aux ch assignment, include a commit message, include debug
    message during init

Cc: stable@vger.kernel.org
Cc: Maarten Maathuis <madman2003@gmail.com>
Tested-by: Maarten Maathuis <madman2003@gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=97877
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476208368-5710-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jim Bride <jim.bride@linux.intel.com>
(cherry picked from commit 8f7ce038f1)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:20:54 +03:00
Paulo Zanoni
5e33791e1f drm/i915/gen9: fix watermarks when using the pipe scaler
Luckily, the necessary adjustments for when we're using the scaler are
exactly the same as the ones needed on ILK+, so just reuse the
function we already have.

v2: Invert the patch order so stable backports get easier.

Cc: stable@vger.kernel.org
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1475872138-16194-1-git-send-email-paulo.r.zanoni@intel.com
(cherry picked from commit cfd7e3a202)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:20:15 +03:00
Imre Deak
fd58753ead drm/i915: Fix mismatched INIT power domain disabling during suspend
Currently the display INIT power domain disabling/enabling happens in a
mismatched way in the suspend/resume_early hooks respectively. This can
leave display power wells incorrectly disabled in the resume hook if the
suspend sequence is aborted for some reason resulting in the
suspend/resume hooks getting called but the suspend_late/resume_early
hooks being skipped. In particular this change fixes "Unclaimed read
from register 0x1e1204" on BYT/BSW triggered from i915_drm_resume()->
intel_pps_unlock_regs_wa() when suspending with /sys/power/pm_test set
to devices.

Fixes: 85e9067933 ("drm/i915: disable power wells on suspend")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: David Weinehall <david.weinehall@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476358446-11621-1-git-send-email-imre.deak@intel.com
(cherry picked from commit 4c494a5769)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:19:43 +03:00
Dan Carpenter
be4dd651ff drm/i915: fix a read size argument
We want to read 3 bytes here, but because the parenthesis are in the
wrong place we instead read:

	sizeof(intel_dp->edp_dpcd) == sizeof(intel_dp->edp_dpcd)

which is one byte.

Fixes: fe5a66f91c ("drm/i915: Read PSR caps/intermediate freqs/etc. only once on eDP")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <drm-intel-fixes@lists.freedesktop.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161013085508.GJ16198@mwanda
(cherry picked from commit f7170e2eb8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:19:12 +03:00
Chris Wilson
47ed32483e drm/i915: Use fence_write() from rpm resume
During rpm resume we restore the fences, but we do not have the
protection of struct_mutex. This rules out updating the activity
tracking on the fences, and requires us to rely on the rpm as the
serialisation barrier instead.

[  350.298052] [drm:intel_runtime_resume [i915]] Resuming device
[  350.308606]
[  350.310520] ===============================
[  350.315560] [ INFO: suspicious RCU usage. ]
[  350.320554] 4.8.0-rc8-bsw-rapl+ #3133 Tainted: G     U  W
[  350.327208] -------------------------------
[  350.331977] ../drivers/gpu/drm/i915/i915_gem_request.h:371 suspicious rcu_dereference_protected() usage!
[  350.342619]
[  350.342619] other info that might help us debug this:
[  350.342619]
[  350.351593]
[  350.351593] rcu_scheduler_active = 1, debug_locks = 0
[  350.358952] 3 locks held by Xorg/320:
[  350.363077]  #0:  (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa030589c>] drm_modeset_lock_all+0x3c/0xd0 [drm]
[  350.375162]  #1:  (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa03058a6>] drm_modeset_lock_all+0x46/0xd0 [drm]
[  350.387022]  #2:  (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa0305056>] drm_modeset_lock+0x36/0x110 [drm]
[  350.398236]
[  350.398236] stack backtrace:
[  350.403196] CPU: 1 PID: 320 Comm: Xorg Tainted: G     U  W       4.8.0-rc8-bsw-rapl+ #3133
[  350.412457] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Braswell CRB, BIOS BRAS.X64.X088.R00.1510270350 10/27/2015
[  350.425212]  0000000000000000 ffff8801680a78c8 ffffffff81332187 ffff88016c5c5000
[  350.433611]  0000000000000001 ffff8801680a78f8 ffffffff810ca6da ffff88016cc8b0f0
[  350.442012]  ffff88016cc80000 ffff88016cc80000 ffff880177ad0000 ffff8801680a7948
[  350.450409] Call Trace:
[  350.453165]  [<ffffffff81332187>] dump_stack+0x67/0x90
[  350.458931]  [<ffffffff810ca6da>] lockdep_rcu_suspicious+0xea/0x120
[  350.466002]  [<ffffffffa039e8dd>] fence_update+0xbd/0x670 [i915]
[  350.472766]  [<ffffffffa039efe2>] i915_gem_restore_fences+0x52/0x70 [i915]
[  350.480496]  [<ffffffffa0368f42>] vlv_resume_prepare+0x72/0x570 [i915]
[  350.487839]  [<ffffffffa0369802>] intel_runtime_resume+0x102/0x210 [i915]
[  350.495442]  [<ffffffff8137f26f>] pci_pm_runtime_resume+0x7f/0xb0
[  350.502274]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
[  350.509883]  [<ffffffff814401c5>] __rpm_callback+0x35/0x70
[  350.516037]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
[  350.523646]  [<ffffffff81440224>] rpm_callback+0x24/0x80
[  350.529604]  [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40
[  350.537212]  [<ffffffff814417bd>] rpm_resume+0x4ad/0x740
[  350.543161]  [<ffffffff81441aa1>] __pm_runtime_resume+0x51/0x80
[  350.549824]  [<ffffffffa03889c8>] intel_runtime_pm_get+0x28/0x90 [i915]
[  350.557265]  [<ffffffffa0388a53>] intel_display_power_get+0x23/0x50 [i915]
[  350.565001]  [<ffffffffa03ef23d>] intel_atomic_commit_tail+0xdfd/0x10b0 [i915]
[  350.573106]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
[  350.582659]  [<ffffffff81615091>] ? _raw_spin_unlock+0x31/0x50
[  350.589205]  [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper]
[  350.598787]  [<ffffffffa03ef8a5>] intel_atomic_commit+0x3b5/0x500 [i915]
[  350.606319]  [<ffffffffa03061dc>] ? drm_atomic_set_crtc_for_connector+0xcc/0x100 [drm]
[  350.615209]  [<ffffffffa0306b49>] drm_atomic_commit+0x49/0x50 [drm]
[  350.622242]  [<ffffffffa034dee8>] drm_atomic_helper_set_config+0x88/0xc0 [drm_kms_helper]
[  350.631419]  [<ffffffffa02f94ac>] drm_mode_set_config_internal+0x6c/0x120 [drm]
[  350.639623]  [<ffffffffa02fa94c>] drm_mode_setcrtc+0x22c/0x4d0 [drm]
[  350.646760]  [<ffffffffa02f0f19>] drm_ioctl+0x209/0x460 [drm]
[  350.653217]  [<ffffffffa02fa720>] ? drm_mode_getcrtc+0x150/0x150 [drm]
[  350.660536]  [<ffffffff810c984a>] ? __lock_is_held+0x4a/0x70
[  350.666885]  [<ffffffff81202303>] do_vfs_ioctl+0x93/0x6b0
[  350.672939]  [<ffffffff8120f843>] ? __fget+0x113/0x200
[  350.678797]  [<ffffffff8120f735>] ? __fget+0x5/0x200
[  350.684361]  [<ffffffff81202964>] SyS_ioctl+0x44/0x80
[  350.690030]  [<ffffffff81001deb>] do_syscall_64+0x5b/0x120
[  350.696184]  [<ffffffff81615ada>] entry_SYSCALL64_slow_path+0x25/0x25

Note we also have to remember the lesson from commit 4fc788f5ee
("drm/i915: Flush delayed fence releases after reset") where we have to
flush any changes to the fence on restore.

v2: Replace call to release user mmaps with an assertion that they have
already been zapped.

Fixes: 49ef5294cd ("drm/i915: Move fence tracking from object to vma")
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161012114827.17031-1-chris@chris-wilson.co.uk
(cherry picked from commit 4676dc838b)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:18:51 +03:00
Paulo Zanoni
01c72d6c17 drm/i915/gen9: fix DDB partitioning for multi-screen cases
With the previous code we were only recomputing the DDB partitioning
for the CRTCs included in the atomic commit, so any other active CRTCs
would end up having their DDB registers zeroed. In this patch we make
sure that the computed state starts as a copy of the current
partitioning, and then we only zero the DDBs that we're actually
going to recompute.

How to reproduce the bug:
  1 - Enable the primary plane on pipe A
  2 - Enable the primary plane on pipe B
  3 - Enable the cursor or sprite plane on pipe A

Step 3 will zero the DDB partitioning for pipe B since it's not
included in the commit that enabled the cursor or sprite for pipe A.

I expect this to fix many FIFO underrun problems on gen9+.

v2:
  - Mention the cursor on the steps to reproduce the problem (Paulo).
  - Add Testcase tag provided by Maarten (Maarten).

Testcase: kms_cursor_legacy.cursorA-vs-flipB-atomic-transitions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96226
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96828
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97450
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97596
Bugzilla: https://www.phoronix.com/scan.php?page=news_item&px=Intel-Skylake-Multi-Screen-Woes
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Lyude <cpaul@redhat.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1475602652-17326-1-git-send-email-paulo.r.zanoni@intel.com
(cherry picked from commit 5a920b85f2)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:18:32 +03:00
Jani Nikula
7a0e17bd5d drm/i915: workaround sparse warning on variable length arrays
Fix sparse warning:

drivers/gpu/drm/i915/intel_device_info.c:195:31: warning: Variable
length array is used.

In truth the array does have constant length, but sparse is too dumb to
realize. This is a bit ugly, but silence the warning no matter what.

Fixes: 91bedd34ab ("drm/i915/bdw: Check for slice, subslice and EU count for BDW")
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1475574853-4178-1-git-send-email-jani.nikula@intel.com
(cherry picked from commit ff64aa1e63)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:17:50 +03:00
Jani Nikula
9cccc76bb4 drm/i915: keep declarations in i915_drv.h
Fix sparse warnings:

drivers/gpu/drm/i915/i915_drv.c:1179:5: warning: symbol
'i915_driver_load' was not declared. Should it be static?

drivers/gpu/drm/i915/i915_drv.c:1267:6: warning: symbol
'i915_driver_unload' was not declared. Should it be static?

drivers/gpu/drm/i915/i915_drv.c:2444:25: warning: symbol 'i915_pm_ops'
was not declared. Should it be static?

Fixes: 42f5551d27 ("drm/i915: Split out the PCI driver interface to i915_pci.c")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1473946137-1931-3-git-send-email-jani.nikula@intel.com
(cherry picked from commit efab0698f9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-10-28 15:17:12 +03:00
Rob Herring
d0f4bce2bc tty: serial_core: fix NULL struct tty pointer access in uart_write_wakeup
Since commit 761ed4a945 ("tty: serial_core: convert uart_close to
use tty_port_close"), the serial console is broken on various systems
and typing "reboot" splats the following on the serial console:

INIT: Sending p[  427.863916] BUG: unable to handle kernel NULL pointer dereference at 00000000000001e0
[  427.885156] IP: [] tty_wakeup+0xc/0x70
[  427.898337] PGD 0 [  427.902051]
[  427.907498] Oops: 0000 [#1] PREEMPT SMP
[  427.917635] Modules linked in: nfsv3 nfs_acl nfs fscache lockd
sunrpc grace edd af_packet cpufreq_conservative cpufreq_userspace
cpufreq_powersave fuse loop md_mod dm_mod joydev hid_generic usbhid
ipmi_ssif ohci_pci ohci_hcd ehci_pci ehci_hcd e1000e ptp firewire_ohci
edac_core pps_core tpm_infineon sp5100_tco firewire_core acpi_cpufreq
serio_raw pcspkr fjes usbcore shpchp edac_mce_amd tpm_tis ipmi_si
tpm_tis_core i2c_piix4 k10temp sg ipmi_msghandler tpm sr_mod button
cdrom kvm_amd kvm irqbypass crc_itu_t ast ttm drm_kms_helper drm
fb_sys_fops sysimgblt sysfillrect syscopyarea i2c_algo_bit scsi_dh_rdac
scsi_dh_alua scsi_dh_emc scsi_dh_hp_sw ata_generic pata_atiixp
[  428.054179] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc1-1.g73e3f23-default #1
[  428.072868] Hardware name: System manufacturer System Product Name/KGP(M)E-D16, BIOS 0902    12/03/2010
[  428.094755] task: ffffffffa2c0d500 task.stack: ffffffffa2c00000
[  428.109717] RIP: 0010:[]  [] tty_wakeup+0xc/0x70
[  428.128407] RSP: 0018:ffff9a1a5fc03df8  EFLAGS: 00010086
[  428.142184] RAX: ffff9a1857258000 RBX: ffffffffa3050ea0 RCX: 0000000000000000
[  428.159649] RDX: 000000000000001b RSI: 0000000000000000 RDI: 0000000000000000
[  428.177109] RBP: ffff9a1a5fc03e08 R08: 0000000000000000 R09: 0000000000000000
[  428.194547] R10: 0000000000021c77 R11: 0000000000000000 R12: ffff9a1857258000
[  428.212002] R13: 0000000000000000 R14: 0000000000000020 R15: 0000000000000020
[  428.229481] FS:  0000000000000000(0000) GS:ffff9a1a5fc00000(0000) knlGS:0000000000000000
[  428.248938] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  428.263726] CR2: 00000000000001e0 CR3: 0000000390c06000 CR4: 00000000000006f0
[  428.281331] Stack:
[  428.288696]  ffffffffa3050ea0 ffff9a1857258000 ffff9a1a5fc03e18 ffffffffa24e0ab1
[  428.307064]  ffff9a1a5fc03e40 ffffffffa24e8865 ffffffffa3050ea0 00000000000000c2
[  428.325456]  0000000000000046 ffff9a1a5fc03e78 ffffffffa24e8a5f ffffffffa3050ea0
[  428.343905] Call Trace:
[  428.352319]   [  428.356216]  [] uart_write_wakeup+0x21/0x30

The problem is for console ports, the serial port is not shutdown and
interrupts may fire after the struct tty is gone. Simply calling the
tty_port helper tty_port_tty_wakeup instead of tty_wakeup directly will
ensure there is a valid struct tty.

Fixes: 761ed4a945 ("tty: serial_core: convert uart_close to use tty_port_close")
Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-serial@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:13:07 -04:00
Geert Uytterhoeven
4dda864d73 tty: serial_core: Fix serial console crash on port shutdown
The port->console flag is always false, as uart_console() is called
before the serial console has been registered.

Hence for a serial port used as the console, uart_tty_port_shutdown()
will still be called when userspace closes the port, powering it down.
This may lead to a system lock up when the serial console driver writes
to the serial port's registers.

To fix this, move the setting of port->console after the call to
uart_configure_port(), which registers the serial console.

Fixes: 761ed4a945 ("tty: serial_core: convert uart_close to use tty_port_close")
Reported-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Mugunthan V N <mugunthanvnm@ti.com>
Tested-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
[robh: rebased on tty-linus]
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:13:07 -04:00
Richard Genoud
9bcffe7575 tty/serial: at91: fix hardware handshake on Atmel platforms
After commit 1cf6e8fc83 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), the hardware handshake wasn't
functional anymore on Atmel platforms (beside SAMA5D2).

To understand why, one has to understand the flag ATMEL_US_USMODE_HWHS
first:
Before commit 1cf6e8fc83 ("tty/serial: at91: fix RTS line management
when hardware handshake is enabled"), this flag was never set.
Thus, the CTS/RTS where only handled by serial_core (and everything
worked just fine).

This commit introduced the use of the ATMEL_US_USMODE_HWHS flag,
enabling it for all boards when the user space enables flow control.

When the ATMEL_US_USMODE_HWHS is set, the Atmel USART controller
handles a part of the flow control job:
- disable the transmitter when the CTS pin gets high.
- drive the RTS pin high when the DMA buffer transfer is completed or
  PDC RX buffer full or RX FIFO is beyond threshold. (depending on the
  controller version).

NB: This feature is *not* mandatory for the flow control to work.
(Nevertheless, it's very useful if low latencies are needed.)

Now, the specifics of the ATMEL_US_USMODE_HWHS flag:

- For platforms with DMAC and no FIFOs (sam9x25, sam9x35, sama5D3,
sama5D4, sam9g15, sam9g25, sam9g35)* this feature simply doesn't work.
( source: https://lkml.org/lkml/2016/9/7/598 )
Tested it on sam9g35, the RTS pins always stays up, even when RXEN=1
or a new DMA transfer descriptor is set.
=> ATMEL_US_USMODE_HWHS must not be used for those platforms

- For platforms with a PDC (sam926{0,1,3}, sam9g10, sam9g20, sam9g45,
sam9g46)*, there's another kind of problem. Once the flag
ATMEL_US_USMODE_HWHS is set, the RTS pin can't be driven anymore via
RTSEN/RTSDIS in USART Control Register. The RTS pin can only be driven
by enabling/disabling the receiver or setting RCR=RNCR=0 in the PDC
(Receive (Next) Counter Register).
=> Doing this is beyond the scope of this patch and could add other
bugs, so the original (and working) behaviour should be set for those
platforms (meaning ATMEL_US_USMODE_HWHS flag should be unset).

- For platforms with a FIFO (sama5d2)*, the RTS pin is driven according
to the RX FIFO thresholds, and can be also driven by RTSEN/RTSDIS in
USART Control Register. No problem here.
(This was the use case of commit 1cf6e8fc83 ("tty/serial: at91: fix
RTS line management when hardware handshake is enabled"))
NB: If the CTS pin declared as a GPIO in the DTS, (for instance
cts-gpios = <&pioA PIN_PB31 GPIO_ACTIVE_LOW>), the transmitter will be
disabled.
=> ATMEL_US_USMODE_HWHS flag can be set for this platform ONLY IF the
CTS pin is not a GPIO.

So, the only case when ATMEL_US_USMODE_HWHS can be enabled is when
(atmel_use_fifo(port) &&
 !mctrl_gpio_to_gpiod(atmel_port->gpios, UART_GPIO_CTS))

Tested on all Atmel USART controller flavours:
AT91SAM9G35-CM (DMAC flavour), AT91SAM9G20-EK (PDC flavour),
SAMA5D2xplained (FIFO flavour).

* the list may not be exhaustive

Cc: <stable@vger.kernel.org> #4.4+ (beware, missing atmel_port variable)
Fixes: 1cf6e8fc83 ("tty/serial: at91: fix RTS line management when hardware handshake is enabled")
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 08:10:48 -04:00
Ido Yariv
bd768e1466 KVM: x86: fix wbinvd_dirty_mask use-after-free
vcpu->arch.wbinvd_dirty_mask may still be used after freeing it,
corrupting memory. For example, the following call trace may set a bit
in an already freed cpu mask:
    kvm_arch_vcpu_load
    vcpu_load
    vmx_free_vcpu_nested
    vmx_free_vcpu
    kvm_arch_vcpu_free

Fix this by deferring freeing of wbinvd_dirty_mask.

Cc: stable@vger.kernel.org
Signed-off-by: Ido Yariv <ido@wizery.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-10-28 11:35:21 +02:00
Imre Palik
f92b760414 perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
perf doesn't seem to honour the number of fixed counters specified by CPUID
leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters.

So if some of the fixed counters are masked out by the hypervisor, it still
tries to check/set them.

This patch makes perf behave nicer when the kernel is running under a
hypervisor that doesn't expose all the counters.

This patch contains some ideas from Matt Wilson.

Signed-off-by: Imre Palik <imrep@amazon.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Kozyrev <alexander.kozyrev@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Wilson <msw@amazon.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 11:06:25 +02:00
Jiri Olsa
5aab90ce1e perf/powerpc: Don't call perf_event_disable() from atomic context
The trinity syscall fuzzer triggered following WARN() on powerpc:

  WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
  ...
  NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
  LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
  Call Trace:
  [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

Followed by a lockdep warning:

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.8.0-rc5+ #7 Tainted: G        W
  -------------------------------
  ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

  other info that might help us debug this:

  rcu_scheduler_active = 1, debug_locks = 0
  2 locks held by ls/2998:
   #0:  (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
   #1:  (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0

  stack backtrace:
  CPU: 9 PID: 2998 Comm: ls Tainted: G        W       4.8.0-rc5+ #7
  Call Trace:
  [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
  [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
  [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
  [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
  [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
  [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
  [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
  [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
  [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
  [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
  [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
  [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

While it looks like the first WARN() is probably valid, the other one is
triggered by disabling event via perf_event_disable() from atomic context.

The event is disabled here in case we were not able to emulate
the instruction that hit the breakpoint. By disabling the event
we unschedule the event and make sure it's not scheduled back.

But we can't call perf_event_disable() from atomic context, instead
we need to use the event's pending_disable irq_work method to disable it.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 11:06:25 +02:00
Jiri Olsa
0933840acf perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
CAI Qian reported a crash in the PMU uncore device removal code,
enabled by the CONFIG_DEBUG_TEST_DRIVER_REMOVE=y option:

  https://marc.info/?l=linux-kernel&m=147688837328451

The reason for the crash is that perf_pmu_unregister() tries to remove
a PMU device which is not added at this point. We add PMU devices
only after pmu_bus is registered, which happens in the
perf_event_sysfs_init() call and sets the 'pmu_bus_running' flag.

The fix is to get the 'pmu_bus_running' flag state at the point
the PMU is taken out of the PMU list and remove the device
later only if it's set.

Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161020111011.GA13361@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 11:06:25 +02:00
Borislav Petkov
1c27f646b1 x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
We needed the physical address of the container in order to compute the
offset within the relocated ramdisk. And we did this by doing __pa() on
the virtual address.

However, __pa() does checks whether the physical address is within
PAGE_OFFSET and __START_KERNEL_map - see __phys_addr() - which fail
if we have CONFIG_RANDOMIZE_MEMORY enabled: we feed a virtual address
which *doesn't* have the randomization offset into a function which uses
PAGE_OFFSET which *does* have that offset.

This makes this check fire:

	VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x));
			^^^^^^

due to the randomization offset.

The fix is as simple as using __pa_nodebug() because we do that
randomization offset accounting later in that function ourselves.

Reported-by: Bob Peterson <rpeterso@redhat.com>
Tested-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable@vger.kernel.org # 4.9
Link: http://lkml.kernel.org/r/20161027123623.j2jri5bandimboff@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-10-28 10:29:59 +02:00
Arnd Bergmann
8ff0513bdc mtd: mtk: avoid warning in mtk_ecc_encode
When building with -Wmaybe-uninitialized, gcc produces a silly false positive
warning for the mtk_ecc_encode function:

drivers/mtd/nand/mtk_ecc.c: In function 'mtk_ecc_encode':
drivers/mtd/nand/mtk_ecc.c:402:15: error: 'val' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The function for some reason contains a double byte swap on big-endian
builds to get the OOB data into the correct order again, and is written
in a slightly confusing way.

Using a simple memcpy32_fromio() to read the data simplifies it a lot
so it becomes more readable and produces no warning. However, the
output might not have 32-bit alignment, so we have to use another
memcpy to avoid taking alignment faults or writing beyond the end
of the array.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: RogerCC Lin <rogercc.lin@mediatek.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-10-28 10:21:23 +02:00
Boris Brezillon
73f907fd5f mtd: nand: Fix data interface configuration logic
When changing from one data interface setting to another, one has to
ensure a specific sequence which is described in the ONFI spec.

One of these constraints is that the CE line has go high after a reset
before a command can be sent with the new data interface setting, which
is not guaranteed by the current implementation.

Rework the nand_reset() function and all the call sites to make sure the
CE line is asserted and released when required.

Also make sure to actually apply the new data interface setting on the
first die.

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Fixes: d8e725dd83 ("mtd: nand: automate NAND timings selection")
Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de>
Tested-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
2016-10-28 09:58:36 +02:00
Fabio Estevam
ce93bedb5e mtd: nand: gpmi: disable the clocks on errors
We should disable the previously enabled GPMI clocks in the error paths.

Also, when gpmi_enable_clk() fails simply return the error
code immediately rather than jumping to to the 'err_out' label.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: Han Xu <han.xu@nxp.com>
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
2016-10-28 09:58:05 +02:00
Linus Torvalds
14970f204b Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "20 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
  cris/arch-v32: cryptocop: print a hex number after a 0x prefix
  ipack: print a hex number after a 0x prefix
  block: DAC960: print a hex number after a 0x prefix
  fs: exofs: print a hex number after a 0x prefix
  lib/genalloc.c: start search from start of chunk
  mm: memcontrol: do not recurse in direct reclaim
  CREDITS: update credit information for Martin Kepplinger
  proc: fix NULL dereference when reading /proc/<pid>/auxv
  mm: kmemleak: ensure that the task stack is not freed during scanning
  lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
  latent_entropy: raise CONFIG_FRAME_WARN by default
  kconfig.h: remove config_enabled() macro
  ipc: account for kmem usage on mqueue and msg
  mm/slab: improve performance of gathering slabinfo stats
  mm: page_alloc: use KERN_CONT where appropriate
  mm/list_lru.c: avoid error-path NULL pointer deref
  h8300: fix syscall restarting
  kcov: properly check if we are in an interrupt
  mm/slab: fix kmemcg cache creation delayed issue
2016-10-27 19:58:39 -07:00
Dave Airlie
1cfa126c52 Merge branch 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Two sets of amdgpu fixes as I missed one set.
* 'drm-fixes-4.9' of git://people.freedesktop.org/~agd5f/linux: (23 commits)
  drm/amd/powerplay: fix bug get wrong evv voltage of Polaris.
  drm/amdgpu/si_dpm: workaround for SI kickers
  drm/radeon/si_dpm: workaround for SI kickers
  drm/amdgpu: fix s3 resume back, uvd dpm randomly can't disable.
  drm/radeon: drop register readback in cayman_cp_int_cntl_setup
  drm/amdgpu/vce3: only enable 3 rings on new enough firmware (v2)
  drm/amdgpu: fix fence slab teardown
  drm/amdgpu: update kernel-doc for some functions
  drm/amdgpu: fix a vm_flush fence leak
  drm/amdgpu: fix sched fence slab teardown
  Revert "drm/radeon: fix DP link training issue with second 4K monitor"
  drm/amdgpu/dpm: flush any thermal work on fini
  drm/amdgpu: cancel reset work on fini
  drm/amd/powerplay: don't give up if DPM is already running
  drm/amd/powerplay: fix static checker warning in process_pptables_v1_0.c
  drm/amdgpu: avoid drm error log during S3 on RHEL7.3
  drm/amdgpu: explicitly set pg_flags for ST
  drm/amdgpu/st: move ATC CG golden init from gfx to mc
  drm/amd/amdgpu: expose max engine and memory clock for powerplay enabled case
  drm/amdgpu: move atom scratch register save/restore to common code
  ...
2016-10-28 12:25:01 +10:00
Dave Airlie
aa72c26c2b Merge tag 'drm-misc-fixes-2016-10-27' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Set of drm core fixes.

Hopefully fixes a bug in MST unplugs in fbdev.

* tag 'drm-misc-fixes-2016-10-27' of git://anongit.freedesktop.org/git/drm-misc:
  drm/dp/mst: Check peer device type before attempting EDID read
  drm/dp/mst: Clear port->pdt when tearing down the i2c adapter
  drm/fb-helper: Keep references for the current set of used connectors
  drm: Don't force all planes to be added to the state due to zpos
  drm/fb-helper: Fix connector ref leak on error
  drm/fb-helper: Don't call dirty callback for untouched clips
  drm: Release reference from blob lookup after replacing property
2016-10-28 12:24:14 +10:00
Dimitri Sivanich
8e819101ce drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk
Would like to have this be a decimal number.

Link: http://lkml.kernel.org/r/20161026134746.GA30169@sgi.com
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Uwe Kleine-König
17a8893956 cris/arch-v32: cryptocop: print a hex number after a 0x prefix
It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-6-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Uwe Kleine-König
9105585d13 ipack: print a hex number after a 0x prefix
It makes the result hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-4-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Cc: Jens Taprogge <jens.taprogge@taprogge.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Uwe Kleine-König
ee52c44dee block: DAC960: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Uwe Kleine-König
14f947c87a fs: exofs: print a hex number after a 0x prefix
It makes the message hard to interpret correctly if a base 10 number is
prefixed by 0x.  So change to a hex number.

Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@primarydata.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Daniel Mentz
62e931fac4 lib/genalloc.c: start search from start of chunk
gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

	if (size > atomic_read(&chunk->avail))
		continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.

Fixes: 7f184275aa ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Johannes Weiner
89a2848381 mm: memcontrol: do not recurse in direct reclaim
On 4.0, we saw a stack corruption from a page fault entering direct
memory cgroup reclaim, calling into btrfs_releasepage(), which then
tried to allocate an extent and recursed back into a kmem charge ad
nauseam:

  [...]
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  memcg_charge_kmem+0x40/0x80
  new_slab+0x2d9/0x5a0
  __slab_alloc+0x2fd/0x44f
  kmem_cache_alloc+0x193/0x1e0
  alloc_extent_state+0x21/0xc0
  __clear_extent_bit+0x2b5/0x400
  try_release_extent_mapping+0x1a3/0x220
  __btrfs_releasepage+0x31/0x70
  btrfs_releasepage+0x2c/0x30
  try_to_release_page+0x32/0x50
  shrink_page_list+0x6da/0x7a0
  shrink_inactive_list+0x1e5/0x510
  shrink_lruvec+0x605/0x7f0
  shrink_zone+0xee/0x320
  do_try_to_free_pages+0x174/0x440
  try_to_free_mem_cgroup_pages+0xa7/0x130
  try_charge+0x17b/0x830
  mem_cgroup_try_charge+0x65/0x1c0
  handle_mm_fault+0x117f/0x1510
  __do_page_fault+0x177/0x420
  do_page_fault+0xc/0x10
  page_fault+0x22/0x30

On later kernels, kmem charging is opt-in rather than opt-out, and that
particular kmem allocation in btrfs_releasepage() is no longer being
charged and won't recurse and overrun the stack anymore.

But it's not impossible for an accounted allocation to happen from the
memcg direct reclaim context, and we needed to reproduce this crash many
times before we even got a useful stack trace out of it.

Like other direct reclaimers, mark tasks in memcg reclaim PF_MEMALLOC to
avoid recursing into any other form of direct reclaim.  Then let
recursive charges from PF_MEMALLOC contexts bypass the cgroup limit.

Link: http://lkml.kernel.org/r/20161025141050.GA13019@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Martin Kepplinger
8f72cb4ef9 CREDITS: update credit information for Martin Kepplinger
Content and employer changed.

Link: http://lkml.kernel.org/r/1477304102-28830-1-git-send-email-martin.kepplinger@ginzinger.com
Signed-off-by: Martin Kepplinger <martink@posteo.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Leon Yu
06b2849d10 proc: fix NULL dereference when reading /proc/<pid>/auxv
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL.  Fix that by checking for NULL mm
and bailing out early.  This is also the original behavior changed by
recent commit c531716785 ("proc: switch auxv to use of __mem_open()").

  # cat /proc/2/auxv
  Unable to handle kernel NULL pointer dereference at virtual address 000000a8
  Internal error: Oops: 17 [#1] PREEMPT SMP ARM
  CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
  Hardware name: BCM2709
  task: ea3b0b00 task.stack: e99b2000
  PC is at auxv_read+0x24/0x4c
  LR is at do_readv_writev+0x2fc/0x37c
  Process cat (pid: 113, stack limit = 0xe99b2210)
  Call chain:
    auxv_read
    do_readv_writev
    vfs_readv
    default_file_splice_read
    splice_direct_to_actor
    do_splice_direct
    do_sendfile
    SyS_sendfile64
    ret_fast_syscall

Fixes: c531716785 ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu <chianglungyu@gmail.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Janis Danisevskis <jdanis@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Catalin Marinas
37df49f433 mm: kmemleak: ensure that the task stack is not freed during scanning
Commit 68f24b08ee ("sched/core: Free the stack early if
CONFIG_THREAD_INFO_IN_TASK") may cause the task->stack to be freed
during kmemleak_scan() execution, leading to either a NULL pointer fault
(if task->stack is NULL) or kmemleak accessing already freed memory.

This patch uses the new try_get_task_stack() API to ensure that the task
stack is not freed during kmemleak stack scanning.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=173901.

Fixes: 68f24b08ee ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")
Link: http://lkml.kernel.org/r/1476266223-14325-1-git-send-email-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: CAI Qian <caiqian@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: CAI Qian <caiqian@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Dmitry Vyukov
02754e0a48 lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB
KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
second level).  Size of each stack is (num_frames + 3) * sizeof(long).
Which gives us ~84K stacks.  This capacity was chosen empirically and it
is enough to run kernel normally.

However, when lots of configs are enabled and a fuzzer tries to maximize
code coverage, it easily hits the limit within tens of minutes.  I've
tested for long a time with number of top level entries bumped 4x
(4096).  And I think I've seen overflow only once.  But I don't have all
configs enabled and code coverage has not reached maximum yet.  So bump
it 8x to 8192.

Since we have two-level table, memory cost of this is very moderate --
currently the top-level table is 8KB, with this patch it is 64KB, which
is negligible under KASAN.

Here is some approx math.

128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most of
alloc/free call sites are reachable with only one stack.  But some
utility functions can have large fanout.  Assuming average fanout is 5x,
total number of alloc/free stacks is ~300K.

Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Kees Cook
0e07f663c9 latent_entropy: raise CONFIG_FRAME_WARN by default
When building with the latent_entropy plugin, set the default
CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
blocks that, when instrumented by the latent_entropy plugin, grow beyond
1024 byte stack size on 32-bit builds.

Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Masahiro Yamada
c0a0aba8e4 kconfig.h: remove config_enabled() macro
The use of config_enabled() is ambiguous.  For config options,
IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
Sometimes config_enabled() has been used for non-config options because
it is useful to check whether the given symbol is defined or not.

I have been tackling on deprecating config_enabled(), and now is the
time to finish this work.

Some new users have appeared for v4.9-rc1, but it is trivial to replace
them:

 - arch/x86/mm/kaslr.c
  replace config_enabled() with IS_ENABLED() because
  CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.

 - include/asm-generic/export.h
  replace config_enabled() with __is_defined().

Then, config_enabled() can be removed now.

Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
options, and __is_defined() for non-config symbols.

Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Garnier <thgarnie@google.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Aristeu Rozanski
8c8d4d4520 ipc: account for kmem usage on mqueue and msg
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.

The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster.  This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e.  accounted
for.

Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Reported-by: Stefan Schimanski <sttts@redhat.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Stefan Schimanski <sttts@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Aruna Ramakrishna
07a63c41fa mm/slab: improve performance of gathering slabinfo stats
On large systems, when some slab caches grow to millions of objects (and
many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2
seconds.  During this time, interrupts are disabled while walking the
slab lists (slabs_full, slabs_partial, and slabs_free) for each node,
and this sometimes causes timeouts in other drivers (for instance,
Infiniband).

This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
total number of allocated slabs per node, per cache.  This counter is
updated when a slab is created or destroyed.  This enables us to skip
traversing the slabs_full list while gathering slabinfo statistics, and
since slabs_full tends to be the biggest list when the cache is large,
it results in a dramatic performance improvement.  Getting slabinfo
statistics now only requires walking the slabs_free and slabs_partial
lists, and those lists are usually much smaller than slabs_full.

We tested this after growing the dentry cache to 70GB, and the
performance improved from 2s to 5ms.

Link: http://lkml.kernel.org/r/1472517876-26814-1-git-send-email-aruna.ramakrishna@oracle.com
Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Joe Perches
1f84a18fc0 mm: page_alloc: use KERN_CONT where appropriate
Recent changes to printk require KERN_CONT uses to continue logging
messages.  So add KERN_CONT where necessary.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 4bcc595ccd ("printk: reinstate KERN_CONT for printing continuation lines")
Link: http://lkml.kernel.org/r/c7df37c8665134654a17aaeb8b9f6ace1d6db58b.1476239034.git.joe@perches.com
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:43 -07:00
Alexander Polakov
1bc11d70b5 mm/list_lru.c: avoid error-path NULL pointer deref
As described in https://bugzilla.kernel.org/show_bug.cgi?id=177821:

After some analysis it seems to be that the problem is in alloc_super().
In case list_lru_init_memcg() fails it goes into destroy_super(), which
calls list_lru_destroy().

And in list_lru_init() we see that in case memcg_init_list_lru() fails,
lru->node is freed, but not set NULL, which then leads list_lru_destroy()
to believe it is initialized and call memcg_destroy_list_lru().
memcg_destroy_list_lru() in turn can access lru->node[i].memcg_lrus,
which is NULL.

[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexander Polakov <apolyakov@beget.ru>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27 18:43:42 -07:00