Some of the newer MDP5 versions support Source Split of SSPPs. It is a
feature that allows us to route the output of a hwpipe to 2 Layer
Mixers. This is required to achieve the following use cases:
- Dual DSI: For high res DSI panels (such as 2560x1600 etc), a single
DSI interface doesn't have the bandwidth to drive the required pixel
clock. We use 2 DSI interfaces to drive the left and right halves
of the panel (i.e, 1280x1600 each). The MDP5 pipeline here would look
like:
LM0 -- DSPP0 -- INTF1 -- DSI1
/
hwpipe--
\
LM1 -- DSPP1 -- INTF2 -- DSI2
A single hwpipe is used to scan out the left and right halves to DSI1
and DSI2 respectively. In order to do this, we need to configure the
2 Layer Mixers in Source Split mode.
- HDMI 4K: In order to support resolutions with width higher than the
max width supported by a hwpipe, we club 2 hwpipes together:
hwpipe1 --- LM0 -- DSPP0
- - \
- -- 3D Mux -- INTF0 -- HDMI
- - /
hwpipe2 --- LM1 -- DSPP1
hwpipe1 is staged on the 'left' Layer Mixer, and hwpipe2 is staged on
the 'right' Layer Mixer. An additional block called the '3D Mux' is
used to merge the output of the 2 DSPPs to a single interface.
In this use case, it is possible that a 4K surface is downscaled and
placed completely within one of the halves. In order to support such
scenarios (and keep the programming simple), Layer Mixers with Source
Split can be assigned 2 hw pipes per stage. While scanning out, the HW
takes care of fetching the pixels fom the correct pipe.
Add a MDP cap to tell whether the HW supports source split or not.
Add a MDP LM cap that tells whether a LM instance can operate in
source split mode (and generate the 'left' part of the display
output).
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
These are a part of CRTC state, it doesn't feel nice to leave them
hanging in mdp5_ctl struct. Pass mdp5_pipeline pointer instead
wherever it is needed.
We still have some params in mdp5_ctl like start_mask etc which
are derivative of atomic state, and should be rolled back if
a commit fails, but it doesn't seem to cause much trouble.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
In the last few commits, we've been adding params to mdp5_crtc_state, and
assigning them in the atomic_check() funcs. Now it's time to actually
start using them.
Remove the duplicated params from the mdp5_crtc struct, and start using
them in the mdp5_crtc code. The majority of the references to these params
is in code that executes after the atomic swap has occurred, so it's okay
to use crtc->state in them. There are a couple of legacy LM cursor ops that
may not use the updated state, but (I think) it's okay to live with that.
Now that we dynamically allocate a mixer to the CRTC, we can also remove
the static assignment to it in mdp5_crtc_init, and also drop the code that
skipped init-ing WB bound mixers (those will now be rejected by
mdp5_mixer_assign()).
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Things like vblank/err irq masks, mode of operation (command mode or not)
are derivative of the interface and mixer state. Therefore, they need to
be a part of the CRTC state too.
Add them to mdp5_crtc_state, and assign them in the CRTC's atomic_check()
func, so that it can be rolled back to a clean state.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The INTF and CTL used in a display pipeline are going to be maintained as
a part of the CRTC state (i.e, in mdp5_crtc_state).
These entities, however, are currently statically assigned to drm_encoders
(i.e. mdp5_encoder). Since these aren't directly visible to the CRTC, we
assign them to the CRTC state in the encoder's atomic_check() op.
With this approach, we assign portions of CRTC state in two different
places: the layer mixer in CRTC's atomic_check(), and the INTF and CTL
pieces in the encoder's atomic_check() op.
We'd have more options here if the drm core maintained encoder state too,
but the current approach of clubbing everything in CRTC's state works just
fine.
Unlike hwpipes and mixers, we don't need to keep a track of INTF/CTL
assignments in the global atomic state. This is because they're currently
not sharable resources. For example, INTF0 and CTL0 will always be assigned
to one drm_encoder. This can change later when we implement writeback and
want a CRTC to use a CTL for a while, and then release it for others to use
it. Or, when a drm_encoder can switch between using a single INTF vs
2 INTFs.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the stuff needed to allow dynamically assigning a mixer to a CRTC.
Since mixers are a resource that can be shared across multiple CRTCs, we
need to maintain a 'hwmixer_to_crtc' map in the global atomic state,
acquire the mdp5_kms.state_lock modeset lock and so on.
The mixer is assigned in the CRTC's atomic_check() func, a failure will
result in the new state being cleanly rolled back.
The mixer assignment itself is straightforward, and almost identical to
what we do for hwpipes. We don't need to grab the old hwmixer_to_crtc
state like we do in hwpipes since we don't need to compare anything
with the old state at the moment.
The only LM capability we care about at the moment is whether the mixer
instance can be used to display stuff (i.e, connect to an INTF
downstream).
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Subclass drm_crtc_state so that we can maintain additional state for
our CRTCs.
Add mdp5_pipeline and mdp5_ctl pointers in the subclassed state.
mdp5_pipeline is a grouping of the HW entities that forms the downstream
pipeline for a particular CRTC. It currently contains pointers to
mdp5_interface and mdp5_hw_mixer tied to this CRTC. Later, we will
have 2 hwmixers in this struct. (We could also have 2 intfs if we want
to support dual DSI with Source Split enabled. Implementing that feature
isn't planned at the moment).
The mdp5_pipeline state isn't used at the moment. For now, we just
introduce mdp5_crtc_state and the crtc funcs needed to manage the
subclassed state.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The mdp5_ctl has an 'op_mode' struct which contains info on
the downstream pipeline.
Grouping these params together in a struct doesn't serve much
purpose in the code. Maybe there was a plan to expand this
further that never happened.
Remove the op_mode struct, and place its members directly in
mdp5_ctl. This will help avoid confusion later when I introduce
my own verion of a mdp5 pipeline :)
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
mdp5_interface struct contains data corresponding to a INTF
instance in MDP5 hardware. This sturct is memcpy'd to the
mdp5_encoder struct, and then later to the mdp5_ctl struct.
Instead of copying around interface data, create mdp5_interface
instances in mdp5_init, like how it's done currently done for
pipes and layer mixers. Pass around the interface pointers to
mdp5_encoder and mdp5_ctl. This simplifies the code, and allows
us to decouple encoders from INTFs in the future if needed.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
PingPong ID for a Layer Mixer is already contained in
mdp5_hw_mixer.
This avoids the need to retrieve PP ID using macros
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Use the mdp5_hw_mixer struct in the mdp5_crtc and mdp5_ctl instead of
using the LM index.
Like before, the Layer Mixers are assigned statically to the CRTCs.
The hwmixer(s) will later be dynamically assigned to CRTCs.
For now, ignore the hwmixers that can only do WB.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Create a struct to represent MDP5 Layer Mixer instances. This will
eventually allow us to detach CRTCs from the Layer Mixers, and
generally clean things up a bit.
This is very similar to how hwpipes were previously abstracted away
from drm planes.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The number of Layer Mixers and the downstream blocks (DSPPs and PPs)
connected to each LM can vary with different MDP5 revisions. These
parameters are also static.
Keep the per instance LM data in mdp5_cfg. This will avoid the need
to have macros which identify PP id or DSPP id the LM is connected
to. We don't configure DSPPs at the moment, but keeping the DSPP
instance # here might come handy later.
Also add a 'caps' field that identifies features supported by a
LM instance. Introduce the caps MDP_LM_CAP_DISPLAY and MDP_LM_CAP_WB
that identify whether a LM instance can be used for display or
writeback.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We'd previously moved the pipe_lock spinlock to the hwpipe struct. Bring
it back to mdp5_plane. We will need this because an mdp5_plane in the
future could comprise of 2 hw pipes. It makes more sense to have a single
lock to protect the registers for the hw pipes used by a plane, rather
than trying to take individual locks per hwpipe when committing a
configuration.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
4 macros already defined in hdmi.h,
which is not required to redefine in hdmi_audio.c
Signed-off-by: Vinay Simha BN <simhavcs@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
'SSPP_MAX + 1' is the max number of hwpipes that can be present on a
MDP5 platform. Recently, 2 new cursor hwpipes were added, which
caused overflows in arrays that used SSPP_MAX to represent the number
of elements. Update the SSPP_MAX value to incorporate the extra
hwpipes.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
A recent commit introduces a bug in dsi_mgr_phy_enable. In the non
dual DSI mode, we reset the mdsi (master DSI) PHY. This isn't right
since master and slave DSI exist only in dual DSI mode. For the normal
mode of operation, we should simply reset the PHY of the DSI device
(i.e. msm_dsi) corresponding to the current bridge.
Usage of the wrong DSI pointer also resulted in a static checker
warning. That too is resolved with this fix.
Fixes: b62aa70a98 (drm/msm/dsi: Move PHY operations out of host)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Zero sized buffer objects tend to make various bits of the GEM
infrastructure complain:
WARNING: CPU: 1 PID: 2323 at drivers/gpu/drm/drm_mm.c:389 drm_mm_insert_node_generic+0x258/0x2f0
Modules linked in:
CPU: 1 PID: 2323 Comm: drm-api-test Tainted: G W 4.9.0-rc4-00906-g693af44 #213
Hardware name: Qualcomm Technologies, Inc. DB820c (DT)
task: ffff8000d7353400 task.stack: ffff8000d7720000
PC is at drm_mm_insert_node_generic+0x258/0x2f0
LR is at drm_vma_offset_add+0x4c/0x70
Zero sized buffers serve no appreciable value to the user so disallow
them at create time.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Output the upper 32 bits of a 64 bit iova in the RD_CMDSTREAM_ADDR
section while maintaining backwards compatibility for tools that
only understand 32 bit iovas.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The interrupt status was being cleared before processing the handlers.
a5xx_rbbm_err_irq() was checking the interrupt status again, which would
likely turn out bad because the interrupt status would be 0 (or at least
different). Pass the original status to the function instead.
Also, skip clearing RBBM_AHB_ERROR from the interrupt status. The interrupt
will keep firing until the error source is cleared. Skip the clear to
avoid a storm until the error is cleared in a5xx_rbbm_err_irq().
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
priv->num_aspaces is increased and then checked to see if it still fits
in the priv->aspace array. If it doesn't, we warn and exit but
priv->num_aspaces remains incremented.
Don't incremement the count until we know that it fits in the array.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of checking for a5xx_gpu->gpmu_iova during destroy we
accidently check a5xx_gpu->gpmu_bo.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The core takes care of handling the send_event vs. close() issues, we
can remove that driver code.
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
I didn't spot anything that would require ordering here (well not
anywhere else either), and I'm trying to unify at least modern drivers
on one close hook.
Cc: Rob Clark <robdclark@gmail.com>
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The newly added a5xx support fails to build when debugfs is diabled:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:11: error: 'a5xx_show' undeclared here (not in a function); did you mean 'a5xx_irq'?
This adds a missing #ifdef.
Fixes: b5f103ab98 ("drm/msm: gpu: Add A5XX target support")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Probably a symptom of needing finer grained locking, but if we wait on
the incoming fence-fd (which could come from a different context) while
holding struct_mutex, that blocks retire_worker so gpu fences cannot get
signalled.
This causes a problem if userspace manages to get more than a frame
ahead, leaving the atomic-commit worker blocked waiting on fences that
cannot be signaled because submit is blocked waiting for a fence
signalled from vblank (after the atomic commit which is blocked).
If we start having multiple fence ctxs for the gpu, submit_fence_sync()
would probably need to move outside of struct_mutex as well.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Significantly simplifies things. Also iommu_unmap() can unmap an entire
iova range.
(If backporting to downstream kernel you might need to revert this. Or
at least double check older iommu implementation.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Each of the per-generation callbacks was doing this. Lets just simplify
and move it into toplevel show() fxn.
Signed-off-by: Rob Clark <robdclark@gmail.com>
vfs_llseek will check whether the file mode has
FMODE_LSEEK, no return failure. But ashmem can be
lseek, so add FMODE_LSEEK to ashmem file.
Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file. 91360b02ab ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call. This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.
Fixes: 91360b02ab ("ashmem: use vfs_llseek()")
Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Cc: stable <stable@vger.kernel.org> # 3.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull sparc fixes from David Miller:
"Several fixes here, mostly having to due with either build errors or
memory corruptions depending upon whether you have THP enabled or not"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: remove unused wp_works_ok macro
sparc32: Export vac_cache_size to fix build error
sparc64: Fix memory corruption when THP is enabled
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
arch/sparc: Avoid DCTI Couples
sparc64: kern_addr_valid regression
sparc64: Add support for 2G hugepages
sparc64: Fix size check in huge_pte_alloc
ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJY5/X8AAoJEED/6hsPKofo8hQH/As3CbihZMysaK6JJTx5oMZw
b3W8p8xVXVu4dKM8WnXa6m5xBDFmOa7eBB+CtT3gP68XnFvMpr/vPmDv6v6i9p8q
7VyALDqqk2fxDmgHEwuETw9XZyuhdyCz/GaINCdnAJs25wTFOA7r0WEW5W8qRJpA
9nQirapdJcknymIch1JqeWlYYmbIaFzT8jItfA9QQ7F9mG4pxC8D1k2D56lNYwTf
FJIgXgkMPe7CPDXmgc/KqT5+iVsc/+SgzP/WdH6bX/007TV71sksxxfz6fIrao0X
RtcL2WIZTXBdSNrvXflHhCfYgogPgCnYp8AsYTIa+IEijcfteJx7UiET47Ne0Ow=
=/SPG
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
Pull audit cleanup from Paul Moore:
"A week later than I had hoped, but as promised, here is the audit
uninline-fix we talked about during the last audit pull request.
The patch is slightly different than what we originally discussed as
it made more sense to keep the audit_signal_info() function in
auditsc.c rather than move it and bunch of other related
variables/definitions into audit.c/audit.h.
At some point in the future I need to look at how the audit code is
organized across kernel/audit*, I suspect we could do things a bit
better, but it doesn't seem like a -rc release is a good place for
that ;)
Regardless, this patch passes our tests without problem and looks good
for v4.11"
* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
audit: move audit_signal_info() into kernel/auditsc.c
Merge misc fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: move pcp and lru-pcp draining into single wq
mailmap: update Yakir Yang email address
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
dax: fix radix tree insertion race
mm, thp: fix setting of defer+madvise thp defrag mode
ptrace: fix PTRACE_LISTEN race corrupting task->state
vmlinux.lds: add missing VMLINUX_SYMBOL macros
mm/page_alloc.c: fix print order in show_free_areas()
userfaultfd: report actual registered features in fdinfo
mm: fix page_vma_mapped_walk() for ksm pages
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches. This seems more than necessary because both can run
on a single WQ. Both do not block on locks requiring a memory
allocation nor perform any allocations themselves. We will save one
rescuer thread this way.
On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).
Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().
This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress. We can reuse the same one as for lru draining and
vmstat.
Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.
Reschedule when needed.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While running generic/340 in my test setup I hit the following race. It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.
Thread 1 Thread 2
-------- --------
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
'entry' is NULL, can't call lock_slot()
spin_unlock_irq()
radix_tree_preload()
dax_iomap_pmd_fault()
grab_mapping_entry()
spin_lock_irq()
get_unlocked_mapping_entry()
...
lock_slot()
spin_unlock_irq()
dax_pmd_insert_mapping()
<inserts a PMD mapping>
spin_lock_irq()
__radix_tree_insert() fails with -EEXIST
<fall back to 4k fault, and die horribly
when inserting a 4k entry where a PMD exists>
The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index. For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.
So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.
This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop. It used to fail within the first 10 iterations.
Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().
Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".
Fixes: 21440d7eb9 ("mm, thp: add new defer+madvise defrag option")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED. This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.
Oleg said:
"The kernel can crash or this can lead to other hard-to-debug problems.
In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
contract. Obviusly it is very wrong to manipulate task->state if this
task is already running, or WAKING, or it sleeps again"
[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:
kernel/extable.c:169: undefined reference to `__start_ro_after_init'
kernel/extable.c:169: undefined reference to `__end_ro_after_init'
The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.
Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.
Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fdinfo for userfault file descriptor reports UFFD_API_FEATURES. Up
until recently, the UFFD_API_FEATURES was defined as 0, therefore
corresponding field in fdinfo always contained zero. Now, with
introduction of several additional features, UFFD_API_FEATURES is not
longer 0 and it seems better to report actual features requested for the
userfaultfd object described by the fdinfo.
First, the applications that were using userfault will still see zero at
the features field in fdinfo. Next, reporting actual features rather
than available features, gives clear indication of what userfault
features are used by an application.
Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doug Smythies reports oops with KSM in this backtrace, I've been seeing
the same:
page_vma_mapped_walk+0xe6/0x5b0
page_referenced_one+0x91/0x1a0
rmap_walk_ksm+0x100/0x190
rmap_walk+0x4f/0x60
page_referenced+0x149/0x170
shrink_active_list+0x1c2/0x430
shrink_node_memcg+0x67a/0x7a0
shrink_node+0xe1/0x320
kswapd+0x34b/0x720
Just as observed in commit 4b0ece6fa0 ("mm: migrate: fix
remove_migration_pte() for ksm pages"), you cannot use page->index
calculations on ksm pages.
page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
can lead it off the end of the page table, and into whatever nonsense is
in the next page, ending as an oops inside check_pte()'s pte_page().
KSM tells page_vma_mapped_walk() exactly where to look for the page, it
does not need any page->index calculation: and that's so also for all
the normal and file and anon pages - just not for THPs and their
subpages. Get out early in most cases: instead of a PageKsm test, move
down the earlier not-THP-page test, as suggested by Kirill.
I'm also slightly worried that this loop can stray into other vmas, so
added a vm_end test to prevent surprises; though I have not imagined
anything worse than a very contrived case, in which a page mlocked in
the next vma might be reclaimed because it is not mlocked in this vma.
Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).
The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked. When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two. While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.
This is only half of the bugfix. The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns. Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores. The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.
Mike Marshall says:
"I've tested this patch against the fixed userspace part. This patch is
real important, I hope it can make it into 4.11...
Here's what happens when the userspace daemon is restarted, without
the patch:
=============================================
[ INFO: possible recursive locking detected ]
[ 4.10.0-00007-ge98bdb3 #1 Not tainted ]
---------------------------------------------
pvfs2-client-co/29032 is trying to acquire lock:
(orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
but task is already holding lock:
(orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]
CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
Call Trace:
__lock_acquire+0x7eb/0x1290
lock_acquire+0xe8/0x1d0
mutex_lock_killable_nested+0x6f/0x6e0
service_operation+0x3c7/0x7b0 [orangefs]
orangefs_remount+0xea/0x150 [orangefs]
dispatch_ioctl_command+0x227/0x330 [orangefs]
orangefs_devreq_ioctl+0x29/0x70 [orangefs]
do_vfs_ioctl+0xa3/0x6e0
SyS_ioctl+0x79/0x90"
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts
commit b8dfa821c2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Apr 7 12:17:12 2017 +0100
drm: Don't allow interruptions when opening debugfs/crc
It reportedly breaks things, so let's revert now and try again later.
Fixes: b8dfa821c2 ("drm: Don't allow interruptions when opening debugfs/crc")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
V2: remove **array method, directly fence_put after fence wait.
Signed-off-by: Chunming Zhou <David1.Zhou@amd.com>
Reviewed-by: Christian König <chrstian.koenig@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>