This was supposed to be a mask of all known rings, but it is being used
by execbuffer to filter out invalid rings, and so is instead mapping high
unused values onto valid rings. Instead of a mask of all known rings,
we need it to be the mask of all possible rings.
Fixes: 549f736582 ("drm/i915: Enable SandyBridge blitter ring")
Fixes: de1add3605 ("drm/i915: Decouple execbuf uAPI from internal implementation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-21-chris@chris-wilson.co.uk
We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost because later work doesn't wait on other
rings, hence we keep a history of semaphore usage of the dependency chain.
v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.
Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
Having introduced per-context seqno, we now have a means to identity
progress across the system without feel of rollback as befell the
global_seqno. That is we can program a MI_SEMAPHORE_WAIT operation in
advance of submission safe in the knowledge that our target seqno and
address is stable.
However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).
As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the 3x reduced latency / increased
throughput more than offset the power cost of spinning on a second ring)
and have significant improvement (can be up to ~10%, most see no change)
for single clients that utilize multiple engines (typically media players
and transcoders), without regressing multiple clients that can saturate
the system or changing the power envelope dramatically.
v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.
Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-3-chris@chris-wilson.co.uk
In preparation for enabling HW semaphores, we need to keep in flight
timeline HWSP alive until its use across entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We both have to delay recycling available
cachelines and unpinning old HWSP until the next idle point.
An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.
v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr
v7: Use i915_utils to hide the pointer casting around bit manipulation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk
On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted to above the request that
triggered the preemption request, causing a preempt-to-idle cycle for no
change. We can avoid this if we take the boost into account when
checking if the preemption request is valid.
v2: After preemption the active request will be after the preemptee if
they end up with equal priority.
v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!
v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and the WAIT not preempting
is only if we start off as !NEWCLIENT.
v6: More commentary after coming to an understanding about what I had
forgotten to say.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-1-chris@chris-wilson.co.uk
One important patch:
- Fix for a memory corruption issue in the Intel VT-d driver
that triggers on hardware with deep PCI hierarchies
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJceWO/AAoJECvwRC2XARrjBtQQAJ5w8QuiuPsukHSE6Qt7OqAM
2W0B6J4Mb2LoY08Zfh0eG59WrJZU+vZSgi/NoKU7VJJFQmlD/Y+DMFdkzCpBfGXq
NP1lCvKlcAaKi81PnAr2LFR7jMr4n5j7kPl6DVizFuBztwZcoYXXesNcbemMJLfo
NMCYDlo8qyAf9+ETY6UKa5rbr6lJZCGodiNQazkAsVAAs5LNlYoXZKlD4/SYElmL
Kj7OFeKD2TgtmD1MKA6LdP3MfjP3HvI9nXO/+20LZdDxJoBIkD/4SWCrjxtzny84
z85ypxwGZahKRZwKNSvjKMNaQJfu/S9uiN2yx4IZI8prYG5Po7lB4bI/Ol420Ze5
oKdMjFQ4mfKswm3fBOwlMpCJr41jwY1oWjdtLwHNXX3iwCh7EoGTKzQpjqH7GAa2
iSzlVhQS/o2FS7OcklYtmHnOwgbM1t3J4r3viAPpyVjkQh1RCnw3RBAsxM6ta3Rn
BFVzoTdf3oDAyTVhBpfbPIGCmy3BD7KBanarYPXAKdSUn94UNj2qe6e1ePsGtESk
Xmj9eHxO2eJXo1CndWB+kElGrdGT0WuRE2kYI9A3PzNPgR504yqyPcTUaJ7A74l6
iubggAE6SY6QyLVTLUe25x9papjudLELlG+bAFKnjpQ9IxL34X2ExIZEYc9K9jsu
OrgXdHKFjnhHvlMUNs/b
=H9KF
-----END PGP SIGNATURE-----
Merge tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"One important fix for a memory corruption issue in the Intel VT-d
driver that triggers on hardware with deep PCI hierarchies"
* tag 'iommu-fix-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dmar: Fix buffer overflow during PCI bus notification
Merge misc fixes from Andrew Morton:
"2 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
hugetlbfs: fix races and page leaks during migration
kasan: turn off asan-stack for clang-8 and earlier
hugetlb pages should only be migrated if they are 'active'. The
routines set/clear_page_huge_active() modify the active state of hugetlb
pages.
When a new hugetlb page is allocated at fault time, set_page_huge_active
is called before the page is locked. Therefore, another thread could
race and migrate the page while it is being added to page table by the
fault code. This race is somewhat hard to trigger, but can be seen by
strategically adding udelay to simulate worst case scheduling behavior.
Depending on 'how' the code races, various BUG()s could be triggered.
To address this issue, simply delay the set_page_huge_active call until
after the page is successfully added to the page table.
Hugetlb pages can also be leaked at migration time if the pages are
associated with a file in an explicitly mounted hugetlbfs filesystem.
For example, consider a two node system with 4GB worth of huge pages
available. A program mmaps a 2G file in a hugetlbfs filesystem. It
then migrates the pages associated with the file from one node to
another. When the program exits, huge page counts are as follows:
node0
1024 free_hugepages
1024 nr_hugepages
node1
0 free_hugepages
1024 nr_hugepages
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
That is as expected. 2G of huge pages are taken from the free_hugepages
counts, and 2G is the size of the file in the explicitly mounted
filesystem. If the file is then removed, the counts become:
node0
1024 free_hugepages
1024 nr_hugepages
node1
1024 free_hugepages
1024 nr_hugepages
Filesystem Size Used Avail Use% Mounted on
nodev 4.0G 2.0G 2.0G 50% /var/opt/hugepool
Note that the filesystem still shows 2G of pages used, while there
actually are no huge pages in use. The only way to 'fix' the filesystem
accounting is to unmount the filesystem
If a hugetlb page is associated with an explicitly mounted filesystem,
this information in contained in the page_private field. At migration
time, this information is not preserved. To fix, simply transfer
page_private from old to new page at migration time if necessary.
There is a related race with removing a huge page from a file and
migration. When a huge page is removed from the pagecache, the
page_mapping() field is cleared, yet page_private remains set until the
page is actually freed by free_huge_page(). A page could be migrated
while in this state. However, since page_mapping() is not set the
hugetlbfs specific routine to transfer page_private is not called and we
leak the page count in the filesystem.
To fix that, check for this condition before migrating a huge page. If
the condition is detected, return EBUSY for the page.
Link: http://lkml.kernel.org/r/74510272-7319-7372-9ea6-ec914734c179@oracle.com
Link: http://lkml.kernel.org/r/20190212221400.3512-1-mike.kravetz@oracle.com
Fixes: bcc5422230 ("mm: hugetlb: introduce page_huge_active")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: <stable@vger.kernel.org>
[mike.kravetz@oracle.com: v2]
Link: http://lkml.kernel.org/r/7534d322-d782-8ac6-1c8d-a8dc380eb3ab@oracle.com
[mike.kravetz@oracle.com: update comment and changelog]
Link: http://lkml.kernel.org/r/420bcfd6-158b-38e4-98da-26d0cd85bd01@oracle.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Building an arm64 allmodconfig kernel with clang results in over 140
warnings about overly large stack frames, the worst ones being:
drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
None of these happen with gcc today, and almost all of these are the
result of a single known issue in llvm. Hopefully it will eventually
get fixed with the clang-9 release.
In the meantime, the best idea I have is to turn off asan-stack for
clang-8 and earlier, so we can produce a kernel that is safe to run.
I have posted three patches that address the frame overflow warnings
that are not addressed by turning off asan-stack, so in combination with
this change, we get much closer to a clean allmodconfig build, which in
turn is necessary to do meaningful build regression testing.
It is still possible to turn on the CONFIG_ASAN_STACK option on all
versions of clang, and it's always enabled for gcc, but when
CONFIG_COMPILE_TEST is set, the option remains invisible, so
allmodconfig and randconfig builds (which are normally done with a
forced CONFIG_COMPILE_TEST) will still result in a mostly clean build.
Link: http://lkml.kernel.org/r/20190222222950.3997333-1-arnd@arndb.de
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Qian Cai <cai@lca.pw>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJceMUnAAoJEAx081l5xIa+n40P/iFOvzFXDdCaEf21XNdtH1Jk
gjMBeMwxKdrQCsknaQhADtX/IwoSoUyu50jRhkdTQrhCwprkFFnq3LBt11QmYv/k
Wj94D0yjIQAgnt+/rXi8ZKvH+MDZxmwGR+l9gRfKLywtreNaeLqjn875wkhzy6mB
DgyfB4Xuyrpgg9K0yDbVdSQrxmHJuEx3VYVF/tvbL8HLFbxjhP2PV3eYhNxGC9Kj
Ft1ulGZCfT8xQnxc1n/Bb+CHkQX9ZH2sVv8uezkaItEA7okboU71RCkMd5MjZ4AD
HcmJEEpYv2dJPdGpnjrn2xfHKB/HEnBaGeFAIsfmhat5paYh7DV9gATFwQ8ZcCQ8
HzX5/nD7rFA8G9gGjjVXYzCgX6egNtY6Oj35/IHfFJnKoqLeJvv7qkpX4OMUdcP8
dB5TR+S6bD1VCDj6Wi7lFfA9uRBkpLbyDUhDnFS/fKvD0Ecw/BopwRnhTPn+lqtA
50//WT5QZGQ+aRzzSxlIyfXJAyZBT2lr09r97G5noE0bal5iT8pc0xRiz6EHVrAd
y9J2yLOpLrPym1av5RB5Nnj7Pohup9WPf29T7T5XbQ8kxuk+sK5EJt8NqD8KIyU6
3RUG3cWlJGlWmX+V9GiMV/DnrfSZMnif0HUhxCxsDPmI0BHbQ3kn8vWtw9SAbHN2
nFe8B5Ok057cb3FR8uYC
=TtJZ
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Three final fixes, one for a feature that is new in this kernel, one
bochs fix for qemu riscv and one atomic modesetting fix.
I've left a few of the other late fixes until next as I didn't want to
throw in anything that wasn't really necessary"
* tag 'drm-fixes-2019-03-01' of git://anongit.freedesktop.org/drm/drm:
drm/bochs: Fix the ID mismatch error
drm: Block fb changes for async plane updates
drm/amd/display: Use vrr friendly pageflip throttling in DC.
The icl wm1+ underrun w/a has been added to the spec. It changed
slightly from the previous incarnation by requiring that we mirror
the lines watermark and the ignore lines bit from WM0 into WM1.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228173639.18422-1-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Clint Taylor <Clinton.A.Taylor@intel.com>
In bpf/syscall.c, map_create() first set map->usercnt to 1, a file
descriptor is supposed to return to userspace. When bpf_map_new_fd()
fails, drop the refcount.
Fixes: bd5f5f4ecb ("bpf: Add BPF_MAP_GET_FD_BY_ID")
Signed-off-by: Peng Sun <sironhide0null@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
A simple mutex used for guarding the flow of requests in and out of the
timeline. In the short-term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to only operate on a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301110547.14758-2-chris@chris-wilson.co.uk
- Fix 16b cmpxchg() operations which could erroneously fail if bits 15:8
of the old value are non-zero. In practice I'm not aware of any actual
users of 16b cmpxchg() on MIPS, but this fixes the support for it was
was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY systems
with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a non-zero
PHYS_OFFSET, correcting the memblock conversion performed in v4.20.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQRgLjeFAZEXQzy86/s+p5+stXUA3QUCXHhqjBUccGF1bC5idXJ0
b25AbWlwcy5jb20ACgkQPqefrLV1AN3ZaAD/SFgi3dS9bSWhDhiy83llLaWiCGPb
i09uzo3rpWoKSwQBAIwLEfmaHz/sYdliKRlE13uaxYWzwaN+VHXIPjlzbYMB
=cDlO
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A few more MIPS fixes:
- Fix 16b cmpxchg() operations which could erroneously fail if bits
15:8 of the old value are non-zero. In practice I'm not aware of
any actual users of 16b cmpxchg() on MIPS, but this fixes the
support for it was was introduced in v4.13.
- Provide a struct device to dma_alloc_coherent for Lantiq XWAY
systems with a "Voice MIPS Macro Core" (VMMC) device.
- Provide DMA masks for BCM63xx ethernet devices, fixing a regression
introduced in v4.19.
- Fix memblock reservation for the kernel when the system has a
non-zero PHYS_OFFSET, correcting the memblock conversion performed
in v4.20"
* tag 'mips_fixes_5.0_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: fix memory setup for platforms with PHYS_OFFSET != 0
MIPS: BCM63XX: provide DMA masks for ethernet devices
MIPS: lantiq: pass struct device to DMA API functions
MIPS: fix truncation in __cmpxchg_small for short values
Upon setting the cmode on 6390 and 6390X, the associated serdes
interfaces must be powered off/on.
Both 6390X and 6390 share code to do so, but it currently uses the 6390
specific helper mv88e6390_serdes_power() to disable and enable the
serdes interface.
This call will fail silently on 6390X when trying so set a 10G interface
such as XAUI or RXAUI, since mv88e6390_serdes_power() internally grabs
the lane number based on modes supported by the 6390, and returns 0 when
getting -ENODEV as a lane number.
Using mv88e6390x_serdes_power() should be safe here, since we explicitly
rule-out all ports but the 9 and 10, and because modes supported by 6390
ports 9 and 10 are a subset of those supported on 6390X.
This was tested on 6390X using RXAUI mode.
Fixes: 364e9d7776 ("net: dsa: mv88e6xxx: Power on/off SERDES on cmode change")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not all ready received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.
The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).
v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
The switch maintains u64 counters for the number of octets sent and
received. These are kept as two u32's which need to be combined. Fix
the combing, which wrongly worked on u16's.
Fixes: 80c4627b27 ("dsa: mv88x6xxx: Refactor getting a single statistic")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Occasionally, during the disconnection procedure on XenBus which
includes hash cache deinitialization there might be some packets
still in-flight on other processors. Handling of these packets includes
hashing and hash cache population that finally results in hash cache
data structure corruption.
In order to avoid this we prevent hashing of those packets if there
are no queues initialized. In that case RCU protection of queues guards
the hash cache as well.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zero-copy callback flag is not yet set on frag list skb at the moment
xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
leaking grant ref mappings since xenvif_zerocopy_callback() is never
called for these fragments. Those eventually build up and cause Xen
to kill Dom0 as the slots get reused for new mappings:
"d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"
That behavior is observed under certain workloads where sudden spikes
of page cache writes coexist with active atomic skb allocations from
network traffic. Additionally, rework the logic to deal with frag_list
deallocation in a single place.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to Documentation/core-api/printk-formats.rst, size_t should be
printed with %zu, rather than %Zu.
In addition, using %Zu triggers a warning on clang (-Wformat-extra-args):
net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args]
__func__, asoc, max_data);
~~~~~~~~~~~~~~~~^~~~~~~~~
./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited'
printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited'
printk(fmt, ##__VA_ARGS__); \
~~~ ^
Fixes: 5b5e0928f7 ("lib/vsprintf.c: remove %Z support")
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Signed-off-by: Matthias Maennich <maennich@google.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It can be reproduced by following steps:
1. virtio_net NIC is configured with gso/tso on
2. configure nginx as http server with an index file bigger than 1M bytes
3. use tc netem to produce duplicate packets and delay:
tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90%
4. continually curl the nginx http server to get index file on client
5. BUG_ON is seen quickly
[10258690.371129] kernel BUG at net/core/skbuff.c:4028!
[10258690.371748] invalid opcode: 0000 [#1] SMP PTI
[10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2
[10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202
[10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea
[10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002
[10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028
[10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000
[10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0
[10258690.372094] Call Trace:
[10258690.372094] <IRQ>
[10258690.372094] skb_to_sgvec+0x11/0x40
[10258690.372094] start_xmit+0x38c/0x520 [virtio_net]
[10258690.372094] dev_hard_start_xmit+0x9b/0x200
[10258690.372094] sch_direct_xmit+0xff/0x260
[10258690.372094] __qdisc_run+0x15e/0x4e0
[10258690.372094] net_tx_action+0x137/0x210
[10258690.372094] __do_softirq+0xd6/0x2a9
[10258690.372094] irq_exit+0xde/0xf0
[10258690.372094] smp_apic_timer_interrupt+0x74/0x140
[10258690.372094] apic_timer_interrupt+0xf/0x20
[10258690.372094] </IRQ>
In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's
linear data size and nonlinear data size, thus BUG_ON triggered.
Because the skb is cloned and a part of nonlinear data is split off.
Duplicate packet is cloned in netem_enqueue() and may be delayed
some time in qdisc. When qdisc len reached the limit and returns
NET_XMIT_DROP, the skb will be retransmit later in write queue.
the skb will be fragmented by tso_fragment(), the limit size
that depends on cwnd and mss decrease, the skb's nonlinear
data will be split off. The length of the skb cloned by netem
will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(),
the BUG_ON trigger.
To fix it, netem returns NET_XMIT_SUCCESS to upper stack
when it clones a duplicate packet.
Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()")
Signed-off-by: Sheng Lan <lansheng@huawei.com>
Reported-by: Qin Ji <jiqin.ji@huawei.com>
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix NULL ptr crash for a special test case
- Align max segment size with logical block size to prevent bugs in
v5.1-rc1.
MMC host:
- cqhci: Minor fixes
- tmio: Prevent interrupt storm
- tmio: Fixup SD/MMC card initialization
- spi: Allow card to be detected during probe
- sdhci-esdhc-imx: Fixup fix for ERR004536
-----BEGIN PGP SIGNATURE-----
iQJLBAABCgA1FiEEugLDXPmKSktSkQsV/iaEJXNYjCkFAlx37bkXHHVsZi5oYW5z
c29uQGxpbmFyby5vcmcACgkQ/iaEJXNYjCkaiQ//YvZnvb53gzrtUSIooF5unN9V
EdwEjktonM9L60PolyUBksVWlHcALV7jCTuDxXRBsUgB98LIj7gwDLhbgatPFbV3
dlCYY2EXW4IYvyfgAFQJ7hd9OalZftCrqvVU1T0sfetYhdP/um5cwQ2yxt/GjhTz
fXbAjN4Gmn1PQheH9V1KM48EvNuotTsmrERbmFbrdDzWq8WV167A0nXd689qsd5G
rkoL82/CcmfbnVOlDUyaYP/6YCRy+pJyTCoKinU49UHImsUvxebL2Ow3B+sp//eR
I7t/ywpkoBvmKesQd4wPYFgobX0F6r7bFFlWQLqYmKtRNbZEPcpbchbUpqEr5npL
rkRv4k5eg8iivmwfJH9i87A7fFsVE57iBzm8TS3dIseuGZsHCZlD6V+L5VJbTkex
BSDVMETtugLTwGfwdjC+JHwbLIyAQMai+uMNJrZ3S93Bzlheqqhb7SaOx+J1fMx8
ij8YKNTBf8Xo1BhnQPQbpIzOkbtzgtGfsjWVWrZygKDb5BL4IReqeUeyXPUMbCFO
+cIKe9SuFdK95nCTT9QuOmFvHY6oNaJSjW5NtDzcTPPYZ0yWjxbOcfep6o0Lp131
oIr2oF0jZdHiYZNCqk4HL1JBfsgG1Rb93H3ohHdux9zAZsRAv/VtQ125NTlCI/lT
UgXNyLmZplXf4fiQa7Q=
=82mm
-----END PGP SIGNATURE-----
Merge tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- Fix NULL ptr crash for a special test case
- Align max segment size with logical block size to prevent bugs in
v5.1-rc1.
MMC host:
- cqhci: Minor fixes
- tmio: Prevent interrupt storm
- tmio: Fixup SD/MMC card initialization
- spi: Allow card to be detected during probe
- sdhci-esdhc-imx: Fixup fix for ERR004536"
* tag 'mmc-v5.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: correct the fix of ERR004536
mmc: core: align max segment size with logical block size
mmc: cqhci: Fix a tiny potential memory leak on error condition
mmc: cqhci: fix space allocated for transfer descriptor
mmc: core: Fix NULL ptr crash from mmc_should_fail_request
mmc: tmio: fix access width of Block Count Register
mmc: tmio_mmc_core: don't claim spurious interrupts
mmc: spi: Fix card detection during probe
Pull crypto fixes from Herbert Xu:
"This fixes a compiler warning introduced by a previous fix, as well as
two crash bugs on ARM"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: sha512/arm - fix crash bug in Thumb2 build
crypto: sha256/arm - fix crash bug in Thumb2 build
crypto: ccree - add missing inline qualifier
debugfs can now report an error code if something went wrong instead of
just NULL. So if the return value is to be used as a "real" dentry, it
needs to be checked if it is an error before dereferencing it.
This is now happening because of ff9fb72bc0 ("debugfs: return error
values, not NULL"). syzbot has found a way to trigger multiple debugfs
files attempting to be created, which fails, and then the error code
gets passed to dentry_path_raw() which obviously does not like it.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Reported-and-tested-by: syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We currently use a worker queued from an rcu callback to determine when
a how grace period has elapsed while we remained idle. We use this idle
delay to infer that we will be idle for a while and this is a suitable
point at which we can trim our global memory caches.
Since we wrote that, this mechanism now exists as rcu_work, and having
converted the idle shrinkers over to using that, we can remove our own
variant.
v2: Say goodbye to gt.epoch as well.
v3: Remove the misplaced and redundant comment before parking globals
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-3-chris@chris-wilson.co.uk
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.
The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
Add an of_node_put when a tested device node is not available.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
identifier f;
local idexpression e;
expression x;
@@
e = f(...);
... when != of_node_put(e)
when != x = e
when != e = x
when any
if (<+...of_device_is_available(e)...+>) {
... when != of_node_put(e)
(
return e;
|
+ of_node_put(e);
return ...;
)
}
// </smpl>
Fixes: db878f76b9 ("tee: optee: take DT status property into account")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
If we have parked, then we must have passed an idleness test and still
be idle. We chose not to use this shortcut in the past so that we could
use the idleness test at any time and inspect HW. However, some HW like
Sandybridge, doesn't like being woken up frivolously, so avoid doing so.
References: 0b702dca26 ("drm/i915: Avoid waking the engines just to check if they are idle")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190227214159.7946-1-chris@chris-wilson.co.uk
This reverts commit 0b702dca26.
CI reports that this is not as reliable as it first appears, with
failures starting to sporadically occur in selftests.
Fixes: 0b702dca26 ("drm/i915: Avoid waking the engines just to check if they are idle")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190227204654.14907-1-chris@chris-wilson.co.uk
On big endian arm64 kernels, the xchacha20-neon and xchacha12-neon
self-tests fail because hchacha_block_neon() outputs little endian words
but the C code expects native endianness. Fix it to output the words in
native endianness (which also makes it match the arm32 version).
Fixes: cc7cf991e9 ("crypto: arm64/chacha20 - add XChaCha20 support")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The change to encrypt a fifth ChaCha block using scalar instructions
caused the chacha20-neon, xchacha20-neon, and xchacha12-neon self-tests
to start failing on big endian arm64 kernels. The bug is that the
keystream block produced in 32-bit scalar registers is directly XOR'd
with the data words, which are loaded and stored in native endianness.
Thus in big endian mode the data bytes end up XOR'd with the wrong
bytes. Fix it by byte-swapping the keystream words in big endian mode.
Fixes: 2fe55987b2 ("crypto: arm64/chacha - use combined SIMD/ALU routine for more speed")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are two array out-of-bounds memory accesses, one in
cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both
errors are embarassingly simple, and the fixes are straightforward.
As a FYI for anyone backporting this patch to kernels prior to v4.8,
you'll want to apply the netlbl_bitmap_walk() patch to
cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before
Linux v4.8.
Reported-by: Jann Horn <jannh@google.com>
Fixes: 446fda4f26 ("[NetLabel]: CIPSOv4 engine")
Fixes: 3faa8f982f ("netlabel: Move bitmap manipulation functions to the NetLabel core.")
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_input_rcu expects the original ingress device (e.g., for
proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv,
so dev needs to be saved prior to calling it. This was the behavior prior
to the listify changes.
Fixes: 5fa12739a5 ("net: ipv4: listify ip_rcv_finish")
Cc: Edward Cree <ecree@solarflare.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
selftests: pmtu: fix and increase coverage
This series includes a fixup for the pmtu.sh test script, related to IPv6
address management, and adds coverage for the recently reported and fixed
PMTU exception issue
v2 -> v3:
- more cleanups
v1 -> v2:
- several script cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a couple of new tests, explicitly checking that the kernel
timely releases PMTU exceptions on related device removal.
This is mostly a regression test vs the issue fixed by
commit f5b51fe804 ("ipv6: route: purge exception on removal")
Only 2 new test cases have been added, instead of extending all
the existing ones, because the reproducer requires executing
several commands and would slow down too much the tests otherwise.
v2 -> v3:
- more cleanup, still from Stefano
v1 -> v2:
- several script cleanups, as suggested by Stefano
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise, the configured IPv6 address could be still "tentative"
at test time, possibly causing tests failures.
We can also drop some sleep along the code and decrease the
timeout for most commands so that the test runtime decreases.
v1 -> v2:
- fix comment (Stefano)
Fixes: d1f1b9cbf3 ("selftests: net: Introduce first PMTU test")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to dp83640 delay after soft reset
is needed to set up registers correctly.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running RISC-V QEMU with the Bochs device attached via PCIe the
probe of the Bochs device fails with:
[drm:bochs_hw_init] *ERROR* ID mismatch
This was introduced by this commit:
7780eb9ce8 bochs: convert to drm_dev_register
To fix the error we ensure that pci_enable_device() is called before
bochs_load().
Fixes: 7780eb9ce8 ("bochs: convert to drm_dev_register")
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reported-by: David Abdurachmanov <david.abdurachmanov@gmail.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20190221003231.31625-1-alistair.francis@wdc.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The prepare_fb call always happens on new_plane_state.
The drm_atomic_helper_cleanup_planes checks to see if
plane state pointer has changed when deciding to call cleanup_fb on
either the new_plane_state or the old_plane_state.
For a non-async atomic commit the state pointer is swapped, so this
helper calls prepare_fb on the new_plane_state and cleanup_fb on the
old_plane_state. This makes sense, since we want to prepare the
framebuffer we are going to use and cleanup the the framebuffer we are
no longer using.
For the async atomic update helpers this differs. The async atomic
update helpers perform in-place updates on the existing state. They call
drm_atomic_helper_cleanup_planes but the state pointer is not swapped.
This means that prepare_fb is called on the new_plane_state and
cleanup_fb is called on the new_plane_state (not the old).
In the case where old_plane_state->fb == new_plane_state->fb then
there should be no behavioral difference between an async update
and a non-async commit. But there are issues that arise when
old_plane_state->fb != new_plane_state->fb.
The first is that the new_plane_state->fb is immediately cleaned up
after it has been prepared, so we're using a fb that we shouldn't
be.
The second occurs during a sequence of async atomic updates and
non-async regular atomic commits. Suppose there are two framebuffers
being interleaved in a double-buffering scenario, fb1 and fb2:
- Async update, oldfb = NULL, newfb = fb1, prepare fb1, cleanup fb1
- Async update, oldfb = fb1, newfb = fb2, prepare fb2, cleanup fb2
- Non-async commit, oldfb = fb2, newfb = fb1, prepare fb1, cleanup fb2
We call cleanup_fb on fb2 twice in this example scenario, and any
further use will result in use-after-free.
The simple fix to this problem is to block framebuffer changes
in the drm_atomic_helper_async_check function for now.
v2: Move check by itself, add a FIXME (Daniel)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: <stable@vger.kernel.org> # v4.14+
Fixes: fef9df8b59 ("drm/atomic: initial support for asynchronous plane update")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/275364/
Signed-off-by: Dave Airlie <airlied@redhat.com>
- Add a mechanism to only send commit done events once all pending
updates have been applied. This closes a small race window where
already armed events could fire even though the double buffered
hardware update just missed the update window.
- Add plane zpos property support to allow placing the overlay plane
behind the primary plane.
- Allow building imx-drm on all platforms under COMPILE_TEST.
-----BEGIN PGP SIGNATURE-----
iI4EABYIADYWIQRRO6F6WdpH1R0vGibVhaclGDdiwAUCXG/bAhgccGhpbGlwcC56
YWJlbEBnbWFpbC5jb20ACgkQ1YWnJRg3YsAz6wEA4ThGF7nvouYeI2jLEuLmkIpr
wXdyGA5XScfoSQUgFisBAPGl+g578KjIq7PrazKAEoKeencLytlf8mWCl99YZukB
=Veu5
-----END PGP SIGNATURE-----
Merge tag 'imx-drm-next-2019-02-22' of git://git.pengutronix.de/pza/linux into drm-next
drm/imx: handle pending updates better, add plane zpos property support
- Add a mechanism to only send commit done events once all pending
updates have been applied. This closes a small race window where
already armed events could fire even though the double buffered
hardware update just missed the update window.
- Add plane zpos property support to allow placing the overlay plane
behind the primary plane.
- Allow building imx-drm on all platforms under COMPILE_TEST.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Philipp Zabel <pza@pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20190222112350.m3ucezilqx6cyest@pengutronix.de
For platforms, which use a PHYS_OFFSET != 0, symbol _end also
contains that offset. So when calling memblock_reserve() for
reserving kernel the size argument needs to be adjusted.
Fixes: bcec54bf31 ("mips: switch to NO_BOOTMEM")
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v4.20+