When a buffer has been failed during writeback, the inode items into it
are kept flush locked, and are never resubmitted due the flush lock, so,
if any buffer fails to be written, the items in AIL are never written to
disk and never unlocked.
This causes unmount operation to hang due these items flush locked in AIL,
but this also causes the items in AIL to never be written back, even when
the IO device comes back to normal.
I've been testing this patch with a DM-thin device, creating a
filesystem larger than the real device.
When writing enough data to fill the DM-thin device, XFS receives ENOSPC
errors from the device, and keep spinning on xfsaild (when 'retry
forever' configuration is set).
At this point, the filesystem can not be unmounted because of the flush locked
items in AIL, but worse, the items in AIL are never retried at all
(once xfs_inode_item_push() will skip the items that are flush locked),
even if the underlying DM-thin device is expanded to the proper size.
This patch fixes both cases, retrying any item that has been failed
previously, using the infra-structure provided by the previous patch.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
With the current code, XFS never re-submit a failed buffer for IO,
because the failed item in the buffer is kept in the flush locked state
forever.
To be able to resubmit an log item for IO, we need a way to mark an item
as failed, if, for any reason the buffer which the item belonged to
failed during writeback.
Add a new log item callback to be used after an IO completion failure
and make the needed clean ups.
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When we do log recovery on a readonly mount, unlinked inode
processing does not happen due to the readonly checks in
xfs_inactive(), which are trying to prevent any I/O on a
readonly mount.
This is misguided - we do I/O on readonly mounts all the time,
for consistency; for example, log recovery. So do the same
RDONLY flag twiddling around xfs_log_mount_finish() as we
do around xfs_log_mount(), for the same reason.
This all cries out for a big rework but for now this is a
simple fix to an obvious problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
There are dueling comments in the xfs code about intent
for log writes when unmounting a readonly filesystem.
In xfs_mountfs, we see the intent:
/*
* Now the log is fully replayed, we can transition to full read-only
* mode for read-only mounts. This will sync all the metadata and clean
* the log so that the recovery we just performed does not have to be
* replayed again on the next mount.
*/
and it calls xfs_quiesce_attr(), but by the time we get to
xfs_log_unmount_write(), it returns early for a RDONLY mount:
* Don't write out unmount record on read-only mounts.
Because of this, sequential ro mounts of a filesystem with
a dirty log will replay the log each time, which seems odd.
Fix this by writing an unmount record even for RO mounts, as long
as norecovery wasn't specified (don't write a clean log record
if a dirty log may still be there!) and the log device is
writable.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The 'dmic-delay' property name is different with the dt-binding.
So correct it with 'dmic-wakeup-delay-ms'.
Fixes: 3a6f9dce61 (ASoC: rk3399_gru_sound: fix recording pop at first attempt)
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Pull "Enable driver for display used on Lego Mindstorms EV3"
from Sekhar Nori:
* tag 'davinci-for-v4.14/defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: davinci_all_defconfig: enable tinydrm and ST7586
Pull "Add device-tree node for display on Lego Mindstorms EV3" from Sekhar Nori:
* tag 'davinci-for-v4.14/dt' of git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci:
ARM: dts: da850-lego-ev3: Add node for LCD display
Pull "arm64: Xilinx ZynqMP DT fixes for v4.14" from Michal Simek:
- Fix DTC warnings
- Add idle states, OP, cci-400, RTC, pcie prefetchable memory
fpd/lpd dmas, clocks for ep108
- Enable can1
- Fix smmu IRQ, aliases, uart compatible string
- Use generic compatible string for i2c eeprom
* tag 'zynqmp-dt-for-4.14' of https://github.com/Xilinx/linux-xlnx:
arm64: zynqmp: Add generic compatible string for I2C EEPROM
arm64: zynqmp: Add missing mmc aliases in ep108
arm64: zynqmp: Enable can1 for ep108
arm64: zynqmp: Added clocks to DT for ep108
arm64: zynqmp: Use C pre-processor for includes
arm64: zynqmp: Add fpd/lpd dmas
arm64: zynqmp: Set status disabled in dtsi
arm64: zynqmp: Add new uartps compatible string
arm64: zynqmp: Correct IRQ nr for the SMMU
arm64: zynqmp: Add support for RTC
arm64: zynqmp: Adding prefetchable memory space to pcie node
arm64: zynqmp: Add CCI-400 node
arm64: zynqmp: Add dcc console for zynqmp
arm64: zynqmp: Add operating points
arm64: zynqmp: Add idle state for ZynqMP
arm64: zynqmp: Add references to cpu nodes
arm64: zynqmp: Move nodes which have no reg property out of bus
arm64: zynqmp: Remove leading 0s from mtd table for spi flashes
arm64: dts: xilinx: fix PCI bus dtc warnings
Add JSON uncore events for Skylake Server to perf.
Based on JSON list V1.01
This is a much fuller list than with earlier uncores, including
more low level (but also harder to understand) events. It does not
include the "experimential" events. The previous
high level metric (LLC_* etc.) are still available when applicable.
C state power events are not included at this point.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170816220553.GA19463@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enhance the expression parser for more complex metric formulas.
- Support python style IF ELSE operators
- Add an #SMT_On magic variable for formulas that depend on the SMT
status.
Example: 4 *( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else cycles
- Support MIN/MAX operations
Example: min(1 , IDQ.MITE_UOPS / ( UPI * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )
This is useful to fix up problems caused by multiplexing.
- Support | & ^ operators
- Minor cleanups and fixes
- Support an \ escape for operators. This allows to specify event names
like c2-residency
- Support @ as an alternative for / to be able to specify pmus without
conflicts with operators (like msr/tsc/ as msr@tsc@)
Example: (cstate_core@c3\\-residency@ / msr@tsc@) * 100
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-8-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
STAudio ADCIII has the same SSID as Hoontech STDSP24, but requires a
slightly different configuration. This patch allows user to choose
this model via model=staudio option to set the proper configuration
for the board.
Bugzilla: http://bugzilla.suse.com/show_bug.cgi?id=1048934
Signed-off-by: Takashi Iwai <tiwai@suse.de>
huge_pte_offset() was updated to correctly handle swap entries for
hugepages. With the addition of the size parameter, it is now possible
to disambiguate whether the request is for a regular hugepage or a
contiguous hugepage.
Fix huge_pte_offset() for contiguous hugepages by using the size to find
the correct page table entry.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
perf stat -e cpu/uops_executed.core,cmask=1/
would be detected as a BPF source event because the .c matches the .c
source BPF pattern.
v2:
Originally I tried to use lex lookahead, but it doesn't seem to work.
This now extends the BPF pattern to match longer events, but then does
an extra check in the C code to reject BPF matches that do not end with
.c/.o/.obj
This uses REJECT, which makes the flex scanner slower, but that
shouldn't be a big problem for the perf events.
Committer testing:
# perf trace -e write -e /home/acme/bpf/tracepoint.c cat /etc/passwd > /dev/null
0.000 ( 0.006 ms): cat/18485 write(fd: 1, buf: 0x7f59eebe1000, count: 3494 ) ...
0.006 ( ): raw_syscalls:sys_enter:NR 1 (1, 7f59eebe1000, da6, 22, 7f59eebe0010, 0))
0.008 ( ): perf_bpf_probe:_write:(ffffffff9626b2c0))
0.000 ( 0.010 ms): cat/18485 ... [continued]: write()) = 3494
#
It continues doing what was expected, i.e. identifying
/home/acme/bpf/tracepoint.c as a BPF event and activates the clang
machinery to build an eBPF object and then uses sys_bpf() to hook it up
to the raw_syscalls:sys_enter tracepoint, etc.
Andi forgot to add Wang to the CC list, fix it.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It has become apparent that one has to take special care when modifying
attributes of memory mappings that employ the contiguous bit.
Both the requirement and the architecturally correct "Break-Before-Make"
technique of updating contiguous entries can be found described in:
ARM DDI 0487A.k_iss10775, "Misprogramming of the Contiguous bit",
page D4-1762.
The huge pte accessors currently replace the attributes of contiguous
pte entries in place thus can, on certain platforms, lead to TLB
conflict aborts or even erroneous results returned from TLB lookups.
This patch adds two helper functions -
* get_clear_flush(.) - clears a contiguous entry and returns the head
pte (whilst taking care to retain dirty bit information that could
have been modified by DBM).
* clear_flush(.) that clears a contiguous entry
A tlb invalidate is performed to then ensure that there is no
possibility of multiple tlb entries being present for the same region.
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
(Added helper clear_flush(), updated commit log, and some cleanup)
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
[catalin.marinas@arm.com: remove CONFIG_ARM64_HW_AFDBM check]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch aims to re-structure the huge pte accessors without affecting
their functionality. Control flow is changed to reduce indentation and
expanded use is made of post for loop variable modification.
It is then much easier to add break-before-make semantics in a subsequent
patch.
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fix buffer overflow for:
% perf stat -e msr/tsc/,cstate_core/c7-residency/ true
that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in AS:
=================================================================
==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
READ of size 4 at 0x603000003f5c thread T0
#0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
#1 0x56c57a in perf_evsel__close util/evsel.c:1717
#2 0x55ed5f in perf_evlist__close util/evlist.c:1631
#3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
#4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
#11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
allocated by thread T0 here:
#0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
#1 0x648a2d in zalloc util/util.h:23
#2 0x648a88 in xyarray__new util/xyarray.c:9
#3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
#4 0x56b427 in perf_evsel__open util/evsel.c:1529
#5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
#6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
#7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
#8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
The event is allocated with cpus == 1, but freed with cpus == real number
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.
v2:
Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.
The actual perf event reading still uses the original values from the
evsel list.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
[ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a WARN_ON to set_huge_pte_at as the accessor assumes
that entries to be written down are all present. (There are separate
accessors to clear huge ptes).
We will need to handle the !pte_present case where memory offlining
is used on hugetlb pages. swap and migration entries will be supplied
to set_huge_pte_at in this case.
Cc: David Woods <dwoods@mellanox.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The initialization of MediaTek power manager(SCPSYS) is
builtin_platform_driver, and SMI must depend on power-domain.
Thus, currently subsys_initcall for SMI is unnecessary, SMI will be
always probe defered by power-domain. Degrade it to module_init.
In addition, there are two small changes about the probe sequence:
1) Delete this two lines.
if (!dev->pm_domain)
return -EPROBE_DEFER;
This is not helpful. the platform driver framework guarantee this.
The "dev_pm_domain_attach" in the "platform_drv_probe" will return
EPROBE_DEFER if its powerdomain is not ready.
2) Add the probe-defer for the smi-larb device should waiting for
smi-common.
In mt2712, there are 2 smi-commons, 10 smi-larbs. All will be
probe-defered by the power-domain, there is seldom case that
smi-larb probe done before smi-common. then it will hang like
this:
Unable to handle kernel NULL pointer dereference at virtual address
00000000 pgd = ffffff800a4e0000
[00000000] *pgd=00000000beffe003[ 17.610026] , *pud=00000000beffe003
...
[<ffffff800897fe04>] mtk_smi_enable+0x1c/0xd0
[<ffffff800897fee8>] mtk_smi_larb_get+0x30/0x98
[<ffffff80088edfa8>] mtk_mipicsi0_resume+0x38/0x1b8
[<ffffff8008634f44>] pm_generic_runtime_resume+0x3c/0x58
[<ffffff8008644ff8>] __genpd_runtime_resume+0x38/0x98
[<ffffff8008647434>] genpd_runtime_resume+0x164/0x220
[<ffffff80086372f8>] __rpm_callback+0x78/0xa0
[<ffffff8008637358>] rpm_callback+0x38/0xa0
[<ffffff8008638a4c>] rpm_resume+0x4a4/0x6f8
[<ffffff8008638d04>] __pm_runtime_resume+0x64/0xa0
[<ffffff80088ed05c>] mtk_mipicsi0_probe+0x40c/0xb70
[<ffffff800862cdc0>] platform_drv_probe+0x58/0xc0
[<ffffff800862a514>] driver_probe_device+0x284/0x438
[<ffffff800862a8ac>] __device_attach_driver+0xb4/0x160
[<ffffff8008627d58>] bus_for_each_drv+0x68/0xa8
[<ffffff800862a0a4>] __device_attach+0xd4/0x168
[<ffffff800862a9d4>] device_initial_probe+0x24/0x30
[<ffffff80086291d8>] bus_probe_device+0xa0/0xa8
[<ffffff8008629784>] deferred_probe_work_func+0x94/0xf0
[<ffffff80080f03a8>] process_one_work+0x1d8/0x6e0
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch is for 4GB mode, mainly for 4 issues:
1) Fix a 4GB bug:
if the dram base is 0x4000_0000, the dram size is 0xc000_0000.
then the code just meet a corner case because max_pfn is
0x10_0000.
data->enable_4GB = !!(max_pfn > (0xffffffffUL >> PAGE_SHIFT));
It's true at the case above. That is unexpected.
2) In mt2712, there is a new register for the 4GB PA range(0x118)
we should enlarge the max PA range, or the HW will report
error.
The dram range is from 0x1_0000_0000 to 0x1_ffff_ffff in the 4GB
mode, we cut out the bit[32:30] of the SA(Start address) and
EA(End address) into this REG_MMU_VLD_PA_RNG(0x118).
3) In mt2712, the register(0x13c) is extended for 4GB mode.
bit[7:6] indicate the valid PA[32:33]. Thus, we don't mask the
value and print it directly for debug.
4) if 4GB is enabled, the dram PA range is from 0x1_0000_0000 to
0x1_ffff_ffff. Thus, the PA from iova_to_pa should also '|' BIT(32)
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When system suspend, infra power domain may be off, and the iommu's
clock must be disabled when system off, or the iommu's bclk clock maybe
disabled after system resume.
Signed-off-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
After adding the global list for M4U HW, We get a chance to
move the pagetable allocation into the mtk_iommu_domain_alloc.
Let the domain_alloc do the right thing.
This patch is for fixing this problem[1].
[1]: https://patchwork.codeaurora.org/patch/53987/
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In theory, If there are 2 M4U HWs, there should be 2 IOMMU domains.
But one IOMMU domain(4GB iova range) is enough for us currently,
It's unnecessary to maintain 2 pagetables.
Besides, This patch can simplify our consumer code largely. They don't
need map a iova range from one domain into another, They can share the
iova address easily.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The M4U IP blocks in mt2712 is MTK's generation2 M4U which use the
ARM Short-descriptor like mt8173, and most of the HW registers are
the same.
The difference is that there are 2 M4U HWs in mt2712 while there's
only one in mt8173. The purpose of 2 M4U HWs is for balance the
bandwidth.
Normally if there are 2 M4U HWs, there should be 2 iommu domains,
each M4U has a iommu domain.
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The definition of MTK_M4U_TO_LARB and MTK_M4U_TO_PORT are shared by
all the gen2 M4U HWs. Thus, Move them out from mt8173-larb-port.h,
and put them into the c file.
Suggested-by: Honghui Zhang <honghui.zhang@mediatek.com>
Signed-off-by: Yong Wu <yong.wu@mediatek.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>