This patch fixes the following KASAN report. The @ioend has been
freed by dio_put(), but the iomap_finish_ioend() still trys to access
its data.
[20563.631624] BUG: KASAN: use-after-free in iomap_finish_ioend+0x58c/0x5c0
[20563.638319] Read of size 8 at addr fffffc0c54a36928 by task kworker/123:2/22184
[20563.647107] CPU: 123 PID: 22184 Comm: kworker/123:2 Not tainted 5.4.0+ #1
[20563.653887] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.11 06/18/2019
[20563.664499] Workqueue: xfs-conv/sda5 xfs_end_io [xfs]
[20563.669547] Call trace:
[20563.671993] dump_backtrace+0x0/0x370
[20563.675648] show_stack+0x1c/0x28
[20563.678958] dump_stack+0x138/0x1b0
[20563.682455] print_address_description.isra.9+0x60/0x378
[20563.687759] __kasan_report+0x1a4/0x2a8
[20563.691587] kasan_report+0xc/0x18
[20563.694985] __asan_report_load8_noabort+0x18/0x20
[20563.699769] iomap_finish_ioend+0x58c/0x5c0
[20563.703944] iomap_finish_ioends+0x110/0x270
[20563.708396] xfs_end_ioend+0x168/0x598 [xfs]
[20563.712823] xfs_end_io+0x1e0/0x2d0 [xfs]
[20563.716834] process_one_work+0x7f0/0x1ac8
[20563.720922] worker_thread+0x334/0xae0
[20563.724664] kthread+0x2c4/0x348
[20563.727889] ret_from_fork+0x10/0x18
[20563.732941] Allocated by task 83403:
[20563.736512] save_stack+0x24/0xb0
[20563.739820] __kasan_kmalloc.isra.9+0xc4/0xe0
[20563.744169] kasan_slab_alloc+0x14/0x20
[20563.747998] slab_post_alloc_hook+0x50/0xa8
[20563.752173] kmem_cache_alloc+0x154/0x330
[20563.756185] mempool_alloc_slab+0x20/0x28
[20563.760186] mempool_alloc+0xf4/0x2a8
[20563.763845] bio_alloc_bioset+0x2d0/0x448
[20563.767849] iomap_writepage_map+0x4b8/0x1740
[20563.772198] iomap_do_writepage+0x200/0x8d0
[20563.776380] write_cache_pages+0x8a4/0xed8
[20563.780469] iomap_writepages+0x4c/0xb0
[20563.784463] xfs_vm_writepages+0xf8/0x148 [xfs]
[20563.788989] do_writepages+0xc8/0x218
[20563.792658] __writeback_single_inode+0x168/0x18f8
[20563.797441] writeback_sb_inodes+0x370/0xd30
[20563.801703] wb_writeback+0x2d4/0x1270
[20563.805446] wb_workfn+0x344/0x1178
[20563.808928] process_one_work+0x7f0/0x1ac8
[20563.813016] worker_thread+0x334/0xae0
[20563.816757] kthread+0x2c4/0x348
[20563.819979] ret_from_fork+0x10/0x18
[20563.825028] Freed by task 22184:
[20563.828251] save_stack+0x24/0xb0
[20563.831559] __kasan_slab_free+0x10c/0x180
[20563.835648] kasan_slab_free+0x10/0x18
[20563.839389] slab_free_freelist_hook+0xb4/0x1c0
[20563.843912] kmem_cache_free+0x8c/0x3e8
[20563.847745] mempool_free_slab+0x20/0x28
[20563.851660] mempool_free+0xd4/0x2f8
[20563.855231] bio_free+0x33c/0x518
[20563.858537] bio_put+0xb8/0x100
[20563.861672] iomap_finish_ioend+0x168/0x5c0
[20563.865847] iomap_finish_ioends+0x110/0x270
[20563.870328] xfs_end_ioend+0x168/0x598 [xfs]
[20563.874751] xfs_end_io+0x1e0/0x2d0 [xfs]
[20563.878755] process_one_work+0x7f0/0x1ac8
[20563.882844] worker_thread+0x334/0xae0
[20563.886584] kthread+0x2c4/0x348
[20563.889804] ret_from_fork+0x10/0x18
[20563.894855] The buggy address belongs to the object at fffffc0c54a36900
which belongs to the cache bio-1 of size 248
[20563.906844] The buggy address is located 40 bytes inside of
248-byte region [fffffc0c54a36900, fffffc0c54a369f8)
[20563.918485] The buggy address belongs to the page:
[20563.923269] page:ffffffff82f528c0 refcount:1 mapcount:0 mapping:fffffc8e4ba31900 index:0xfffffc0c54a33300
[20563.932832] raw: 17ffff8000000200 ffffffffa3060100 0000000700000007 fffffc8e4ba31900
[20563.940567] raw: fffffc0c54a33300 0000000080aa0042 00000001ffffffff 0000000000000000
[20563.948300] page dumped because: kasan: bad access detected
[20563.955345] Memory state around the buggy address:
[20563.960129] fffffc0c54a36800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[20563.967342] fffffc0c54a36880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[20563.974554] >fffffc0c54a36900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[20563.981766] ^
[20563.986288] fffffc0c54a36980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc
[20563.993501] fffffc0c54a36a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[20564.000713] ==================================================================
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205703
Signed-off-by: Zorro Lang <zlang@redhat.com>
Fixes: 9cd0ed63ca ("iomap: enhance writeback error message")
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
bio completions can race when a page spans more than one file system
block. Add a spinlock to synchronize marking the page uptodate.
Fixes: 9dc55f1389 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The 'start' variable indicates the start of a filemap and is set to the
iocb's position, which we have already cached as 'pos', upon function
entry.
'pos' is used as a cursor indicating the current position and updated
later in iomap_dio_rw(), but not before the last use of 'start'.
Remove 'start' as it's synonym for 'pos' before we're entering the loop
calling iomapp_apply().
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
iomap_dio_bio_actor() copies iter to a local variable and then limits it
to a file extent we have mapped. When IO is submitted,
iomap_dio_bio_actor() advances the original iter while the copied iter
is advanced inside bio_iov_iter_get_pages(). This logic is non-obvious
especially because both iters still point to same shared structures
(such as pipe info) so if iov_iter_advance() changes anything in the
shared structure, this scheme breaks. Let's just truncate and reexpand
the original iter as needed instead of playing games with copying iters
and keeping them in sync.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages
because bio_iov_iter_get_pages() records that the pipe will have full
extent worth of data however if file size is not block size aligned
iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up
and splice code gets confused leaking a pipe page with the file tail.
Handle the situation similarly to the old direct IO implementation and
revert iter to actually returned read amount which makes iter consistent
with value returned from iomap_dio_rw() and thus the splice code is
happy.
Fixes: ff6a9292e6 ("iomap: implement direct I/O")
CC: stable@vger.kernel.org
Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add some tracepoints so that we can more easily debug what the
filesystem is returning from ->iomap_begin.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Naresh reported LTP diotest4 failing for 32bit x86 and arm -next
kernels on ext4. Same problem exists in 5.4-rc7 on xfs.
The failure comes down to:
openat(AT_FDCWD, "testdata-4.5918", O_RDWR|O_DIRECT) = 4
mmap2(NULL, 4096, PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7b000
read(4, 0xb7f7b000, 4096) = 0 // expects -EFAULT
Problem is conversion at iomap_dio_bio_actor() return. Ternary
operator has a return type and an attempt is made to convert each
of operands to the type of the other. In this case "ret" (int)
is converted to type of "copied" (unsigned long). Both have size
of 4 bytes:
size_t copied = 0;
int ret = -14;
long long actor_ret = copied ? copied : ret;
On x86_64: actor_ret == -14;
On x86 : actor_ret == 4294967282
Replace ternary operator with 2 return statements to avoid this
unwanted conversion.
Fixes: 4721a60109 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Coverity caught this fairly minor bug, but we should check the return
value of iomap_apply regardless.
Coverity-id: 1437065
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
On architectures where loff_t is wider than pgoff_t, the expression
((page->index + 1) << PAGE_SHIFT) can overflow. Rewrite to use the page
offset, which we already compute here anyway.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We've already check if it is READ iov_iter, no need check again.
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The srcmap is used to identify where the read is to be performed from.
It is passed to ->iomap_begin, which can fill it in if we need to read
data for partially written blocks from a different location than the
write target. The srcmap is only supported for buffered writes so far.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
[hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as
srcmap if not set, adjust length down to srcmap end as well]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Instead of keeping a separate unnamed state for uninitialized iomaps,
renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated
as a hole.
Suggested-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use the existing iomap write_begin code to read the pages unshared
by iomap_file_unshare. That avoids the extra ->readpage call and
extent tree lookup currently done by read_mapping_page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
That keeps the function a little easier to understand, and easier to
modify for pending enhancements.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_file_dirty is used to unshare reflink blocks. Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing. This will allow to simplify
the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin
to allow reusing the flags arguments for an internal flags namespace
soon. Also remove the local index variable that is only used once.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The documentation for IOMAP_F_* is a bit disorganized, and doesn't
mention the fact that most flags are set by the file system and consumed
by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and
consumed by the file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
If we encounter an IO error during writeback, log the inode, offset, and
sector number of the failure, instead of forcing the user to do some
sort of reverse mapping to figure out which file is affected.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
No need to pass the full bio_vec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Move the initialization of ia and ib to the declaration line and remove
a superflous else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that all the writepage code is in the iomap code there is no
need to keep this structure public.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
And inline mapping should never mark the page dirty and thus never end up
in writepages. Add a check for that condition and warn if it happens.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Take the xfs writeback code and move it to fs/iomap. A new structure
with three methods is added as the abstraction from the generic writeback
code to the file system. These methods are used to map blocks, submit an
ioend, and cancel a page that encountered an error before it was added to
an ioend.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it
does]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Lift the xfs code for tracing address space operations to the iomap
layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
File systems like gfs2 don't support delayed allocations or unwritten
extents and thus allocate normal mapped blocks to fill holes. To
cover the case of such file systems allocating new blocks to fill holes
also zero out mapped blocks with the new flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In preparation for moving the writeback code to iomap.c, replace the
XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In preparation for moving the ioend structure to common code we need
to get rid of the xfs-specific xfs_trans type. Just make it a file
system private void pointer instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Introduce two nicely abstracted helper, which can be moved to the iomap
code later. Also use list_first_entry_or_null to simplify the code a
bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In preparation for moving the XFS writeback code to fs/iomap.c, switch
it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as
these aren't strictly new allocations. This is required to be able to
use IOMAP_F_NEW to zero newly allocated blocks, which is required for
the iomap code to fully support file systems that don't do delayed
allocations or use unwritten extents.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Currently we don't overwrite the flags field in the iomap in
xfs_bmbt_to_iomap. This works fine with 0-initialized iomaps on stack,
but is harmful once we want to be able to reuse an iomap in the
writeback code. Replace the shared parameter with a set of initial
flags an thus ensures the flags field is always reinitialized.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.
However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion
when O_DSYNC writes are being used and the hardware supports FUA.
Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync()
to flush the inode size update to stable storage correctly.
Fixes: 3460cac1ca ("iomap: Use FUA for pure data O_DSYNC DIO writes")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: removed the ext4 part; they'll handle it separately]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding
the wait.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Filesystems do not support doing IO as asynchronous in some cases. For
example in case of unaligned writes or in case file size needs to be
extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO
in such cases, add argument to iomap_dio_rw() which makes the function
wait for IO completion. This also results in executing
iomap_dio_complete() inline in iomap_dio_rw() providing its return value
to the caller as for ordinary sync IO.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
- Removed locked down from tracefs itself and moved it to the trace
directory. Having the open functions there do the lockdown checks.
- Fixed a few races with opening an instance file and the instance being
deleted (Discovered during the locked down updates). Kept separate
from the clean up code such that they can be backported to stable
easier.
- Cleaned up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fixed a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXaNhphQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quDIAP4v08ARNdIh+r+c4AOBm3xsOuE/d9GB
I56ydnskm+x2JQD6Ap9ivXe9yDBIErFeHNtCoq7pM8YDI4YoYIB30N0GfwM=
=7oAu
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Remove lockdown from tracefs itself and moved it to the trace
directory. Have the open functions there do the lockdown checks.
- Fix a few races with opening an instance file and the instance
being deleted (Discovered during the lockdown updates). Kept
separate from the clean up code such that they can be backported to
stable easier.
- Clean up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fix a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq"
* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
tracing/hwlat: Report total time spent in all NMIs during the sample
recordmcount: Fix nop_mcount() function
tracing: Do not create tracefs files if tracefs lockdown is in effect
tracing: Add locked_down checks to the open calls of files created for tracefs
tracing: Add tracing_check_open_get_tr()
tracing: Have trace events system open call tracing_open_generic_tr()
tracing: Get trace_array reference for available_tracers files
ftrace: Get a reference counter for the trace_array on filter files
tracefs: Revert ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
Update/fix inspur-ipsps1 and k10temp Documentation
Fix nct7904 driver
Fix HWMON_P_MIN_ALARM mask in hwmon core
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEiHPvMQj9QTOCiqgVyx8mb86fmYEFAl2iD0IACgkQyx8mb86f
mYEnFQ//XJMPIukj659J3hwfzxEx3LXucQsQivUhmVXYzPAzdxa9eC97tj1I3pgz
nKR2BIFvXtI+o3459TWuvj+Y7utmQXRQ5XiP6JBb+Wkn84IK6O0GNKSOPYDNI/3t
/ii2XY3dERr8BuBPmRuYP7GorjFJH1P7B0FXrln5HLYAyS1yJVioLVhbrbi63NkL
ngoxZhgcVP5QIbaSFC6ZJs1oqXXVIbvBEkt9bjkmDhJssmPtFweVxvIUnWUoPdAg
A/GvmOddhRa/9vXzZbNThOFOLq7nHguIDvbIwizULMRHfDHB5aFEqyz9W/OFqSj5
8s+GRiO1oYBxVrz4Bl32pR0aCJbZ5WlxhgKrLdlunRnATAzXpLzRAbKhOhoBchJZ
xyC+96q0x8GIJ5dBqm2XqZhJssZgfgb/6SSlzSu+r6aQogHQc1/PDL26PHR517E5
T/pUNY97LvOOgiosDuaLiGpiszoz0WdeHVqdUujnRiN5TtE5u0M6ACdumXGjFb8K
0sPkep919yaaKNaPYA6u9PCHl4WG5EJdBkY44z4BkCt9scVvXEvATGUxNMwH8Bvm
95w/swRlN2yq+fa4T4I3UtAAW9EgCJ3LIYYUo1me0SDld2cPwbVkMvOIirD3TBLC
qoP178X34HaqeozOaW4Hi08QYLzktmCo5KFm6KKNFbFCQZ4oBo4=
=d+np
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Update/fix inspur-ipsps1 and k10temp Documentation
- Fix nct7904 driver
- Fix HWMON_P_MIN_ALARM mask in hwmon core
* tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: docs: Extend inspur-ipsps1 title underline
hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct.
docs: hwmon: Include 'inspur-ipsps1.rst' into docs
hwmon: Fix HWMON_P_MIN_ALARM mask
hwmon: (k10temp) Update documentation and add temp2_input info
hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
- spi-nor: Fix for a regression in write_sr()
- rawnand: Regression fix for the au1550nd driver
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl2iNXoWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wXBVD/sH9DCWSFAUXwqnXG1GP/05R++q
0OHc8t8DmTxcsNXMmPUhj/icQHRf611QvH9T3oOKCn6umOeuDskB89H4ELDnk9kg
ncxW48ijCRT4RNvL0FPbsbL3eU22kRtLoHj9VSa4tSjeNzgdbBG5sdugmASCNByi
WLyNY6nXZG2v8lpRwP/uv6zDtlp3r7ucNVdKlKWTsdBzpL94LIkwYgsWg/TTG6eO
9/29DBEjQonu3kCMkxKoHP1OVfS+XYKnbg4W6MNyDFRyT6xJxgt191K4q735ZiCJ
Qf+rjUMo9/7WTUdSlAept3tb/0iKAV6V8Ql8xBGfLsGMsa7O6eh083d+9QvjXMw9
I3dwsb9Of7t2n6Cj9pT74k6UA3KSfkM4v7r87Gr4UkyWxNn5uIn9uy/WJ+yCHFUd
1aDC6ZR8dzzsJNoWx6E4zFtBtu+Whd9/V18qTGegjg/eFK51z4Dh6bDkyKveYV5C
3343fuYQ0KO4wsFyiUcraP8BQcxk2FkjMcWS3bX3tbJuux8GV9UwfKSdlICwEgw+
Rql4N4SKZWoLg7d1tXOQ9fGikBel0mwQOvgABJoVOq13doUTesdiVO2JyUIblkvP
cA4w1NDgjXZ+kMh0n6aR+kJZcHd751/EoO0mk8tt6JNT80NWottr4CpzSNx+MbAW
YgYzi/q35JnIV5Ec4Q==
=mJlc
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull MTD fixes from Richard Weinberger:
"Two fixes for MTD:
- spi-nor: Fix for a regression in write_sr()
- rawnand: Regression fix for the au1550nd driver"
* tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: au1550nd: Fix au_read_buf16() prototype
mtd: spi-nor: Fix direction of the write_sr() transfer
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl2iBvAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvokEACaUozc0Y2L99CqJcRtoH8cwIAimjKTFSmi
UKijO/w11uc9xCIOospkM8iseRbpP/oCIwfYyVROi4LniPB1TSyQJoLF/MELeqIi
lOSg8RrXzC22h5+QG3BR5VO55me3MwIXmJLMMWXbh7FAuUg+wepVwHwzZBurd1Pz
C1wG8NU758AmqJdSGclUbdLLwTBkR+fXoi2EHfdbsBMqGmdZ5VRzCLx70grDcxq2
S9gt+2+diME6o6p1wko+m9pdhUdEy7epPE50K4SA+DU06I77OZpTdSAPilPyQHei
+3HDslzjCyos5/Ay+7hWl42jvyZ+x6GI5VcDLMTcMZsmxLy/WObDB4zcAixJrDV0
c7XWnwCCs2wMXYMsu0ASbkU32mcQ/Ik8CrX0W3/tP1aZBW2LZnOHvXYbybzpbgbo
QY47t3jz21uzF5kplBGxqaY1sPmLxp5V0vQmj9eN+Y7T+hiR/UgWKO/5B4CGNJe0
BQkXNl5cZAtqxmSE8AxlpOla7z6beZUljdhFc7K8cMDdhoatRgm2MBh2G9BB1ayo
VwHc1qLcvInpQrGiC4l5H/F8jrEs/PglknFSs1wlsav1gXA69J0cHA7LY6Czweaw
mkiKr7jA6duBBn6MRUThF5sBMFWYuxp3B/BA4wfDJsSfCmyhWN+OZlsDaEVrzk7o
/XK2Ew9R2Q==
=PUeL
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-block
Pull io_uring fix from Jens Axboe:
"Single small fix for a regression in the sequence logic for linked
commands"
* tag 'for-linus-20191012' of git://git.kernel.dk/linux-block:
io_uring: fix sequence logic for timeout requests
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
max_latency is intended to record the maximum ever observed hardware
latency, which may occur in either part of the loop (inner/outer). So
we need to also consider the outer-loop sample when updating
max_latency.
Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu
Fixes: e7c15cd8a1 ("tracing: Added hardware latency tracer")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
nmi_total_ts is supposed to record the total time spent in *all* NMIs
that occur on the given CPU during the (active portion of the)
sampling window. However, the code seems to be overwriting this
variable for each NMI, thereby only recording the time spent in the
most recent NMI. Fix it by accumulating the duration instead.
Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu
Fixes: 7b2c862501 ("tracing: Add NMI tracing in hwlat detector")
Cc: stable@vger.kernel.org
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The removal of the longjmp code in recordmcount.c mistakenly made the return
of make_nop() being negative an exit of nop_mcount(). It should not exit the
routine, but instead just not process that part of the code. By exiting with
an error code, it would cause the update of recordmcount to fail some files
which would fail the build if ftrace function tracing was enabled.
Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 3f1df12019 ("recordmcount: Rewrite error/success handling")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
If on boot up, lockdown is activated for tracefs, don't even bother creating
the files. This can also prevent instances from being created if lockdown is
in effect.
Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Added various checks on open tracefs calls to see if tracefs is in lockdown
mode, and if so, to return -EPERM.
Note, the event format files (which are basically standard on all machines)
as well as the enabled_functions file (which shows what is currently being
traced) are not lockde down. Perhaps they should be, but it seems counter
intuitive to lockdown information to help you know if the system has been
modified.
Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Currently, most files in the tracefs directory test if tracing_disabled is
set. If so, it should return -ENODEV. The tracing_disabled is called when
tracing is found to be broken. Originally it was done in case the ring
buffer was found to be corrupted, and we wanted to prevent reading it from
crashing the kernel. But it's also called if a tracing selftest fails on
boot. It's a one way switch. That is, once it is triggered, tracing is
disabled until reboot.
As most tracefs files can also be used by instances in the tracefs
directory, they need to be carefully done. Each instance has a trace_array
associated to it, and when the instance is removed, the trace_array is
freed. But if an instance is opened with a reference to the trace_array,
then it requires looking up the trace_array to get its ref counter (as there
could be a race with it being deleted and the open itself). Once it is
found, a reference is added to prevent the instance from being removed (and
the trace_array associated with it freed).
Combine the two checks (tracing_disabled and trace_array_get()) into a
single helper function. This will also make it easier to add lockdown to
tracefs later.
Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Instead of having the trace events system open call open code the taking of
the trace_array descriptor (with trace_array_get()) and then calling
trace_open_generic(), have it use the tracing_open_generic_tr() that does
the combination of the two. This requires making tracing_open_generic_tr()
global.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
As instances may have different tracers available, we need to look at the
trace_array descriptor that shows the list of the available tracers for the
instance. But there's a race between opening the file and an admin
deleting the instance. The trace_array_get() needs to be called before
accessing the trace_array.
Cc: stable@vger.kernel.org
Fixes: 607e2ea167 ("tracing: Set up infrastructure to allow tracers for instances")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The ftrace set_ftrace_filter and set_ftrace_notrace files are specific for
an instance now. They need to take a reference to the instance otherwise
there could be a race between accessing the files and deleting the instance.
It wasn't until the :mod: caching where these file operations started
referencing the trace_array directly.
Cc: stable@vger.kernel.org
Fixes: 673feb9d76 ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>