The commits introducing 'mlock-random-test'[1], 'map_fiex_noreplace'[2],
and 'thuge-gen'[3] have not added those in the 'run_vmtests' script and
thus the 'run_tests' command of kselftests doesn't run those. This
commit adds those in the script.
'gup_benchmark' and 'transhuge-stress' are also not included in the
'run_vmtests', but this commit does not add those because those are for
performance measurement rather than pass/fail tests.
[1] commit 26b4224d99 ("selftests: expanding more mlock selftest")
[2] commit 91cbacc345 ("tools/testing/selftests/vm/map_fixed_noreplace.c: add test for MAP_FIXED_NOREPLACE")
[3] commit fcc1f2d5dd ("selftests: add a test program for variable huge page sizes in mmap/shmget")
Link: http://lkml.kernel.org/r/20200206085144.29126-1-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
QEMU has a funny new build error message when I use the upstream kernel
headers:
CC block/file-posix.o
In file included from /home/cborntra/REPOS/qemu/include/qemu/timer.h:4,
from /home/cborntra/REPOS/qemu/include/qemu/timed-average.h:29,
from /home/cborntra/REPOS/qemu/include/block/accounting.h:28,
from /home/cborntra/REPOS/qemu/include/block/block_int.h:27,
from /home/cborntra/REPOS/qemu/block/file-posix.c:30:
/usr/include/linux/swab.h: In function `__swab':
/home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:34: error: "sizeof" is not defined, evaluates to 0 [-Werror=undef]
20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE)
| ^~~~~~
/home/cborntra/REPOS/qemu/include/qemu/bitops.h:20:41: error: missing binary operator before token "("
20 | #define BITS_PER_LONG (sizeof (unsigned long) * BITS_PER_BYTE)
| ^
cc1: all warnings being treated as errors
make: *** [/home/cborntra/REPOS/qemu/rules.mak:69: block/file-posix.o] Error 1
rm tests/qemu-iotests/socket_scm_helper.o
This was triggered by commit d5767057c9 ("uapi: rename ext2_swab() to
swab() and share globally in swab.h"). That patch is doing
#include <asm/bitsperlong.h>
but it uses BITS_PER_LONG.
The kernel file asm/bitsperlong.h provide only __BITS_PER_LONG.
Let us use the __ variant in swap.h
Link: http://lkml.kernel.org/r/20200213142147.17604-1-borntraeger@de.ibm.com
Fixes: d5767057c9 ("uapi: rename ext2_swab() to swab() and share globally in swab.h")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Allison Randal <allison@lohutok.net>
Cc: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
Cc: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit a979558448.
Commit a979558448 ("ipc,sem: remove uneeded sem_undo_list lock usage
in exit_sem()") removes a lock that is needed. This leads to a process
looping infinitely in exit_sem() and can also lead to a crash. There is
a reproducer available in [1] and with the commit reverted the issue
does not reproduce anymore.
Using the reproducer found in [1] is fairly easy to reach a point where
one of the child processes is looping infinitely in exit_sem between
for(;;) and if (semid == -1) block, while it's trying to free its last
sem_undo structure which has already been freed by freeary().
Each sem_undo struct is on two lists: one per semaphore set (list_id)
and one per process (list_proc). The list_id list tracks undos by
semaphore set, and the list_proc by process.
Undo structures are removed either by freeary() or by exit_sem(). The
freeary function is invoked when the user invokes a syscall to remove a
semaphore set. During this operation freeary() traverses the list_id
associated with the semaphore set and removes the undo structures from
both the list_id and list_proc lists.
For this case, exit_sem() is called at process exit. Each process
contains a struct sem_undo_list (referred to as "ulp") which contains
the head for the list_proc list. When the process exits, exit_sem()
traverses this list to remove each sem_undo struct. As in freeary(),
whenever a sem_undo struct is removed from list_proc, it is also removed
from the list_id list.
Removing elements from list_id is safe for both exit_sem() and freeary()
due to sem_lock(). Removing elements from list_proc is not safe;
freeary() locks &un->ulp->lock when it performs
list_del_rcu(&un->list_proc) but exit_sem() does not (locking was
removed by commit a979558448 ("ipc,sem: remove uneeded sem_undo_list
lock usage in exit_sem()").
This can result in the following situation while executing the
reproducer [1] : Consider a child process in exit_sem() and the parent
in freeary() (because of semctl(sid[i], NSEM, IPC_RMID)).
- The list_proc for the child contains the last two undo structs A and
B (the rest have been removed either by exit_sem() or freeary()).
- The semid for A is 1 and semid for B is 2.
- exit_sem() removes A and at the same time freeary() removes B.
- Since A and B have different semid sem_lock() will acquire different
locks for each process and both can proceed.
The bug is that they remove A and B from the same list_proc at the same
time because only freeary() acquires the ulp lock. When exit_sem()
removes A it makes ulp->list_proc.next to point at B and at the same
time freeary() removes B setting B->semid=-1.
At the next iteration of for(;;) loop exit_sem() will try to remove B.
The only way to break from for(;;) is for (&un->list_proc ==
&ulp->list_proc) to be true which is not. Then exit_sem() will check if
B->semid=-1 which is and will continue looping in for(;;) until the
memory for B is reallocated and the value at B->semid is changed.
At that point, exit_sem() will crash attempting to unlink B from the
lists (this can be easily triggered by running the reproducer [1] a
second time).
To prove this scenario instrumentation was added to keep information
about each sem_undo (un) struct that is removed per process and per
semaphore set (sma).
CPU0 CPU1
[caller holds sem_lock(sma for A)] ...
freeary() exit_sem()
... ...
... sem_lock(sma for B)
spin_lock(A->ulp->lock) ...
list_del_rcu(un_A->list_proc) list_del_rcu(un_B->list_proc)
Undo structures A and B have different semid and sem_lock() operations
proceed. However they belong to the same list_proc list and they are
removed at the same time. This results into ulp->list_proc.next
pointing to the address of B which is already removed.
After reverting commit a979558448 ("ipc,sem: remove uneeded
sem_undo_list lock usage in exit_sem()") the issue was no longer
reproducible.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1694779
Link: http://lkml.kernel.org/r/20191211191318.11860-1-ioanna-maria.alifieraki@canonical.com
Fixes: a979558448 ("ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()")
Signed-off-by: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Herton R. Krzesinski <herton@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: <malat@debian.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are no in-kernel users remaining, but there may still be users that
include linux/time.h instead of sys/time.h from user space, so leave the
types available to user space while hiding them from kernel space.
Only the __kernel_old_* versions of these types remain now.
Link: http://lkml.kernel.org/r/20200110154232.4104492-4-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No users remain, so kill these off before we grow new ones.
Link: http://lkml.kernel.org/r/20200110154232.4104492-3-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A couple of helpers are now obsolete and can be removed, so drivers can no
longer start using them and instead use y2038-safe interfaces.
Link: http://lkml.kernel.org/r/20200110154232.4104492-2-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit fdde0ff859 ("ACPI: PM: s2idle: Prevent spurious SCIs from
waking up the system") overlooked the fact that fixed events can wake
up the system too and broke RTC wakeup from suspend-to-idle as a
result.
Fix this issue by checking the fixed events in acpi_s2idle_wake() in
addition to checking wakeup GPEs and break out of the suspend-to-idle
loop if the status bits of any enabled fixed events are set then.
Fixes: fdde0ff859 ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system")
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull NVMe fixes from Keith.
* 'nvme-5.6-rc3' of git://git.infradead.org/nvme:
nvme-multipath: Fix memory leak with ana_log_buf
nvme: Fix uninitialized-variable warning
nvme-pci: Use single IRQ vector for old Apple models
nvme/pci: Add sleep quirk for Samsung and Toshiba drives
This patch drops 'cur_mm' before calling cond_resched(), to prevent
the sq_thread from spinning even when the user process is finished.
Before this patch, if the user process ended without closing the
io_uring fd, the sq_thread continues to spin until the
'sq_thread_idle' timeout ends.
In the worst case where the 'sq_thread_idle' parameter is bigger than
INT_MAX, the sq_thread will spin forever.
Fixes: 6c271ce2f1 ("io_uring: add submission polling")
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While logging the prealloc extents of an inode during a fast fsync we call
btrfs_truncate_inode_items(), through btrfs_log_prealloc_extents(), while
holding a read lock on a leaf of the inode's root (not the log root, the
fs/subvol root), and then that function locks the file range in the inode's
iotree. This can lead to a deadlock when:
* the fsync is ranged
* the file has prealloc extents beyond eof
* writeback for a range different from the fsync range starts
during the fsync
* the size of the file is not sector size aligned
Because when finishing an ordered extent we lock first a file range and
then try to COW the fs/subvol tree to insert an extent item.
The following diagram shows how the deadlock can happen.
CPU 1 CPU 2
btrfs_sync_file()
--> for range [0, 1MiB)
--> inode has a size of
1MiB and has 1 prealloc
extent beyond the
i_size, starting at offset
4MiB
flushes all delalloc for the
range [0MiB, 1MiB) and waits
for the respective ordered
extents to complete
--> before task at CPU 1 locks the
inode, a write into file range
[1MiB, 2MiB + 1KiB) is made
--> i_size is updated to 2MiB + 1KiB
--> writeback is started for that
range, [1MiB, 2MiB + 4KiB)
--> end offset rounded up to
be sector size aligned
btrfs_log_dentry_safe()
btrfs_log_inode_parent()
btrfs_log_inode()
btrfs_log_changed_extents()
btrfs_log_prealloc_extents()
--> does a search on the
inode's root
--> holds a read lock on
leaf X
btrfs_finish_ordered_io()
--> locks range [1MiB, 2MiB + 4KiB)
--> end offset rounded up
to be sector size aligned
--> tries to cow leaf X, through
insert_reserved_file_extent()
--> already locked by the
task at CPU 1
btrfs_truncate_inode_items()
--> gets an i_size of
2MiB + 1KiB, which is
not sector size
aligned
--> tries to lock file
range [2MiB, (u64)-1)
--> the start range
is rounded down
from 2MiB + 1K
to 2MiB to be sector
size aligned
--> but the subrange
[2MiB, 2MiB + 4KiB) is
already locked by
task at CPU 2 which
is waiting to get a
write lock on leaf X
for which we are
holding a read lock
*** deadlock ***
This results in a stack trace like the following, triggered by test case
generic/561 from fstests:
[ 2779.973608] INFO: task kworker/u8:6:247 blocked for more than 120 seconds.
[ 2779.979536] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2779.984503] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2779.990136] kworker/u8:6 D 0 247 2 0x80004000
[ 2779.990457] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[ 2779.990466] Call Trace:
[ 2779.990491] ? __schedule+0x384/0xa30
[ 2779.990521] schedule+0x33/0xe0
[ 2779.990616] btrfs_tree_read_lock+0x19e/0x2e0 [btrfs]
[ 2779.990632] ? remove_wait_queue+0x60/0x60
[ 2779.990730] btrfs_read_lock_root_node+0x2f/0x40 [btrfs]
[ 2779.990782] btrfs_search_slot+0x510/0x1000 [btrfs]
[ 2779.990869] btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[ 2779.990944] __btrfs_drop_extents+0x161/0x1060 [btrfs]
[ 2779.990987] ? mark_held_locks+0x6d/0xc0
[ 2779.990994] ? __slab_alloc.isra.49+0x99/0x100
[ 2779.991060] ? insert_reserved_file_extent.constprop.19+0x64/0x300 [btrfs]
[ 2779.991145] insert_reserved_file_extent.constprop.19+0x97/0x300 [btrfs]
[ 2779.991222] ? start_transaction+0xdd/0x5c0 [btrfs]
[ 2779.991291] btrfs_finish_ordered_io+0x4f4/0x840 [btrfs]
[ 2779.991405] btrfs_work_helper+0xaa/0x720 [btrfs]
[ 2779.991432] process_one_work+0x26d/0x6a0
[ 2779.991460] worker_thread+0x4f/0x3e0
[ 2779.991481] ? process_one_work+0x6a0/0x6a0
[ 2779.991489] kthread+0x103/0x140
[ 2779.991499] ? kthread_create_worker_on_cpu+0x70/0x70
[ 2779.991515] ret_from_fork+0x3a/0x50
(...)
[ 2780.026211] INFO: task fsstress:17375 blocked for more than 120 seconds.
[ 2780.027480] Not tainted 5.6.0-rc2-btrfs-next-53 #1
[ 2780.028482] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2780.030035] fsstress D 0 17375 17373 0x00004000
[ 2780.030038] Call Trace:
[ 2780.030044] ? __schedule+0x384/0xa30
[ 2780.030052] schedule+0x33/0xe0
[ 2780.030075] lock_extent_bits+0x20c/0x320 [btrfs]
[ 2780.030094] ? btrfs_truncate_inode_items+0xf4/0x1150 [btrfs]
[ 2780.030098] ? rcu_read_lock_sched_held+0x59/0xa0
[ 2780.030102] ? remove_wait_queue+0x60/0x60
[ 2780.030122] btrfs_truncate_inode_items+0x133/0x1150 [btrfs]
[ 2780.030151] ? btrfs_set_path_blocking+0xb2/0x160 [btrfs]
[ 2780.030165] ? btrfs_search_slot+0x379/0x1000 [btrfs]
[ 2780.030195] btrfs_log_changed_extents.isra.8+0x841/0x93e [btrfs]
[ 2780.030202] ? do_raw_spin_unlock+0x49/0xc0
[ 2780.030215] ? btrfs_get_num_csums+0x10/0x10 [btrfs]
[ 2780.030239] btrfs_log_inode+0xf83/0x1124 [btrfs]
[ 2780.030251] ? __mutex_unlock_slowpath+0x45/0x2a0
[ 2780.030275] btrfs_log_inode_parent+0x2a0/0xe40 [btrfs]
[ 2780.030282] ? dget_parent+0xa1/0x370
[ 2780.030309] btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
[ 2780.030329] btrfs_sync_file+0x3f3/0x490 [btrfs]
[ 2780.030339] do_fsync+0x38/0x60
[ 2780.030343] __x64_sys_fdatasync+0x13/0x20
[ 2780.030345] do_syscall_64+0x5c/0x280
[ 2780.030348] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 2780.030356] RIP: 0033:0x7f2d80f6d5f0
[ 2780.030361] Code: Bad RIP value.
[ 2780.030362] RSP: 002b:00007ffdba3c8548 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 2780.030364] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f2d80f6d5f0
[ 2780.030365] RDX: 00007ffdba3c84b0 RSI: 00007ffdba3c84b0 RDI: 0000000000000003
[ 2780.030367] RBP: 000000000000004a R08: 0000000000000001 R09: 00007ffdba3c855c
[ 2780.030368] R10: 0000000000000078 R11: 0000000000000246 R12: 00000000000001f4
[ 2780.030369] R13: 0000000051eb851f R14: 00007ffdba3c85f0 R15: 0000557a49220d90
So fix this by making btrfs_truncate_inode_items() not lock the range in
the inode's iotree when the target root is a log root, since it's not
needed to lock the range for log roots as the protection from the inode's
lock and log_mutex are all that's needed.
Fixes: 28553fa992 ("Btrfs: fix race between shrinking truncate and fiemap")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
kmemleak reports a memory leak with the ana_log_buf allocated by
nvme_mpath_init():
unreferenced object 0xffff888120e94000 (size 8208):
comm "nvme", pid 6884, jiffies 4295020435 (age 78786.312s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000e2360188>] kmalloc_order+0x97/0xc0
[<0000000079b18dd4>] kmalloc_order_trace+0x24/0x100
[<00000000f50c0406>] __kmalloc+0x24c/0x2d0
[<00000000f31a10b9>] nvme_mpath_init+0x23c/0x2b0
[<000000005802589e>] nvme_init_identify+0x75f/0x1600
[<0000000058ef911b>] nvme_loop_configure_admin_queue+0x26d/0x280
[<00000000673774b9>] nvme_loop_create_ctrl+0x2a7/0x710
[<00000000f1c7a233>] nvmf_dev_write+0xc66/0x10b9
[<000000004199f8d0>] __vfs_write+0x50/0xa0
[<0000000065466fef>] vfs_write+0xf3/0x280
[<00000000b0db9a8b>] ksys_write+0xc6/0x160
[<0000000082156b91>] __x64_sys_write+0x43/0x50
[<00000000c34fbb6d>] do_syscall_64+0x77/0x2f0
[<00000000bbc574c9>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
nvme_mpath_init() is called by nvme_init_identify() which is called in
multiple places (nvme_reset_work(), nvme_passthru_end(), etc). This
means nvme_mpath_init() may be called multiple times before
nvme_mpath_uninit() (which is only called on nvme_free_ctrl()).
When nvme_mpath_init() is called multiple times, it overwrites the
ana_log_buf pointer with a new allocation, thus leaking the previous
allocation.
To fix this, free ana_log_buf before allocating a new one.
Fixes: 0d0b660f21 ("nvme: add ANA support")
Cc: <stable@vger.kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
This was noticed when printing debugfs for MSIs on my ARM64 server. The
new dstate IRQD_MSI_NOMASK_QUIRK came out surprisingly while it should only
be the x86 stuff for the time being...
The new MSI quirk flag uses the same bit as IRQ_DOMAIN_NAME_ALLOCATED which
is oddly defined as bit 6 for no good reason.
Switch it to the non used bit 1.
Fixes: 6f1a4891a5 ("x86/apic/msi: Plug non-maskable MSI affinity race")
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200221020725.2038-1-yuzenghui@huawei.com
Fix typos, spellos, etc. in zonefs.txt.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
This is required for clone3 which passes the TLS value through a
struct rather than a register.
Cc: Amanieu d'Antras <amanieu@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Add the pci related code for csky arch to support basic pci virtual
function, such as qemu virt-pci-9pfs.
Signed-off-by: MaJun <majun258@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Some bsp (eg: buildroot) has defconfig.fragment design to add more
configs into the defconfig in linux source code tree. For example,
we could put different cpu configs into different defconfig.fragments,
but they all use the same defconfig in Linux.
Signed-off-by: Ma Jun <majun258@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
We should give some necessary check for initrd just like other
architectures and it seems that setup_initrd() could be a common
code for all architectures.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
CONFIG_CLKSRC_OF is gone since commit bb0eb050a5
("clocksource/drivers: Rename CLKSRC_OF to TIMER_OF"). The platform
already selects TIMER_OF.
CONFIG_HAVE_DMA_API_DEBUG is gone since commit 6e88628d03 ("dma-debug:
remove CONFIG_HAVE_DMA_API_DEBUG").
CONFIG_DEFAULT_DEADLINE is gone since commit f382fb0bce ("block:
remove legacy IO schedulers").
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Fix wording in help text for the CPU_HAS_LDSTEX symbol.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
During ftrace init, linux will replace all function prologues
(call_mcout) with nops, but it need flush_dcache and
invalidate_icache to make it work. So flush_cache functions
couldn't be nested called by ftrace framework.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Some CPUs don't support icache.va instruction to maintain the whole
smp cores' icache. Using icache.all + IPI casue a lot on performace
and using defer mechanism could reduce the number of calling icache
_flush_all functions.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Only when vma is for VM_EXEC, we need sync dcache & icache. eg:
- gdb ptrace modify user space instruction code area.
Add VM_EXEC condition to reduce unnecessary cache flush.
The abiv1 cpus' cache are all VIPT, so we still need to deal with
dcache aliasing problem. But there is optimized way to use cache
color, just like what's done in arch/csky/abiv1/inc/abi/page.h.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Instead of flushing cache per update_mmu_cache() called, we use
flush_dcache_page to reduce the frequency of flashing the cache.
As abiv2 cpus are all PIPT for icache & dcache, we needn't handle
dcache aliasing problem. But their icache can't snoop dcache, so
we still need sync_icache_dcache in update_mmu_cache().
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
The abiv2 CPUs are all PIPT cache, so there is no need to implement
flush_icache_page function.
The function flush_icache_user_range hasn't been used, so just
remove it.
The function flush_cache_range is not necessary for PIPT cache when
tlb mapping changed.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Some CPUs don't support icache specific instructions to flush icache
lines in broadcast way. We use cpu control registers to flush local
icache and use IPI to notify other cores.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
If we use a non-ipi-support interrupt controller, it will cause panic here.
We should let cpu up and work with CONFIG_SMP, when we use a non-ipi intc.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
In the past, we didn't care about kernel sp when saving pt_reg. But in some
cases, we still need pt_reg->usp to represent the kernel stack before enter
exception.
For cmpxhg in atomic.S, we need save and restore usp for above.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
There is no present bit in csky pmd hardware, so we need to prepare invalid_pte_table
for empty pmd entry and the functions (pmd_none & pmd_present) in pgtable.h need
invalid_pte_talbe to get result. If a module use these functions, we need export the
symbol for it.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Mo Qihui <qihui.mo@verisilicon.com>
Cc: Zhange Jian <zhang_jian5@dahuatech.com>
After fixaddr_init is separated from highmem, we could use tcm
without highmem selected. (610 (abiv1) don't support highmem,
but it could use tcm now.)
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
The implementation are not only used by TCM but also used by sram on
SOC bus. It follow existed linux tcm software interface, so that old
tcm application codes could be re-used directly.
Software interface list in asm/tcm.h:
- Variables/Const: __tcmdata, __tcmconst
- Functions: __tcmfunc, __tcmlocalfunc
- Malloc/Free: tcm_alloc, tcm_free
In linux menuconfig:
- Choose a TCM contain instrctions + data or separated in ITCM/DTCM.
- Determine TCM_BASE (DTCM_BASE) in phyiscal address.
- Determine size of TCM or ITCM(DTCM) in page counts.
Here is hello tcm example from Documentation/arm/tcm.rst which could
be directly used:
/* Uninitialized data */
static u32 __tcmdata tcmvar;
/* Initialized data */
static u32 __tcmdata tcmassigned = 0x2BADBABEU;
/* Constant */
static const u32 __tcmconst tcmconst = 0xCAFEBABEU;
static void __tcmlocalfunc tcm_to_tcm(void)
{
int i;
for (i = 0; i < 100; i++)
tcmvar ++;
}
static void __tcmfunc hello_tcm(void)
{
/* Some abstract code that runs in ITCM */
int i;
for (i = 0; i < 100; i++) {
tcmvar ++;
}
tcm_to_tcm();
}
static void __init test_tcm(void)
{
u32 *tcmem;
int i;
hello_tcm();
printk("Hello TCM executed from ITCM RAM\n");
printk("TCM variable from testrun: %u @ %p\n", tcmvar, &tcmvar);
tcmvar = 0xDEADBEEFU;
printk("TCM variable: 0x%x @ %p\n", tcmvar, &tcmvar);
printk("TCM assigned variable: 0x%x @ %p\n", tcmassigned, &tcmassigned);
printk("TCM constant: 0x%x @ %p\n", tcmconst, &tcmconst);
/* Allocate some TCM memory from the pool */
tcmem = tcm_alloc(20);
if (tcmem) {
printk("TCM Allocated 20 bytes of TCM @ %p\n", tcmem);
tcmem[0] = 0xDEADBEEFU;
tcmem[1] = 0x2BADBABEU;
tcmem[2] = 0xCAFEBABEU;
tcmem[3] = 0xDEADBEEFU;
tcmem[4] = 0x2BADBABEU;
for (i = 0; i < 5; i++)
printk("TCM tcmem[%d] = %08x\n", i, tcmem[i]);
tcm_free(tcmem, 20);
}
}
TODO:
- Separate fixup mapping from highmem
- Support abiv1
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
This is a basic -fstack-protector support without per-task canary
switching. The protector will report something like when stack
corruption is detected:
It's tested with strcpy local array overflow in sys_kill and get:
stack-protector: Kernel stack is corrupted in: sys_kill+0x23c/0x23c
TODO:
- Support task switch for different cannary
Signed-off-by: Mao Han <han_mao@c-sky.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
During an online resize an array of pointers to s_group_info gets replaced
so it can get enlarged. If there is a concurrent access to the array in
ext4_get_group_info() and this memory has been reused then this can lead to
an invalid memory access.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-3-tytso@mit.edu
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Balbir Singh <sblbir@amazon.com>
Cc: stable@kernel.org
During an online resize an array of pointers to buffer heads gets
replaced so it can get enlarged. If there is a racing block
allocation or deallocation which uses the old array, and the old array
has gotten reused this can lead to a GPF or some other random kernel
memory getting modified.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-2-tytso@mit.edu
Reported-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
drm/i915 fixes for v5.6-rc3:
- Workaround missing Display Stream Compression (DSC) state readout by
forcing modeset when its enabled at probe
- Fix EHL port clock voltage level requirements
- Fix queuing retire workers on the virtual engine
- Fix use of partially initialized waiters
- Stop using drm_pci_alloc/drm_pci/free
- Fix rewind of RING_TAIL by forcing a context reload
- Fix locking on resetting ring->head
- Propagate our bug filing URL change to stable kernels
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87y2sxtsrd.fsf@intel.com
- Fix dt binding for sunxi.
- Allow only 1 rotation argument, and allow 0 rotation in video cmdline.
- Small compiler warning fix for panfrost.
- Fix when using performance counters in panfrost when using per fd address space.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuXvWqAysSYEJGuVH/lWMcqZwE8MFAl5OWawACgkQ/lWMcqZw
E8M3UxAAlUvBWQvrGz6bLD3NANF5nW5KjARPNFhwr7uyI67MExZoBh9WqJejhoQ6
4dMv9BwT81EXhcaxyNjdshwkHktsvPr/ouEdO256e0UC1hs7zWOEapL7xXlV7dwe
pvEUUnG07Umh50Df39jcOla4YgnoqYRwW7E7SMadvDuo81UJ6+Daf8we+PO5w2zD
IbZsOfPBMn5NhyzXgnynlp3Y8df521EDb71x3R4d4vAyOoRE2Axmd7xZuqqkt59P
BDxj02glXIZsSt5OLTPFQdlv7rYXo/Y52wulBDIDup6N/wD+9jlSMFL+OOQby3rP
Q5Ve6TkLrSdkZWFVNsMMubylEw+CtNYForZb9J9uo0M7+PsP3tApLP1CYbi5lvI0
yIff8986H5U8I3DaETugwyPTMdnWnnqRsQN57A8WYbQV5YLSx7bUqV0bgW6pucJP
yC0e0h7367sgYvtENCIxvQ1sNUxiEz0QfppN1xW55JLsEerghomF8vzQNQJd1/Iy
4GnHdvsB6NrBH1Ebzu3Ibj5hj5Y15znJlfhgFHuUwY0aiAW5cf4a+wH7EQdTt7T9
ufBM9DFiySBE4xhffHo8JpEMOQVrabBfZzs8qg0RMT899DMPTpjW2OIoblDfuck0
7LYfV/xU9qJMSsBA9X4G3+F/cH7EFikdNENEwJ2hyv04unpc/Ww=
=NWxB
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-fixes-2020-02-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.6-rc3:
- Fix dt binding for sunxi.
- Allow only 1 rotation argument, and allow 0 rotation in video cmdline.
- Small compiler warning fix for panfrost.
- Fix when using performance counters in panfrost when using per fd address space.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f5a6370d-9898-6c72-43e4-5bb56a99b6f2@linux.intel.com
Michael Chan says:
====================
bnxt_en: shutdown and kexec/kdump related fixes.
2 small patches to fix kexec shutdown and kdump kernel driver init issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
If crashed kernel does not shutdown the NIC properly, PCIe FLR
is required in the kdump kernel in order to initialize all the
functions properly.
Fixes: d629522e1d ("bnxt_en: Reduce memory usage when running in kdump kernel.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Especially when bnxt_shutdown() is called during kexec, we need to
disable MSIX and disable Bus Master to completely quiesce the device.
Make these 2 calls unconditionally in the shutdown method.
Fixes: c20dc142dd ("bnxt_en: Disable bus master during PCI shutdown and driver unload.")
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since nl_groups is a u32 we can't bind more groups via ->bind
(netlink_bind) call, but netlink has supported more groups via
setsockopt() for a long time and thus nlk->ngroups could be over 32.
Recently I added support for per-vlan notifications and increased the
groups to 33 for NETLINK_ROUTE which exposed an old bug in the
netlink_bind() code causing out-of-bounds access on archs where unsigned
long is 32 bits via test_bit() on a local variable. Fix this by capping the
maximum groups in netlink_bind() to BITS_PER_TYPE(u32), effectively
capping them at 32 which is the minimum of allocated groups and the
maximum groups which can be bound via netlink_bind().
CC: Christophe Leroy <christophe.leroy@c-s.fr>
CC: Richard Guy Briggs <rgb@redhat.com>
Fixes: 4f52090052 ("netlink: have netlink per-protocol bind function return an error code.")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While it is not yet understood why a TX underflow can easily occur
for SGMII interfaces resulting in a TX wedge. It has been found that
disabling/re-enabling the LMAC resolves the issue.
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Reviewed-by: Robert Jones <rjones@gateworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fw_status field is only 8 bits, so fix the read. Also,
we only want to look at the one status bit, to allow for future
use of the other bits, and watch for a bad PCI read.
Fixes: 97ca486592 ("ionic: add heartbeat check")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IMA fixes from Mimi Zohar:
"Two bug fixes and an associated change for each.
The one that adds SM3 to the IMA list of supported hash algorithms is
a simple change, but could be considered a new feature"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add sm3 algorithm to hash algorithm configuration list
crypto: rename sm3-256 to sm3 in hash_algo_name
efi: Only print errors about failing to get certs if EFI vars are found
x86/ima: use correct identifier for SetupMode variable
The description says 'If unsure, say N.' but
the module is built as M by default (once
the dependencies are satisfied).
When the module is selected (Y or M), it enables
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS
which alter kernel internal structures.
We (Android Studio Emulator) currently do not
use this module and think this it is more consistent
to have it disabled by default as opposite to
disabling it explicitly to prevent enabling
NETFILTER_FAMILY_BRIDGE and SKB_EXTENSIONS.
Signed-off-by: Roman Kiryanov <rkir@google.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
at91ether_init was handling the phy mode and speed but since the switch to
phylink, the NCFGR register got overwritten by macb_mac_config(). The issue
is that the RM9200_RMII bit and the MACB_CLK_DIV32 field are cleared
but never restored as they conflict with the PAE, GBE and PCSSEL bits.
Add new capability to differentiate between EMAC and the other versions of
the IP and use it to set and avoid clearing the relevant bits.
Also, this fixes a NULL pointer dereference in macb_mac_link_up as the EMAC
doesn't use any rings/bufffers/queues.
Fixes: 7897b071ac ("net: macb: convert to phylink")
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xen_maybe_preempt_hcall() is called from the exception entry point
xen_do_hypervisor_callback with interrupts disabled.
_cond_resched() evades the might_sleep() check in cond_resched() which
would have caught that and schedule_debug() unfortunately lacks a check
for irqs_disabled().
Enable interrupts around the call and use cond_resched() to catch future
issues.
Fixes: fdfd811ddd ("x86/xen: allow privcmd hypercalls to be preempted")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/878skypjrh.fsf@nanos.tec.linutronix.de
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>