linux

Author	SHA1	Message	Date
Chengming Zhou	1bf6ece573	iocost_monitor: start from the oldest usage index iocg usage_idx is the latest usage index, so we should start from the oldest usage index to show the consecutive NR_USAGE_SLOTS usages. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:45:29 -06:00
Chengming Zhou	d9012a59db	iocost: Fix check condition of iocg abs_vdebt We shouldn't skip iocg when its abs_vdebt is not zero. Fixes: `0b80f9866e` ("iocost: protect iocg->abs_vdebt with iocg->waitq.lock") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:45:12 -06:00
Pavel Begunkov	01cec8c18f	io_uring: get rid of atomic FAA for cq_timeouts If ->cq_timeouts modifications are done under ->completion_lock, we don't really need any fetch-and-add and other complex atomics. Replace it with non-atomic FAA, that saves an implicit full memory barrier. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Pavel Begunkov	4693014340	io_uring: consolidate *_check_overflow accounting Add a helper to mark ctx->{cq,sq}_check_overflow to get rid of duplicates, and it's clearer to check cq_overflow_list directly anyway. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Pavel Begunkov	dd9dfcdf5a	io_uring: fix stalled deferred requests Always do io_commit_cqring() after completing a request, even if it was accounted as overflowed on the CQ side. Failing to do that may lead to not pushing deferred requests when needed, and so stalling the whole ring. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Pavel Begunkov	b2bd1cf99f	io_uring: fix racy overflow count reporting All ->cq_overflow modifications should be under completion_lock, otherwise it can report a wrong number to the userspace. Fix it in io_uring_cancel_files(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Pavel Begunkov	81b68a5ca0	io_uring: deduplicate __io_complete_rw() Call __io_complete_rw() in io_iopoll_queue() instead of hand coding it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Pavel Begunkov	010e8e6be2	io_uring: de-unionise io_kiocb As io_kiocb has enough space, move ->work out of a union. It's safer this way and removes ->work memcpy bouncing. By the way make tabulation in struct io_kiocb consistent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-30 11:42:21 -06:00
Lukasz Luba	0de967f24e	thermal: Update power allocator and devfreq cooling to SPDX licensing Update the license to the SPDX licensing format. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200730165117.13998-1-lukasz.luba@arm.com	2020-07-30 19:26:10 +02:00
Francesco Ruggeri	024a8168b7	igb: reinit_locked() should be called with rtnl_lock We observed two panics involving races with igb_reset_task. The first panic is caused by this race condition: kworker reboot -f igb_reset_task igb_reinit_locked igb_down napi_synchronize __igb_shutdown igb_clear_interrupt_scheme igb_free_q_vectors igb_free_q_vector adapter->q_vector[v_idx] = NULL; napi_disable Panics trying to access adapter->q_vector[v_idx].napi_state The second panic (a divide error) is caused by this race: kworker reboot -f tx packet igb_reset_task __igb_shutdown rtnl_lock() ... igb_clear_interrupt_scheme igb_free_q_vectors adapter->num_tx_queues = 0 ... rtnl_unlock() rtnl_lock() igb_reinit_locked igb_down igb_up netif_tx_start_all_queues dev_hard_start_xmit igb_xmit_frame igb_tx_queue_mapping Panics on r_idx % adapter->num_tx_queues This commit applies to igb_reset_task the same changes that were applied to ixgbe in commit `2f90b8657e` ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver"), commit `8f4c5c9fb8` ("ixgbe: reinit_locked() should be called with rtnl_lock") and commit `88adce4ea8` ("ixgbe: fix possible race in reset subtask"). Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 10:05:14 -07:00
Aaron Ma	1050242fa6	e1000e: continue to init PHY even when failed to disable ULP After commit `e086ba2fcc` ("e1000e: disable s0ix entry and exit flows for ME systems"), the ThinkPad P14s always fails to disable ULP by ME. Commit `0c80cdbf33` ("e1000e: Warn if disabling ULP failed") then breaks out of PHY init. Error log: [ 42.364753] e1000e 0000:00:1f.6 enp0s31f6: Failed to disable ULP [ 42.524626] e1000e 0000:00:1f.6 enp0s31f6: PHY Wakeup cause - Unicast Packet [ 42.822476] e1000e 0000:00:1f.6 enp0s31f6: Hardware Error When s0ix is disabled, E1000_FWSM_ULP_CFG_DONE will never be 1. If PHY init continues as before, the device works as before. The iperf test result is good too. Fixes: `0c80cdbf33` ("e1000e: Warn if disabling ULP failed") Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 10:04:54 -07:00
Rafael J. Wysocki	a7ee88c3d3	Merge tag 'devfreq-next-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux Pull devfreq updates for v5.9 from Chanwoo Choi: 1. Update devfreq core - Add delayed timer support for polling mode. Until now, devfreq supported only a deferrable timer to avoid unneeded CPU wakeups. However, that is a problem for non-CPU devices, like DMC, doing DMA. Such devices need to be monitored continuously regardless of the CPU state, so add delayed timer support for the polling mode to facilitate continuous monitoring. - Fix indentation of result of devfreq_summary debugfs node. - Fix the wrong end of code with a semicolon instead of a comma. - Clean-up code to use a unified local variable name in sysfs-related internal functions. - Fix trivial spelling mistake in devfreq-event.c. 2. Update devfreq drivers - Add the exception handling code to control when rockchip,pmu property is absent for rk3399_dmc.c. - Add missing 'rockchip,pmu' property to dt-binding document for rk3399_dmc.c. - Change the type of timer in exynos5422-dmc.c from deferrable to delayed in order to monitor the DMC (Dynamic Memory Controller) status regardless of the CPU idle state. Also adjust the polling interval and upthreshold value in order to react faster and make better decisions when benchmarking testing for the memory behavior. - Add module parameter to either enable or disable the IRQ mode for DMC behavior monitoring. exynos5422-dmc.c can operate in both the polling and the IRQ mode. The user can choose the monitoring mode via a module param. The default monitoring mode is the polling mode with a delayed timer. 3. Add maintainer entry - Add Dmitry Osipenko <digetx@gmail.com> as maintainer for memory frequency scaling drivers for Nvidia Tegra. He has developed and reviewed tegra-devfreq.c. * tag 'devfreq-next-for-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: Fix the wrong end with semicolon PM / devfreq: Fix indentaion of devfreq_summary debugfs node PM / devfreq: Clean up the devfreq instance name in sysfs attr memory: samsung: exynos5422-dmc: Add module param to control IRQ mode memory: samsung: exynos5422-dmc: Adjust polling interval and uptreshold memory: samsung: exynos5422-dmc: Use delayed timer as default PM / devfreq: Add support delayed timer for polling mode dt-bindings: devfreq: rk3399_dmc: Add rockchip,pmu phandle PM / devfreq: tegra: Add Dmitry as a maintainer PM / devfreq: event: Fix trivial spelling PM / devfreq: rk3399_dmc: Fix kernel oops when rockchip,pmu is absent	2020-07-30 18:52:15 +02:00
Linus Torvalds	e2c46b5762	Merge tag 'block-5.8-2020-07-30' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Three NVMe fixes" * tag 'block-5.8-2020-07-30' of git://git.kernel.dk/linux-block: nvme: add a Identify Namespace Identification Descriptor list quirk nvme-pci: prevent SK hynix PC400 from using Write Zeroes command nvme-tcp: fix possible hang waiting for icresp response	2020-07-30 09:48:51 -07:00
Linus Torvalds	0513b9d75c	Merge tag 'io_uring-5.8-2020-07-30' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two small fixes for corner/error cases" * tag 'io_uring-5.8-2020-07-30' of git://git.kernel.dk/linux-block: io_uring: fix lockup in io_fail_links() io_uring: fix ->work corruption with poll_add	2020-07-30 09:47:07 -07:00
Rafael J. Wysocki	de002c55ca	cpufreq: intel_pstate: Fix EPP setting via sysfs in active mode Because intel_pstate_set_energy_pref_index() reads and writes the MSR_HWP_REQUEST register without using the cached value of it used by intel_pstate_hwp_boost_up() and intel_pstate_hwp_boost_down(), those functions may overwrite the value written by it and so the EPP value set via sysfs may be lost. To avoid that, make intel_pstate_set_energy_pref_index() take the cached value of MSR_HWP_REQUEST just like the other two routines mentioned above and update it with the new EPP value coming from user space in addition to updating the MSR. Note that the MSR itself still needs to be updated too in case hwp_boost is unset or the boosting mechanism is not active at the EPP change time. Fixes: `e0efd5be63` ("cpufreq: intel_pstate: Add HWP boost utility and sched util hooks") Reported-by: Francisco Jerez <currojerez@riseup.net> Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: 3da97d4db8ee cpufreq: intel_pstate: Rearrange ... Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-07-30 18:20:23 +02:00
Rafael J. Wysocki	3a95717606	cpufreq: intel_pstate: Rearrange the storing of new EPP values Move the locking away from intel_pstate_set_energy_pref_index() into its only caller and drop the (now redundant) return_pref label from it. Also move the "raw" EPP value check into the caller of that function, so as to do it before acquiring the mutex, and reduce code duplication related to the "raw" EPP values processing somewhat. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-07-30 18:19:52 +02:00
Willy Tarreau	1c9df907da	random: fix circular include dependency on arm64 after addition of percpu.h Daniel Díaz and Kees Cook independently reported that commit `f227e3ec3b` ("random32: update the net random state on interrupt and activity") broke arm64 due to a circular dependency on include files since the addition of percpu.h in random.h. The correct fix would definitely be to move all the prandom32 stuff out of random.h but for backporting, a smaller solution is preferred. This one replaces linux/percpu.h with asm/percpu.h, and this fixes the problem on x86_64, arm64, arm, and mips. Note that moving percpu.h around didn't change anything and that removing it entirely broke differently. When backporting, such options might still be considered if this patch fails to help. [ It turns out that an alternate fix seems to be to just remove the troublesome <asm/pointer_auth.h> include from the arm64 <asm/smp.h> that causes the circular dependency. But we might as well do the whole belt-and-suspenders thing, and minimize inclusion in <linux/random.h> too. Either will fix the problem, and both are good changes. - Linus ] Reported-by: Daniel Díaz <daniel.diaz@linaro.org> Reported-by: Kees Cook <keescook@chromium.org> Tested-by: Marc Zyngier <maz@kernel.org> Fixes: `f227e3ec3b` Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-07-30 09:15:17 -07:00
Wei Yongjun	2e4770a566	PCI: rpadlpar: Make functions static The sparse tool reports build warnings as follows: drivers/pci/hotplug/rpadlpar_core.c:355:5: warning: symbol 'dlpar_remove_pci_slot' was not declared. Should it be static? drivers/pci/hotplug/rpadlpar_core.c:461:12: warning: symbol 'rpadlpar_io_init' was not declared. Should it be static? drivers/pci/hotplug/rpadlpar_core.c:473:6: warning: symbol 'rpadlpar_io_exit' was not declared. Should it be static? Those functions are not used outside of this file, so mark them static. Also mark rpadlpar_io_exit() as __exit. Link: https://lore.kernel.org/r/20200721151735.41181-1-weiyongjun1@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2020-07-30 11:11:13 -05:00
John Garry	6a7389f031	MAINTAINERS: Include drivers subdirs for ARM PMU PROFILING AND DEBUGGING entry Ensure that the ARM PMU PROFILING AND DEBUGGING maintainers are included for the HiSilicon PMU driver. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1592392648-128331-1-git-send-email-john.garry@huawei.com Signed-off-by: Will Deacon <will@kernel.org>	2020-07-30 17:05:34 +01:00
Robin Murphy	05fb3dbda1	arm64: csum: Fix handling of bad packets Although iph is expected to point to at least 20 bytes of valid memory, ihl may be bogus, for example on reception of a corrupt packet. If it happens to be less than 5, we really don't want to run away and dereference 16GB worth of memory until it wraps back to exactly zero... Fixes: `0e455d8e80` ("arm64: Implement optimised IP checksum helpers") Reported-by: guodeqing <geffrey.guo@huawei.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will@kernel.org>	2020-07-30 17:01:38 +01:00
Marc Zyngier	835d1c3a98	arm64: Drop unnecessary include from asm/smp.h asm/pointer_auth.h is not needed anymore in asm/smp.h, as `62a679cb28` ("arm64: simplify ptrauth initialization") removed the keys from the secondary_data structure. This also cures a compilation issue introduced by `f227e3ec3b` ("random32: update the net random state on interrupt and activity"). Fixes: `62a679cb28` ("arm64: simplify ptrauth initialization") Fixes: `f227e3ec3b` ("random32: update the net random state on interrupt and activity") Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>	2020-07-30 16:55:32 +01:00
Sami Tolvanen	966a0acce2	arm64/alternatives: move length validation inside the subsection Commit `f7b93d4294` ("arm64/alternatives: use subsections for replacement sequences") breaks LLVM's integrated assembler, because due to its one-pass design, it cannot compute instruction sequence lengths before the layout for the subsection has been finalized. This change fixes the build by moving the .org directives inside the subsection, so they are processed after the subsection layout is known. Fixes: `f7b93d4294` ("arm64/alternatives: use subsections for replacement sequences") Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1078 Link: https://lore.kernel.org/r/20200730153701.3892953-1-samitolvanen@google.com Signed-off-by: Will Deacon <will@kernel.org>	2020-07-30 16:50:14 +01:00
Vaibhav Gupta	bac6631728	ixgbevf: use generic power management With legacy PM, drivers themselves were responsible for managing the device's power states and taking care of register states. After upgrading to the generic structure, PCI core will take care of required tasks and drivers should do only device-specific operations. The driver was invoking PCI helper functions like pci_save/restore_state(), and pci_enable/disable_device(), which is not recommended. Compile-tested only. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 08:44:17 -07:00
Vaibhav Gupta	6f82b25587	ixgbe: use generic power management With legacy PM hooks, it was the responsibility of a driver to manage PCI states and also the device's power state. The generic approach is to let PCI core handle the work. ixgbe_suspend() calls __ixgbe_shutdown() to perform intermediate tasks. __ixgbe_shutdown() modifies the value of "wake" (whether the device should be wakeup-enabled or not), which controls the flow of legacy PM. Since PCI core has no idea about the value of "wake", new code for generic PM may produce unexpected results. Thus, use "device_set_wakeup_enable()" to wakeup-enable the device accordingly. Compile-tested only. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 08:39:41 -07:00
Chen Yu	a472ad2bce	intel_idle: Customize IceLake server support On ICX platform, the C1E auto-promotion is enabled by default. As a result, the CPU might fall into C1E more often than previous platforms. Besides, the C1E is not exposed to sysfs on ICX, which is inconsistent with previous server platforms. So disable C1E auto-promotion and expose C1E as a separate idle state, so the C1E and C6 can be disabled via sysfs when necessary. Besides C1 and C1E, the exit latency of C6 was measured by a dedicated tool. However the exit latency (41us) exposed by _CST is much smaller than the one we measured (128us). This is probably because _CST uses the exit latency when woken up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose the latter as we need the longest latency in theory. Reported-by: kernel test robot <lkp@intel.com> Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-30 17:36:10 +02:00
Vaibhav Gupta	e9c971bdab	igbvf: use generic power management Remove legacy PM callbacks and use generic operations. With legacy code, drivers were responsible for handling PCI PM operations like pci_save_state(). In generic code, all these are handled by PCI core. The generic suspend() and resume() are called at the same point the legacy ones were called. Thus, it does not affect the normal functioning of the driver. __maybe_unused attribute is used with .resume() but not with .suspend(), as .suspend() is called by .shutdown(). Compile-tested only. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 08:36:08 -07:00
Vaibhav Gupta	bc5cbd73eb	iavf: use generic power management With the support of generic PM callbacks, drivers no longer need to use legacy .suspend() and .resume() in which they had to maintain PCI states changes and device's power state themselves. The required operations are done by PCI core. PCI drivers are not expected to invoke PCI helper functions like pci_save/restore_state(), pci_enable/disable_device(), pci_set_power_state(), etc. Their tasks are completed by PCI core itself. Compile-tested only. Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>	2020-07-30 08:32:03 -07:00
Weilong Chen	a96b0d061d	virtio-mem: Fix build error due to improper use 'select' As noted in: https://www.kernel.org/doc/Documentation/kbuild/kconfig-language.txt "select should be used with care. select will force a symbol to a value without visiting the dependencies." Config VIRTIO_MEM should not select CONTIG_ALLOC directly. Otherwise it will cause an error: https://bugzilla.kernel.org/show_bug.cgi?id=208245 Signed-off-by: Weilong Chen <chenweilong@huawei.com> Link: https://lore.kernel.org/r/20200619080333.194753-1-chenweilong@huawei.com Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>	2020-07-30 11:28:17 -04:00
Rafael J. Wysocki	f46cf33531	Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull operating performance points (OPP) framework updates for v5.9 from Viresh Kumar: "This contains following changes: - Fix HTTP links (Alexander A. Klimov). - Allow disabled OPPs in dev_pm_opp_get_freq() (Andrew-sh.Cheng). - Add missing export (Valdis Kletnieks)." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: Allow disabled OPPs in dev_pm_opp_get_freq() opp: ti-opp-supply: Replace HTTP links with HTTPS ones opp: core: Add missing export for dev_pm_opp_adjust_voltage	2020-07-30 17:27:46 +02:00
Logan Gunthorpe	dea286bb71	PCI/P2PDMA: Allow P2PDMA on AMD Zen and newer CPUs Allow P2PDMA if the CPU vendor is AMD and family is 0x17 (Zen) or greater. [bhelgaas: commit log, simplify #if/#else/#endif] Link: https://lore.kernel.org/r/20200729231844.4653-1-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Huang Rui <ray.huang@amd.com>	2020-07-30 10:18:17 -05:00
Marc Zyngier	16314874b1	Merge branch 'kvm-arm64/misc-5.9' into kvmarm-master/next Signed-off-by: Marc Zyngier <maz@kernel.org>	2020-07-30 16:13:04 +01:00
Alex Deucher	87004abfbc	Revert "drm/amdgpu: Fix NULL dereference in dpm sysfs handlers" This regressed some working configurations so revert it. Will fix this properly for 5.9 and backport then. This reverts commit `38e0c89a19`. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2020-07-30 11:03:28 -04:00
Will Deacon	022c8328dc	KVM: arm64: Move S1PTW S2 fault logic out of io_mem_abort() To allow for re-injection of stage-2 faults on stage-1 page-table walks due to either a missing or read-only memslot, move the triage logic out of io_mem_abort() and into kvm_handle_guest_abort(), where these aborts can be handled before anything else. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-5-will@kernel.org	2020-07-30 16:02:37 +01:00
Will Deacon	54dc0d2404	KVM: arm64: Don't skip cache maintenance for read-only memslots If a guest performs cache maintenance on a read-only memslot, we should inform userspace rather than skip the instruction altogether. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-4-will@kernel.org	2020-07-30 16:02:37 +01:00
Will Deacon	84b951a803	KVM: arm64: Handle data and instruction external aborts the same way If the guest generates a synchronous external abort which is not handled by the host, we inject it back into the guest as a virtual SError, but only if the original fault was reported on the data side. Instruction faults are reported as "Unsupported FSC", causing the vCPU run loop to bail with -EFAULT. Although synchronous external aborts from a guest are pretty unusual, treat them the same regardless of whether they are taken as data or instruction aborts by EL2. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-3-will@kernel.org	2020-07-30 16:02:37 +01:00
Mazin Rezk	fde9f39ac7	drm/amd/display: Clear dm_state for fast updates This patch fixes a race condition that causes a use-after-free during amdgpu_dm_atomic_commit_tail. This can occur when 2 non-blocking commits are requested and the second one finishes before the first. Essentially, this bug occurs when the following sequence of events happens: 1. Non-blocking commit #1 is requested w/ a new dm_state #1 and is deferred to the workqueue. 2. Non-blocking commit #2 is requested w/ a new dm_state #2 and is deferred to the workqueue. 3. Commit #2 starts before commit #1, dm_state #1 is used in the commit_tail and commit #2 completes, freeing dm_state #1. 4. Commit #1 starts after commit #2 completes, uses the freed dm_state #1 and dereferences a freelist pointer while setting the context. Since this bug has only been spotted with fast commits, this patch fixes the bug by clearing the dm_state instead of using the old dc_state for fast updates. In addition, since dm_state is only used for its dc_state and amdgpu_dm_atomic_commit_tail will retain the dc_state if none is found, removing the dm_state should not have any consequences in fast updates. This use-after-free bug has existed for a while now, but only caused a noticeable issue starting from 5.7-rc1 due to `3202fa62f` ("slub: relocate freelist pointer to middle of object") moving the freelist pointer from dm_state->base (which was unused) to dm_state->context (which is dereferenced). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207383 Fixes: `bd200d190f` ("drm/amd/display: Don't replace the dc_state for fast updates") Reported-by: Duncan <1i5t5.duncan@cox.net> Signed-off-by: Mazin Rezk <mnrzk@protonmail.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org	2020-07-30 11:02:10 -04:00
Peilin Ye	543e8669ed	drm/amdgpu: Prevent kernel-infoleak in amdgpu_info_ioctl() Compiler leaves a 4-byte hole near the end of `dev_info`, causing amdgpu_info_ioctl() to copy uninitialized kernel stack memory to userspace when `size` is greater than 356. In 2015 we tried to fix this issue by doing `= {};` on `dev_info`, which unfortunately does not initialize that 4-byte hole. Fix it by using memset() instead. Cc: stable@vger.kernel.org Fixes: `c193fa91b9` ("drm/amdgpu: information leak in amdgpu_info_ioctl()") Fixes: `d38ceaf99e` ("drm/amdgpu: add core driver (v4)") Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2020-07-30 11:02:10 -04:00
Will Deacon	c9a636f29b	KVM: arm64: Rename kvm_vcpu_dabt_isextabt() kvm_vcpu_dabt_isextabt() is not specific to data aborts and, unlike kvm_vcpu_dabt_issext(), has nothing to do with sign extension. Rename it to 'kvm_vcpu_abt_issea()'. Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Link: https://lore.kernel.org/r/20200729102821.23392-2-will@kernel.org	2020-07-30 15:59:28 +01:00
Marc Zyngier	236a559919	Merge branch 'kvm-arm64/el2-obj-v4.1' into kvmarm-master/next Signed-off-by: Marc Zyngier <maz@kernel.org>	2020-07-30 15:56:49 +01:00
Weihang Li	eaaa98dedf	RDMA/hns: Remove redundant parameters in set_rc_wqe() Some functions called by set_rc_wqe() take two parameters, "void wqe" and "struct hns_roce_v2_rc_send_wqe rc_sq_wqe", but the first one can be derived from the second one. So remove the redundant wqe from the related functions. Link: https://lore.kernel.org/r/1595932941-40613-5-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:31:02 -03:00
Lang Cheng	a247fd28c1	RDMA/hns: Remove support for HIP08_A HIP08_A is an temporary version and all features of it are supported by HIP08_B. So remove the relevant code. Link: https://lore.kernel.org/r/1595932941-40613-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:31:02 -03:00
Weihang Li	cdc1f3e946	RDMA/hns: Refactor hns_roce_v2_set_hem() The parts about preparing and sending mailbox to hardware is not strongly related to other codes in hns_roce_v2_set_hem(), and can be encapsulated into a separate function. Link: https://lore.kernel.org/r/1595932941-40613-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:31:02 -03:00
Lang Cheng	57005c96b7	RDMA/hns: Remove redundant hardware opcode definitions HNS_ROCE_SQ_OPCODE_XXXs and HNS_ROCE_V2_WQE_OP_XXXs have same values, so remove a set of redundant definitions. In addition, remove the suffix of HNS_ROCE_V2_WQE_OP_BIND_MW_TYPE. Link: https://lore.kernel.org/r/1595932941-40613-2-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:31:02 -03:00
Leon Romanovsky	fb448ce87a	RDMA/core: Free DIM memory in error unwind The memory allocated for the DIM wasn't freed in the error unwind path; fix it by calling rdma_dim_destroy(). Fixes: `da6629793a` ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:03:33 -03:00
Leon Romanovsky	5d46b289d0	RDMA/core: Stop DIM before destroying CQ HW destroy operation should be last operation after all possible CQ users completed their work, so move DIM work cancellation before such destroy call. Fixes: `da6629793a` ("RDMA/core: Provide RDMA DIM support for ULPs") Link: https://lore.kernel.org/r/20200730082719.1582397-3-leon@kernel.org Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:03:33 -03:00
Leon Romanovsky	7fa84b5708	RDMA/mlx5: Initialize QP mutex for the debug kernels In DCT and RSS RAW QP creation flows, the QP mutex wasn't initialized and the magic field inside lock was missing. This caused to the following kernel warning for kernels build with CONFIG_DEBUG_MUTEXES. DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 16261 at kernel/locking/mutex.c:938 __mutex_lock+0x60e/0x940 Modules linked in: bonding nf_tables ipip tunnel4 geneve ip6_udp_tunnel udp_tunnel ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel mlx5_ib mlx5_core mlxfw ptp pps_core rdma_ucm ib_uverbs ib_ipoib ib_umad openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw] CPU: 3 PID: 16261 Comm: ib_send_bw Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mutex_lock+0x60e/0x940 Code: c0 0f 84 6d fa ff ff 44 8b 15 4e 9d ba 00 45 85 d2 0f 85 5d fa ff ff 48 c7 c6 f2 de 2b 82 48 c7 c7 f1 8a 2b 82 e8 d2 4d 72 ff <0f> 0b 4c 8b 4d 88 e9 3f fa ff ff f6 c2 04 0f 84 37 fe ff ff 48 89 RSP: 0018:ffff88810bb8b870 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88829f1dd880 RSI: 0000000000000000 RDI: ffffffff81192afa RBP: ffff88810bb8b910 R08: 0000000000000000 R09: 0000000000000028 R10: 0000000000000000 R11: 0000000000003f85 R12: 0000000000000002 R13: ffff88827d8d3ce0 R14: ffffffffa059f615 R15: ffff8882a4d02610 FS: 00007f3f6988e740(0000) GS:ffff8882f5b80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556556158000 CR3: 000000010a63c005 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 
0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? cmd_exec+0x947/0xe60 [mlx5_core] ? __mutex_lock+0x76/0x940 ? mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib] mlx5_ib_qp_set_counter+0x25/0xa0 [mlx5_ib] mlx5_ib_counter_bind_qp+0x9b/0xe0 [mlx5_ib] __rdma_counter_bind_qp+0x6b/0xa0 [ib_core] rdma_counter_bind_qp_auto+0x363/0x520 [ib_core] _ib_modify_qp+0x316/0x580 [ib_core] ib_modify_qp_with_udata+0x19/0x30 [ib_core] modify_qp+0x4c4/0x600 [ib_uverbs] ib_uverbs_ex_modify_qp+0x87/0xe0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x129/0x1c0 [ib_uverbs] ib_uverbs_cmd_verbs.isra.5+0x5d5/0x11f0 [ib_uverbs] ? ib_uverbs_handler_UVERBS_METHOD_QUERY_CONTEXT+0x120/0x120 [ib_uverbs] ? lock_acquire+0xb9/0x3a0 ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs] ? ib_uverbs_ioctl+0x175/0x210 [ib_uverbs] ib_uverbs_ioctl+0x14b/0x210 [ib_uverbs] ? ib_uverbs_ioctl+0xd0/0x210 [ib_uverbs] ksys_ioctl+0x234/0x7d0 ? exc_page_fault+0x202/0x640 ? do_syscall_64+0x1f/0x2e0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x59/0x2e0 ? asm_exc_page_fault+0x8/0x30 ? rcu_read_lock_sched_held+0x52/0x60 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: `b4aaa1f0b4` ("IB/mlx5: Handle type IB_QPT_DRIVER when creating a QP") Link: https://lore.kernel.org/r/20200730082719.1582397-2-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-07-30 11:03:33 -03:00
Alexander Graf	1ccf2fe35c	KVM: arm: Add trace name for ARM_NISV Commit `c726200dd1` ("KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace") introduced a mechanism to deflect MMIO traffic the kernel can not handle to user space. For that, it introduced a new exit reason. However, it did not update the trace point array that gives human readable names to these exit reasons inside the trace log. Let's fix that up after the fact, so that trace logs are pretty even when we get user space MMIO traps on ARM. Fixes: `c726200dd1` ("KVM: arm/arm64: Allow reporting non-ISV data aborts to userspace") Signed-off-by: Alexander Graf <graf@amazon.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200730094441.18231-1-graf@amazon.com	2020-07-30 14:54:19 +01:00
David Brazdil	bdbc0c7a07	KVM: arm64: Ensure that all nVHE hyp code is in .hyp.text Some compilers may put a subset of generated functions into '.text.' ELF sections and the linker may leverage this division to optimize ELF layout. Unfortunately, the recently introduced HYPCOPY command assumes that all executable code (with the exception of specialized sections such as '.hyp.idmap.text') is in the '.text' section. If this assumption is broken, code in '.text.' will be merged into kernel proper '.text' instead of the '.hyp.text' that is mapped in EL2. To ensure that this cannot happen, insert an OBJDUMP assertion into HYPCOPY. The command dumps a list of ELF sections in the input object file and greps for '.text.'. If found, compilation fails. Tested with both binutils' and LLVM's objdump (the output format is different). GCC offers '-fno-reorder-functions' to disable this behaviour. Select the flag if it is available. From inspection of GCC source (latest Git in July 2020), this flag does force all code into '.text'. By default, GCC uses profile data, heuristics and attributes to select a subsection. LLVM/Clang currently does not have a similar optimization pass. It can place static constructors into '.text.startup' and it's optimizer can be provided with profile data to reorder hot/cold functions. Neither of these is applicable to nVHE hyp code. If this changes in the future, the OBJDUMP assertion should alert users to the problem. Signed-off-by: David Brazdil <dbrazdil@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20200730132519.48787-1-dbrazdil@google.com	2020-07-30 14:47:19 +01:00
Gautham R. Shenoy	d947fb4c96	cpuidle: pseries: Fixup exit latency for CEDE(0) We are currently assuming that CEDE(0) has exit latency 10us, since there is no way for us to query from the platform. However, if the wakeup latency of an Extended CEDE state is smaller than 10us, then we can be sure that the exit latency of CEDE(0) cannot be more than that. In this patch, we fix the exit latency of CEDE(0) if we discover an Extended CEDE state with wakeup latency smaller than 10us. Benchmark results: On POWER8, this patch does not have any impact since the advertized latency of Extended CEDE (1) is 30us which is higher than the default latency of CEDE (0) which is 10us. On POWER9 we see improvement the single-threaded performance of ebizzy, and no regression in the wakeup latency or the number of context-switches. ebizzy: 2 ebizzy threads bound to the same big-core. 25% improvement in the avg records/s with patch. x without_patch * with_patch N Min Max Median Avg Stddev x 10 2491089 5834307 5398375 4244335 1596244.9 * 10 2893813 5834474 5832448 5327281.3 1055941.4 context_switch2: There is no major regression observed with this patch as seen from the context_switch2 benchmark. context_switch2 across CPU0 CPU1 (Both belong to same big-core, but different small cores). We observe a minor 0.14% regression in the number of context-switches (higher is better). x without_patch * with_patch N Min Max Median Avg Stddev x 500 348872 362236 354712 354745.69 2711.827 * 500 349422 361452 353942 354215.4 2576.9258 Difference at 99.0% confidence -530.288 +/- 430.963 -0.149484% +/- 0.121485% (Student's t, pooled s = 2645.24) context_switch2 across CPU0 CPU8 (Different big-cores). We observe a 0.37% improvement in the number of context-switches (higher is better). 
x without_patch * with_patch N Min Max Median Avg Stddev x 500 287956 294940 288896 288977.23 646.59295 * 500 288300 294646 289582 290064.76 1161.9992 Difference at 99.0% confidence 1087.53 +/- 153.194 0.376337% +/- 0.0530125% (Student's t, pooled s = 940.299) schbench: No major difference could be seen until the 99.9th percentile. Without-patch: Latency percentiles (usec) 50.0th: 29 75.0th: 39 90.0th: 49 95.0th: 59 99.0th: 13104 99.5th: 14672 99.9th: 15824 min=0, max=17993 With-patch: Latency percentiles (usec) 50.0th: 29 75.0th: 40 90.0th: 50 95.0th: 61 99.0th: 13648 99.5th: 14768 99.9th: 15664 min=0, max=29812 Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> [mpe: Minor formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596087177-30329-4-git-send-email-ego@linux.vnet.ibm.com	2020-07-30 22:53:50 +10:00
Gautham R. Shenoy	054e44ba99	cpuidle: pseries: Add function to parse extended CEDE records Currently we use CEDE with latency-hint 0 as the only other idle state on a dedicated LPAR apart from the polling "snooze" state. The platform might support additional extended CEDE idle states, which can be discovered through the "ibm,get-system-parameter" rtas-call made with CEDE_LATENCY_TOKEN. This patch adds a function to obtain information about the extended CEDE idle states from the platform and parse the contents to populate an array of extended CEDE states. These idle states thus discovered will be added to the cpuidle framework in the next patch. dmesg on a POWER8 and POWER9 LPAR, demonstrating the output of parsing the extended CEDE latency parameters are as follows POWER8 [ 10.093279] xcede : xcede_record_size = 10 [ 10.093285] xcede : Record 0 : hint = 1, latency = 0x3c00 tb ticks, Wake-on-irq = 1 [ 10.093291] xcede : Record 1 : hint = 2, latency = 0x4e2000 tb ticks, Wake-on-irq = 0 [ 10.093297] cpuidle : Skipping the 2 Extended CEDE idle states POWER9 [ 5.913180] xcede : xcede_record_size = 10 [ 5.913183] xcede : Record 0 : hint = 1, latency = 0x400 tb ticks, Wake-on-irq = 1 [ 5.913188] xcede : Record 1 : hint = 2, latency = 0x3e8000 tb ticks, Wake-on-irq = 0 [ 5.913193] cpuidle : Skipping the 2 Extended CEDE idle states Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> [mpe: Make space for 16 records, drop memset, minor cleanup & formatting] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1596087177-30329-3-git-send-email-ego@linux.vnet.ibm.com	2020-07-30 22:53:50 +10:00
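The CEDE(0) fixup in the two cpuidle entries above can be sketched as follows. This is a minimal illustration, not the kernel's code: the `demo_*` names are hypothetical, the 512 ticks-per-microsecond timebase rate is an assumption (it is consistent with the POWER8 figures quoted above, where 0x3c00 ticks corresponds to the stated 30us), and the sketch simply caps the 10us default at the smallest Extended CEDE wakeup latency found.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: timebase ticks per microsecond (not stated in the log). */
#define DEMO_TB_TICKS_PER_USEC 512u

/* Default CEDE(0) exit latency assumed when the platform gives no hint. */
#define DEMO_CEDE0_DEFAULT_US 10u

/*
 * Return an exit latency for CEDE(0): the 10us default, lowered to the
 * smallest Extended CEDE wakeup latency if one is below the default.
 * Input latencies are in timebase ticks, as printed in the dmesg lines.
 */
static uint64_t demo_cede0_exit_latency_us(const uint64_t *xcede_latency_tb,
                                           size_t nr_records)
{
    uint64_t latency_us = DEMO_CEDE0_DEFAULT_US;

    for (size_t i = 0; i < nr_records; i++) {
        uint64_t us = xcede_latency_tb[i] / DEMO_TB_TICKS_PER_USEC;

        if (us < latency_us)
            latency_us = us;
    }
    return latency_us;
}
```

With the POWER9 records quoted above (0x400 and 0x3e8000 ticks) the sketch yields a value below the default, while the POWER8 records (0x3c00 and 0x4e2000 ticks) leave the 10us default in place, matching the commit's note that POWER8 is unaffected.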


948914 Commits