linux

Author	SHA1	Message	Date
Shyam Sundar S K	d751350826	amd-xgbe: Update DMA coherency values Based on the IOMMU configuration, the current cache control settings can result in possible coherency issues. The hardware team has recommended new settings for the PCI device path to eliminate the issue. Fixes: `6f595959c0` ("amd-xgbe: Adjust register settings to improve performance") Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:57:18 -07:00
Lu Wei	f1dcffcc8a	net: Fix a misspell in socket.c s/addres/address Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:56:27 -07:00
Tvrtko Ursulin	8f922e4227	drm/i915: Restrict sentinel requests further Disallow sentinel requests follow previous sentinels to make request cancellation work better when faced with a chain of requests which have all been marked as in error. Because in cases where we end up with a stream of cancelled requests we want to turn off request coalescing so they each will get individually skipped by the execlists_schedule_in (which is called per ELSP port, not per request). Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Fix typo in the commit message that Matthew spotted.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-4-tvrtko.ursulin@linux.intel.com	2021-03-26 00:55:36 +01:00
Zheng Yongjun	a9bada338b	net: usb: lan78xx: remove unused including <linux/version.h> Remove including <linux/version.h> that don't need it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:55:33 -07:00
Chris Wilson	38b237eab2	drm/i915: Individual request cancellation Currently, we cancel outstanding requests within a context when the context is closed. We may also want to cancel individual requests using the same graceful preemption mechanism. v2 (Tvrtko): * Cancel waiters carefully considering no timeline lock and RCU. * Fixed selftests. v3 (Tvrtko): * Remove error propagation to waiters for now. v4 (Tvrtko): * Rebase for extracted i915_request_active_engine. (Matt) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Resolve conflict because intel_engine_flush_scheduler is still called intel_engine_flush_submission] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-3-tvrtko.ursulin@linux.intel.com	2021-03-26 00:55:30 +01:00
Hoang Le	b83e214b2e	tipc: add extack messages for bearer/media failure Add extack error messages for -EINVAL errors when enabling bearer, getting/setting properties for a media/bearer Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:54:45 -07:00
Martin Blumenstingl	3e6fdeb28f	net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock The xMII interface clock depends on the PHY interface (MII, RMII, RGMII) as well as the current link speed. Explicitly configure the GSWIP to automatically select the appropriate xMII interface clock. This fixes an issue seen by some users where ports using an external RMII or RGMII PHY were deaf (no RX or TX traffic could be seen). Most likely this is due to an "invalid" xMII clock being selected either by the bootloader or hardware-defaults. Fixes: `14fceff477` ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:53:38 -07:00
Dinh Nguyen	3ed14d8d47	dt-bindings: net: micrel-ksz90x1.txt: correct documentation Correct the Micrel phy documentation for the ksz9021 and ksz9031 phys for how the phy skews are set. Signed-off-by: Dinh Nguyen <dinguyen@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:51:53 -07:00
Daniel Mack	de9c7854e6	net: axienet: allow setups without MDIO In setups with fixed-link settings there is no mdio node in DTS. axienet_probe() already handles that gracefully but lp->mii_bus is then NULL. Fix code that tries to blindly grab the MDIO lock by introducing two helper functions that make the locking conditional. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:49:40 -07:00
Vladimir Oltean	479dc497db	net: dsa: only unset VLAN filtering when last port leaves last VLAN-aware bridge DSA is aware of switches with global VLAN filtering since the blamed commit, but it makes a bad decision when multiple bridges are spanning the same switch: ip link add br0 type bridge vlan_filtering 1 ip link add br1 type bridge vlan_filtering 1 ip link set swp2 master br0 ip link set swp3 master br0 ip link set swp4 master br1 ip link set swp5 master br1 ip link set swp5 nomaster ip link set swp4 nomaster [138665.939930] sja1105 spi0.1: port 3: dsa_core: VLAN filtering is a global setting [138665.947514] DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE When all ports leave br1, DSA blindly attempts to disable VLAN filtering on the switch, ignoring the fact that br0 still exists and is VLAN-aware too. It fails while doing that. This patch checks whether any port exists at all and is under a VLAN-aware bridge. Fixes: `d371b7c92d` ("net: dsa: Unset vlan_filtering when ports leave the bridge") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:48:45 -07:00
Tvrtko Ursulin	7dbc19da5d	drm/i915: Extract active lookup engine to a helper Move active engine lookup to exported i915_request_active_engine. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> [danvet: Slight rebase, engine->sched.lock is still called engine->active.lock.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20210324121335.2307063-2-tvrtko.ursulin@linux.intel.com	2021-03-26 00:48:08 +01:00
Anthony Iliopoulos	25dfa65f81	xfs: fix xfs_trans slab cache name Removal of kmem_zone_init wrappers accidentally changed a slab cache name from "xfs_trans" to "xf_trans". Fix this so that userspace consumers of /proc/slabinfo and /sys/kernel/slab can find it again. Fixes: `b1231760e4` ("xfs: Remove slab init wrappers") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:53 -07:00
Gao Xiang	2b92faed55	xfs: add error injection for per-AG resv failure per-AG resv failure after fixing up freespace is hard to test in an effective way, so directly add an error injection path to observe such error handling path works as expected. Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:53 -07:00
Gao Xiang	fb2fc17201	xfs: support shrinking unused space in the last AG As the first step of shrinking, this attempts to enable shrinking unused space in the last allocation group by fixing up freespace btree, agi, agf and adjusting super block and use a helper xfs_ag_shrink_space() to fixup the last AG. This can be all done in one transaction for now, so I think no additional protection is needed. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Gao Xiang	46141dc891	xfs: introduce xfs_ag_shrink_space() This patch introduces a helper to shrink unused space in the last AG by fixing up the freespace btree. Also make sure that the per-AG reservation works under the new AG size. If such per-AG reservation or extent allocation fails, roll the transaction so the new transaction could cancel without any side effects. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Gao Xiang	c789c83c7e	xfs: hoist out xfs_resizefs_init_new_ags() Move out related logic for initializing new added AGs to a new helper in preparation for shrinking. No logic changes. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Gao Xiang	014695c0a7	xfs: update lazy sb counters immediately for resizefs sb_fdblocks will be updated lazily if lazysbcount is enabled, therefore when shrinking the filesystem sb_fdblocks could be larger than sb_dblocks and xfs_validate_sb_write() would fail. Even for growfs case, it'd be better to update lazy sb counters immediately to reflect the real sb counters. Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Bhaskar Chowdhury	f9dd7ba430	xfs: Fix a typo s/strutures/structures/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Reviewed-by: Pavel Reichl <preichl@redhat.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Bhaskar Chowdhury	0145225e35	xfs: Rudimentary spelling fix s/sytemcall/syscall/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Bhaskar Chowdhury	bd24a4f5f7	xfs: Rudimentary typo fixes s/filesytem/filesystem/ s/instrumention/instrumentation/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Dave Chinner	5825bea052	xfs: __percpu_counter_compare() inode count debug too expensive - 21.92% __xfs_trans_commit - 21.62% xfs_log_commit_cil - 11.69% xfs_trans_unreserve_and_mod_sb - 11.58% __percpu_counter_compare - 11.45% __percpu_counter_sum - 10.29% _raw_spin_lock_irqsave - 10.28% do_raw_spin_lock __pv_queued_spin_lock_slowpath We debated just getting rid of it last time this came up and there was no real objection to removing it. Now it's the biggest scalability limitation for debug kernels even on smallish machines, so let's just get rid of it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:52 -07:00
Dave Chinner	1fea323ff0	xfs: reduce debug overhead of dir leaf/node checks On debug kernels, we call xfs_dir3_leaf_check_int() multiple times on every directory modification. The robust hash ordering checks it does on every entry in the leaf on every call results in a massive CPU overhead which slows down debug kernels by a large amount. We use xfs_dir3_leaf_check_int() for the verifiers as well, so we can't just gut the function to reduce overhead. What we can do, however, is reduce the work it does when it is called from the debug interfaces, just leaving the high level checks in place and leaving the robust validation to the verifiers. This means the debug checks will catch gross errors, but subtle bugs might not be caught until a verifier is run. It is easy enough to restore the existing debug behaviour if the developer needs it (just change a call parameter in the debug code), but overwise the overhead makes testing large directory block sizes on debug kernels very slow. Profile at an unlink rate of ~80k file/s on a 64k block size filesystem before the patch: 40.30% [kernel] [k] xfs_dir3_leaf_check_int 10.98% [kernel] [k] __xfs_dir3_data_check 8.10% [kernel] [k] xfs_verify_dir_ino 4.42% [kernel] [k] memcpy 2.22% [kernel] [k] xfs_dir2_data_get_ftype 1.52% [kernel] [k] do_raw_spin_lock Profile after, at an unlink rate of ~125k files/s (+50% improvement) has largely dropped the leaf verification debug overhead out of the profile. 16.53% [kernel] [k] __xfs_dir3_data_check 12.53% [kernel] [k] xfs_verify_dir_ino 7.97% [kernel] [k] memcpy 3.36% [kernel] [k] xfs_dir2_data_get_ftype 2.86% [kernel] [k] __pv_queued_spin_lock_slowpath Create shows a similar change in profile and a +25% improvement in performance. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:51 -07:00
Dave Chinner	39d3c0b596	xfs: No need for inode number error injection in __xfs_dir3_data_check We call xfs_dir_ino_validate() for every dir entry in a directory when doing validity checking of the directory. It calls xfs_verify_dir_ino() then emits a corruption report if bad or does error injection if good. It is extremely costly: 43.27% [kernel] [k] xfs_dir3_leaf_check_int 10.28% [kernel] [k] __xfs_dir3_data_check 6.61% [kernel] [k] xfs_verify_dir_ino 4.16% [kernel] [k] xfs_errortag_test 4.00% [kernel] [k] memcpy 3.48% [kernel] [k] xfs_dir_ino_validate 7% of the cpu usage in this directory traversal workload is xfs_dir_ino_validate() doing absolutely nothing. We don't need error injection to simulate a bad inode numbers in the directory structure because we can do that by fuzzing the structure on disk. And we don't need a corruption report, because the __xfs_dir3_data_check() will emit one if the inode number is bad. So just call xfs_verify_dir_ino() directly here, and get rid of all this unnecessary overhead: 40.30% [kernel] [k] xfs_dir3_leaf_check_int 10.98% [kernel] [k] __xfs_dir3_data_check 8.10% [kernel] [k] xfs_verify_dir_ino 4.42% [kernel] [k] memcpy 2.22% [kernel] [k] xfs_dir2_data_get_ftype 1.52% [kernel] [k] do_raw_spin_lock Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:51 -07:00
Dave Chinner	ec08c14ba2	xfs: type verification is expensive From a concurrent rm -rf workload: 41.04% [kernel] [k] xfs_dir3_leaf_check_int 9.85% [kernel] [k] __xfs_dir3_data_check 5.60% [kernel] [k] xfs_verify_ino 5.32% [kernel] [k] xfs_agino_range 4.21% [kernel] [k] memcpy 3.06% [kernel] [k] xfs_errortag_test 2.57% [kernel] [k] xfs_dir_ino_validate 1.66% [kernel] [k] xfs_dir2_data_get_ftype 1.17% [kernel] [k] do_raw_spin_lock 1.11% [kernel] [k] xfs_verify_dir_ino 0.84% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 0.83% [kernel] [k] xfs_buf_find 0.64% [kernel] [k] xfs_log_commit_cil THere's an awful lot of overhead in just range checking inode numbers in that, but each inode number check is not a lot of code. The total is a bit over 14.5% of the CPU time is spent validating inode numbers. The problem is that they deeply nested global scope functions so the overhead here is all in function call marshalling. text data bss dec hex filename 2077 0 0 2077 81d fs/xfs/libxfs/xfs_types.o.orig 2197 0 0 2197 895 fs/xfs/libxfs/xfs_types.o There's a small increase in binary size by inlining all the local nested calls in the verifier functions, but the same workload now profiles as: 40.69% [kernel] [k] xfs_dir3_leaf_check_int 10.52% [kernel] [k] __xfs_dir3_data_check 6.68% [kernel] [k] xfs_verify_dir_ino 4.22% [kernel] [k] xfs_errortag_test 4.15% [kernel] [k] memcpy 3.53% [kernel] [k] xfs_dir_ino_validate 1.87% [kernel] [k] xfs_dir2_data_get_ftype 1.37% [kernel] [k] do_raw_spin_lock 0.98% [kernel] [k] xfs_buf_find 0.94% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 0.73% [kernel] [k] xfs_log_commit_cil Now we only spend just over 10% of the time validing inode numbers for the same workload. Hence a few "inline" keyworks is good enough to reduce the validation overhead by 30%... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:51 -07:00
Dave Chinner	929f8b0deb	xfs: optimise xfs_buf_item_size/format for contiguous regions We process the buf_log_item bitmap one set bit at a time with xfs_next_bit() so we can detect if a region crosses a memcpy discontinuity in the buffer data address. This has massive overhead on large buffers (e.g. 64k directory blocks) because we do a lot of unnecessary checks and xfs_buf_offset() calls. For example, 16-way concurrent create workload on debug kernel running CPU bound has this at the top of the profile at ~120k create/s on 64kb directory block size: 20.66% [kernel] [k] xfs_dir3_leaf_check_int 7.10% [kernel] [k] memcpy 6.22% [kernel] [k] xfs_next_bit 3.55% [kernel] [k] xfs_buf_offset 3.53% [kernel] [k] xfs_buf_item_format 3.34% [kernel] [k] __pv_queued_spin_lock_slowpath 3.04% [kernel] [k] do_raw_spin_lock 2.84% [kernel] [k] xfs_buf_item_size_segment.isra.0 2.31% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 1.36% [kernel] [k] xfs_log_commit_cil (debug checks hurt large blocks) The only buffers with discontinuities in the data address are unmapped buffers, and they are only used for inode cluster buffers and only for logging unlinked pointers. IOWs, it is -rare- that we even need to detect a discontinuity in the buffer item formatting code. Optimise all this by using xfs_contig_bits() to find the size of the contiguous regions, then test for a discontiunity inside it. If we find one, do the slow "bit at a time" method we do now. If we don't, then just copy the entire contiguous range in one go. Profile now looks like: 25.26% [kernel] [k] xfs_dir3_leaf_check_int 9.25% [kernel] [k] memcpy 5.01% [kernel] [k] __pv_queued_spin_lock_slowpath 2.84% [kernel] [k] do_raw_spin_lock 2.22% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 1.88% [kernel] [k] xfs_buf_find 1.53% [kernel] [k] memmove 1.47% [kernel] [k] xfs_log_commit_cil .... 0.34% [kernel] [k] xfs_buf_item_format .... 0.21% [kernel] [k] xfs_buf_offset .... 0.16% [kernel] [k] xfs_contig_bits .... 0.13% [kernel] [k] xfs_buf_item_size_segment.isra.0 So the bit scanning over for the dirty region tracking for the buffer log items is basically gone. Debug overhead hurts even more now... Perf comparison dir block creates unlink size (kb) time rate time Original 4 4m08s 220k 5m13s Original 64 7m21s 115k 13m25s Patched 4 3m59s 230k 5m03s Patched 64 6m23s 143k 12m33s Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:51 -07:00
Dave Chinner	c81ea11e03	xfs: xfs_buf_item_size_segment() needs to pass segment offset Otherwise it doesn't correctly calculate the number of vectors in a logged buffer that has a contiguous map that gets split into multiple regions because the range spans discontigous memory. Probably never been hit in practice - we don't log contiguous ranges on unmapped buffers (inode clusters). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:51 -07:00
Dave Chinner	accc661bf9	xfs: reduce buffer log item shadow allocations When we modify btrees repeatedly, we regularly increase the size of the logged region by a single chunk at a time (per transaction commit). This results in the CIL formatting code having to reallocate the log vector buffer every time the buffer dirty region grows. Hence over a typical 4kB btree buffer, we might grow the log vector 4096/128 = 32x over a short period where we repeatedly add or remove records to/from the buffer over a series of running transaction. This means we are doing 32 memory allocations and frees over this time during a performance critical path in the journal. The amount of space tracked in the CIL for the object is calculated during the ->iop_format() call for the buffer log item, but the buffer memory allocated for it is calculated by the ->iop_size() call. The size callout determines the size of the buffer, the format call determines the space used in the buffer. Hence we can oversize the buffer space required in the size calculation without impacting the amount of space used and accounted to the CIL for the changes being logged. This allows us to reduce the number of allocations by rounding up the buffer size to allow for future growth. This can safe a substantial amount of CPU time in this path: - 46.52% 2.02% [kernel] [k] xfs_log_commit_cil - 44.49% xfs_log_commit_cil - 30.78% _raw_spin_lock - 30.75% do_raw_spin_lock 30.27% __pv_queued_spin_lock_slowpath (oh, ouch!) .... - 1.05% kmem_alloc_large - 1.02% kmem_alloc 0.94% __kmalloc This overhead here us what this patch is aimed at. After: - 0.76% kmem_alloc_large - 0.75% kmem_alloc 0.70% __kmalloc The size of 512 bytes is based on the bitmap chunk size being 128 bytes and that random directory entry updates almost never require more than 3-4 128 byte regions to be logged in the directory block. The other observation is for per-ag btrees. When we are inserting into a new btree block, we'll pack it from the front. Hence the first few records land in the first 128 bytes so we log only 128 bytes, the next 8-16 records land in the second region so now we log 256 bytes. And so on. If we are doing random updates, it will only allocate every 4 random 128 byte regions that are dirtied instead of every single one. Any larger than 512 bytes and I noticed an increase in memory footprint in my scalability workloads. Any less than this and I didn't really see any significant benefit to CPU usage. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@redhat.com>	2021-03-25 16:47:51 -07:00
Dave Chinner	e6a688c332	xfs: initialise attr fork on inode create When we allocate a new inode, we often need to add an attribute to the inode as part of the create. This can happen as a result of needing to add default ACLs or security labels before the inode is made visible to userspace. This is highly inefficient right now. We do the create transaction to allocate the inode, then we do an "add attr fork" transaction to modify the just created empty inode to set the inode fork offset to allow attributes to be stored, then we go and do the attribute creation. This means 3 transactions instead of 1 to allocate an inode, and this greatly increases the load on the CIL commit code, resulting in excessive contention on the CIL spin locks and performance degradation: 18.99% [kernel] [k] __pv_queued_spin_lock_slowpath 3.57% [kernel] [k] do_raw_spin_lock 2.51% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock 2.48% [kernel] [k] memcpy 2.34% [kernel] [k] xfs_log_commit_cil The typical profile resulting from running fsmark on a selinux enabled filesytem is adds this overhead to the create path: - 15.30% xfs_init_security - 15.23% security_inode_init_security - 13.05% xfs_initxattrs - 12.94% xfs_attr_set - 6.75% xfs_bmap_add_attrfork - 5.51% xfs_trans_commit - 5.48% __xfs_trans_commit - 5.35% xfs_log_commit_cil - 3.86% _raw_spin_lock - do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.70% xfs_trans_alloc 0.52% xfs_trans_reserve - 5.41% xfs_attr_set_args - 5.39% xfs_attr_set_shortform.constprop.0 - 4.46% xfs_trans_commit - 4.46% __xfs_trans_commit - 4.33% xfs_log_commit_cil - 2.74% _raw_spin_lock - do_raw_spin_lock __pv_queued_spin_lock_slowpath 0.60% xfs_inode_item_format 0.90% xfs_attr_try_sf_addname - 1.99% selinux_inode_init_security - 1.02% security_sid_to_context_force - 1.00% security_sid_to_context_core - 0.92% sidtab_entry_to_string - 0.90% sidtab_sid2str_get 0.59% sidtab_sid2str_put.part.0 - 0.82% selinux_determine_inode_label - 0.77% security_transition_sid 0.70% security_compute_sid.part.0 And fsmark creation rate performance drops by ~25%. The key point to note here is that half the additional overhead comes from adding the attribute fork to the newly created inode. That's crazy, considering we can do this same thing at inode create time with a couple of lines of code and no extra overhead. So, if we know we are going to add an attribute immediately after creating the inode, let's just initialise the attribute fork inside the create transaction and chop that whole chunk of code out of the create fast path. This completely removes the performance drop caused by enabling SELinux, and the profile looks like: - 8.99% xfs_init_security - 9.00% security_inode_init_security - 6.43% xfs_initxattrs - 6.37% xfs_attr_set - 5.45% xfs_attr_set_args - 5.42% xfs_attr_set_shortform.constprop.0 - 4.51% xfs_trans_commit - 4.54% __xfs_trans_commit - 4.59% xfs_log_commit_cil - 2.67% _raw_spin_lock - 3.28% do_raw_spin_lock 3.08% __pv_queued_spin_lock_slowpath 0.66% xfs_inode_item_format - 0.90% xfs_attr_try_sf_addname - 0.60% xfs_trans_alloc - 2.35% selinux_inode_init_security - 1.25% security_sid_to_context_force - 1.21% security_sid_to_context_core - 1.19% sidtab_entry_to_string - 1.20% sidtab_sid2str_get - 0.86% sidtab_sid2str_put.part.0 - 0.62% _raw_spin_lock_irqsave - 0.77% do_raw_spin_lock __pv_queued_spin_lock_slowpath - 0.84% selinux_determine_inode_label - 0.83% security_transition_sid 0.86% security_compute_sid.part.0 Which indicates the XFS overhead of creating the selinux xattr has been halved. This doesn't fix the CIL lock contention problem, just means it's not a limiting factor for this workload. Lock contention in the security subsystems is going to be an issue soon, though... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> [djwong: fix compilation error when CONFIG_SECURITY=n] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@redhat.com>	2021-03-25 16:47:51 -07:00
Gao Xiang	b2c2974b8c	xfs: ensure xfs_errortag_random_default matches XFS_ERRTAG_MAX Add the BUILD_BUG_ON to xfs_errortag_add() in order to make sure that the length of xfs_errortag_random_default matches XFS_ERRTAG_MAX when building. Signed-off-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:50 -07:00
Pavel Reichl	92cf7d3638	xfs: Skip repetitive warnings about mount options Skip the warnings about mount option being deprecated if we are remounting and deprecated option state is not changing. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211605 Fix-suggested-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:50 -07:00
Pavel Reichl	0f98b4ece1	xfs: rename variable mp to parsing_mp Rename mp variable to parsisng_mp so it is easy to distinguish between current mount point handle and handle for mount point which mount options are being parsed. Suggested-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Pavel Reichl <preichl@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-03-25 16:47:50 -07:00
Darrick J. Wong	3fef46fc43	xfs: rename the blockgc workqueue Since we're about to start using the blockgc workqueue to dispose of inactivated inodes, strip the "block" prefix from the name; now it's merely the general garbage collection (gc) workqueue. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:50 -07:00
Darrick J. Wong	383e32b0d0	xfs: prevent metadata files from being inactivated Files containing metadata (quota records, rt bitmap and summary info) are fully managed by the filesystem, which means that all resource cleanup must be explicit, not automatic. This means that they should never be subjected automatic to post-eof truncation, nor should they be freed automatically even if the link count drops to zero. In other words, xfs_inactive() should leave these files alone. Add the necessary predicate functions to make this happen. This adds a second layer of prevention for the kinds of fs corruption that was fixed by commit `f4c32e87de`. If we ever decide to support removing metadata files, we should make all those metadata updates explicit. Rearrange the order of #includes to fix compiler errors, since xfs_mount.h is supposed to be included before xfs_inode.h Followup-to: `f4c32e87de` ("xfs: fix realtime bitmap/summary file truncation when growing rt volume") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:50 -07:00
Darrick J. Wong	973975b72a	xfs: validate ag btree levels using the precomputed values Use the AG btree height limits that we precomputed into the xfs_mount to validate the AG headers instead of using XFS_BTREE_MAXLEVELS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:50 -07:00
Darrick J. Wong	f53acface7	xfs: remove return value from xchk_ag_btcur_init Functions called by this function cannot fail, so get rid of the return and error checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:50 -07:00
Darrick J. Wong	de9d2a78ad	xfs: set the scrub AG number in xchk_ag_read_headers Since xchk_ag_read_headers initializes fields in struct xchk_ag, we might as well set the AG number and save the callers the trouble. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:49 -07:00
Darrick J. Wong	9de4b51449	xfs: mark a data structure sick if there are cross-referencing errors If scrub observes cross-referencing errors while scanning a data structure, mark the data structure sick. There's /something/ inconsistent, even if we can't really tell what it is. Fixes: `4860a05d24` ("xfs: scrub/repair should update filesystem metadata health") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:49 -07:00
Darrick J. Wong	7716ee54cb	xfs: bail out of scrub immediately if scan incomplete If a scrubber cannot complete its check and signals an incomplete check, we must bail out immediately without updating health status, trying a repair, etc. because our scan is incomplete and we therefore do not know much more. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:49 -07:00
Darrick J. Wong	05237032fd	xfs: fix dquot scrub loop cancellation When xchk_quota_item figures out that it needs to terminate the scrub operation, it needs to return some error code to abort the loop, but instead it returns zero and the loop keeps running. Fix this by making it use ECANCELED, and fix the other loop bailout condition check at the bottom too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:49 -07:00
Darrick J. Wong	1aa26707eb	xfs: fix uninitialized variables in xrep_calc_ag_resblks If we can't read the AGF header, we never actually set a value for freelen and usedlen. These two variables are used to make the worst case estimate of btree size, so it's safe to set them to the AG size as a fallback. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-03-25 16:47:49 -07:00
David S. Miller	50dad399ca	Merge branch 'ethtool-FEC' Jakub Kicinski says: ==================== ethtool: clarify the ethtool FEC interface Our FEC configuration interface is one of the more confusing. It also lacks any error checking in the core. This certainly shows in the varying implementations across the drivers. Improve the documentation and add most basic checks. Sadly, it's probably too late now to try to enforce much more uniformity. Any thoughts & suggestions welcome. Next step is to add netlink for FEC, then stats. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	6dbf94b264	ethtool: clarify the ethtool FEC interface The definition of the FEC driver interface is quite unclear. Improve the documentation. This is based on current driver and user space code, as well as the discussions about the interface: RFC v1 (24 Oct 2016): https://lore.kernel.org/netdev/1477363849-36517-1-git-send-email-vidya@cumulusnetworks.com/ - this version has the autoneg field - no active_fec field - none vs off confusion is already present RFC v2 (10 Feb 2017): https://lore.kernel.org/netdev/1486727004-11316-1-git-send-email-vidya@cumulusnetworks.com/ - autoneg removed - active_fec added v1 (10 Feb 2017): https://lore.kernel.org/netdev/1486751311-42019-1-git-send-email-vidya@cumulusnetworks.com/ - no changes in the code v1 (24 Jun 2017): https://lore.kernel.org/netdev/1498331985-8525-1-git-send-email-roopa@cumulusnetworks.com/ - include in tree user v2 (27 Jul 2017): https://lore.kernel.org/netdev/1501199248-24695-1-git-send-email-roopa@cumulusnetworks.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	42ce127d98	ethtool: fec: sanitize ethtool_fecparam->fec Reject NONE on set, this mode means device does not support FEC so it's a little out of place in the set interface. This should be safe to do - user space ethtool does not allow the use of NONE on set. A few drivers treat it the same as OFF, but none use it instead of OFF. Similarly reject an empty FEC mask. The common user space tool will not send such requests and most drivers correctly reject it already. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	d3b37fc805	ethtool: fec: sanitize ethtool_fecparam->active_fec struct ethtool_fecparam::active_fec is a GET-only field, all in-tree drivers correctly ignore it on SET. Clear the field on SET to avoid any confusion. Again, we can't reject non-zero now since ethtool user space does not zero-init the param correctly. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	240e114411	ethtool: fec: sanitize ethtool_fecparam->reserved struct ethtool_fecparam::reserved is never looked at by the core. Make sure it's actually 0. Unfortunately we can't return an error because old ethtool doesn't zero-initialize the structure for SET. On GET we can be more verbose, there are no in tree (ab)users. Fix up the kdoc on the structure. Remove the mention of FEC bypass. Seems like a niche thing to configure in the first place. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	408386817a	ethtool: fec: remove long structure description Digging through the mailing list archive @autoneg was part of the first version of the RFC, this left over comment was pointed out twice in review but wasn't removed. The sentence is an exact copy-paste from pauseparam. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Jakub Kicinski	ed3038158e	ethtool: fec: fix typo in kdoc s/porte/the port/ Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:46:53 -07:00
Linus Torvalds	db24726bfe	integrity-v5.12-fix -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEjSMCCC7+cjo3nszSa3kkZrA+cVoFAmBc8t0UHHpvaGFyQGxp bnV4LmlibS5jb20ACgkQa3kkZrA+cVr82g/+M4Aj+geNTYg0so7iWuNoc1xrj32k AkNt82WWTOz5FCJdz5Svv6lS8fr0MLNlLV8Y2RqNjkbJfmrbuLT0nu8FmAgcEgO5 XeDasz3v/ij0a2A9nqnnk7UUFpx+7I/peFBEgmupLhsnVE3UOsnzz8GZ922tsZ/k UeRHH/Nm+dJ8hAoY0RdfC09YGL/bxLO7w7+NjoaA1GaiiHZ+Jwb6aVbUt8F8X4K3 R946QnOHqsyvXZ6f5LgSRr8eNEXszR4sJiUpzfpyAaEPF3eV/V3OEWAiv5YQmhnn /rLeprkW4C25jTwSVNWOwkSic9dQW3WRg8V6H51EAbm7BN67bO5B7GIsk5H61YPL wFvENmZpMxlXt8rcYI+Js8KnIR7ihYQVksRp6N5yTqYz28e+RZN/dqTLCBkRDjgs qLH6jEFcr4qzMpjF71nI2AZrepFXoXkH8vm9yz/uvzffdXyP+ZRbPoPEkx25meAt PC2neLco13unm2Yp/fuOOzE/sgVlI6n3o+9sLGSeexTkY+ewGKE+c7+bohoZtOk4 OETYJZ9a7TzTvKsOqan5gcQfdhru6/M73Ev9lR6shoWe1TgF+hLkaKhVLWFL5dUn EVcQ/gc8zKmljNji1TtlXSP6+1InvgRUmes42G7HEcf3H3C6CejvAVfdIX1sx9av vzL7JN0q7SAIDnI= =qVQp -----END PGP SIGNATURE----- Merge tag 'integrity-v5.12-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fix from Mimi Zohar: "Just one patch to address a NULL ptr dereferencing when there is a mismatch between the user enabled LSMs and IMA/EVM" * tag 'integrity-v5.12-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: double check iint_cache was initialized	2021-03-25 16:46:43 -07:00
Daniel Borkmann	80847a71b2	bpf: Undo ptr_to_map_key alu sanitation for now Remove PTR_TO_MAP_KEY for the time being from being sanitized on pointer ALU through sanitize_ptr_alu() mainly for 3 reasons: 1) It's currently unused and not available from unprivileged. However that by itself is not yet a strong reason to drop the code. 2) Commit `69c087ba62` ("bpf: Add bpf_for_each_map_elem() helper") implemented the sanitation not fully correct in that unlike stack or map_value pointer it doesn't probe whether the access to the map key /after/ the simulated ALU operation is still in bounds. This means that the generated mask can truncate the offset in the non-speculative domain whereas it should only truncate in the speculative domain. The verifier should instead reject such program as we do for other types. 3) Given the recent fixes from `f232326f69` ("bpf: Prohibit alu ops for pointer types not defining ptr_limit"), `10d2bb2e6b` ("bpf: Fix off-by-one for area size in creating mask to left"), `b5871dca25` ("bpf: Simplify alu_limit masking for pointer arithmetic") as well as `1b1597e64e` ("bpf: Add sanity check for upper ptr_limit") the code changed quite a bit and the merge in `efd13b71a3` broke the PTR_TO_MAP_KEY case due to an incorrect merge conflict. Remove the relevant pieces for the time being and we can rework the PTR_TO_MAP_KEY case once everything settles. Fixes: `efd13b71a3` ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Fixes: `69c087ba62` ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2021-03-26 00:46:33 +01:00
David S. Miller	241949e488	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-03-24 The following pull-request contains BPF updates for your net-next tree. We've added 37 non-merge commits during the last 15 day(s) which contain a total of 65 files changed, 3200 insertions(+), 738 deletions(-). The main changes are: 1) Static linking of multiple BPF ELF files, from Andrii. 2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo. 3) Spelling fixes from various folks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 16:30:46 -07:00

... 209 210 211 212 213 ...

1014064 Commits