linux/include/drm/ttm/ttm_caching.h

/*
 * Copyright 2020 Advanced Micro Devices, Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors: Christian König
 */

#ifndef _TTM_CACHING_H_
#define _TTM_CACHING_H_

#include <linux/pgtable.h>

#define TTM_NUM_CACHING_TYPES	3

/**
 * enum ttm_caching - CPU caching and BUS snooping behavior.
 */
enum ttm_caching {
	/**
	 * @ttm_uncached: Most defensive option for device mappings,
	 * don't even allow write combining.
	 */
	ttm_uncached,

	/**
	 * @ttm_write_combined: Don't cache read accesses, but allow at least
	 * writes to be combined.
	 */
	ttm_write_combined,

	/**
	 * @ttm_cached: Fully cached like normal system memory, requires that
	 * devices snoop the CPU cache on accesses.
	 */
	ttm_cached
};

pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp);

#endif
drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`/*`
			`* Copyright 2020 Advanced Micro Devices, Inc.`
			`*`
			`* Permission is hereby granted, free of charge, to any person obtaining a`
			`* copy of this software and associated documentation files (the "Software"),`
			`* to deal in the Software without restriction, including without limitation`
			`* the rights to use, copy, modify, merge, publish, distribute, sublicense,`
			`* and/or sell copies of the Software, and to permit persons to whom the`
			`* Software is furnished to do so, subject to the following conditions:`
			`*`
			`* The above copyright notice and this permission notice shall be included in`
			`* all copies or substantial portions of the Software.`
			`*`
			`* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR`
			`* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,`
			`* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL`
			`* THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR`
			`* OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,`
			`* ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR`
			`* OTHER DEALINGS IN THE SOFTWARE.`
			`*`
			`* Authors: Christian König`
			`*/`

			`#ifndef _TTM_CACHING_H_`
			`#define _TTM_CACHING_H_`

drm/ttm: make ttm_caching.h self-contained Include <linux/pgtable.h> for pgprot_t. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Huang Rui <ray.huang@amd.com> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/fb87ab4b4490c53e9ece66d53c4f178ead244cb5.1709898638.git.jani.nikula@intel.com 2024-03-08 11:55:48 +00:00			`#include <linux/pgtable.h>`

drm/ttm: new TT backend allocation pool v3 This replaces the spaghetti code in the two existing page pools. First of all depending on the allocation size it is between 3 (1GiB) and 5 (1MiB) times faster than the old implementation. It makes better use of buddy pages to allow for larger physical contiguous allocations which should result in better TLB utilization at least for amdgpu. Instead of a completely braindead approach of filling the pool with one CPU while another one is trying to shrink it we only give back freed pages. This also results in much less locking contention and a trylock free MM shrinker callback, so we can guarantee that pages are given back to the system when needed. Downside of this is that it takes longer for many small allocations until the pool is filled up. We could address this, but I couldn't find an use case where this actually matters. We also don't bother freeing large chunks of pages any more since the CPU overhead in that path isn't really that important. The sysfs files are replaced with a single module parameter, allowing users to override how many pages should be globally pooled in TTM. This unfortunately breaks the UAPI slightly, but as far as we know nobody ever depended on this. Zeroing memory coming from the pool was handled inconsistently. The alloc_pages() based pool was zeroing it, the dma_alloc_attr() based one wasn't. For now the new implementation isn't zeroing pages from the pool either and only sets the __GFP_ZERO flag when necessary. The implementation has only 768 lines of code compared to the over 2600 of the old one, and also allows for saving quite a bunch of code in the drivers since we don't need specialized handling there any more based on kernel config. Additional to all of that there was a neat bug with IOMMU, coherent DMA mappings and huge pages which is now fixed in the new code as well. v2: make ttm_pool_apply_caching static as reported by the kernel bot, add some more checks v3: fix some more checkpatch.pl warnings Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Madhav Chauhan <madhav.chauhan@amd.com> Tested-by: Huang Rui <ray.huang@amd.com> Link: https://patchwork.freedesktop.org/patch/397080/?series=83051&rev=1 2020-10-22 16:26:58 +00:00			`#define TTM_NUM_CACHING_TYPES 3`

drm/ttm: add kerneldoc for enum ttm_caching Briefly describe what this is all about. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210908132933.3269-4-christian.koenig@amd.com 2021-09-07 07:14:43 +00:00			`/**`
			`* enum ttm_caching - CPU caching and BUS snooping behavior.`
			`*/`
drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`enum ttm_caching {`
drm/ttm: add kerneldoc for enum ttm_caching Briefly describe what this is all about. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210908132933.3269-4-christian.koenig@amd.com 2021-09-07 07:14:43 +00:00			`/**`
			`* @ttm_uncached: Most defensive option for device mappings,`
			`* don't even allow write combining.`
			`*/`
drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`ttm_uncached,`
drm/ttm: add kerneldoc for enum ttm_caching Briefly describe what this is all about. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210908132933.3269-4-christian.koenig@amd.com 2021-09-07 07:14:43 +00:00
			`/**`
			`* @ttm_write_combined: Don't cache read accesses, but allow at least`
			`* writes to be combined.`
			`*/`
drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`ttm_write_combined,`
drm/ttm: add kerneldoc for enum ttm_caching Briefly describe what this is all about. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210908132933.3269-4-christian.koenig@amd.com 2021-09-07 07:14:43 +00:00
			`/**`
			`* @ttm_cached: Fully cached like normal system memory, requires that`
			`* devices snoop the CPU cache on accesses.`
			`*/`
drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`ttm_cached`
			`};`

drm/ttm: Add a generic TTM memcpy move for page-based iomem The internal ttm_bo_util memcpy uses ioremap functionality, and while it probably might be possible to use it for copying in- and out of sglist represented io memory, using io_mem_reserve() / io_mem_free() callbacks, that would cause problems with fault(). Instead, implement a method mapping page-by-page using kmap_local() semantics. As an additional benefit we then avoid the occasional global TLB flushes of ioremap() and consuming ioremap space, elimination of a critical point of failure and with a slight change of semantics we could also push the memcpy out async for testing and async driver development purposes. A special linear iomem iterator is introduced internally to mimic the old ioremap behaviour for code-paths that can't immediately be ported over. This adds to the code size and should be considered a temporary solution. Looking at the code we have a lot of checks for iomap tagged pointers. Ideally we should extend the core memremap functions to also accept uncached memory and kmap_local functionality. Then we could strip a lot of code. Cc: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20210602083818.241793-4-thomas.hellstrom@linux.intel.com 2021-06-02 08:38:10 +00:00			`pgprot_t ttm_prot_from_caching(enum ttm_caching caching, pgprot_t tmp);`

drm/ttm: set the tt caching state at creation time All drivers can determine the tt caching state at creation time, no need to do this on the fly during every validation. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Link: https://patchwork.freedesktop.org/patch/394253/ 2020-09-30 08:38:48 +00:00			`#endif`