linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-25 05:32:00 +00:00

Author	SHA1	Message	Date
André Almeida	142fa60f61	unicode: Recreate utf8_parse_version() All filesystems that currently support UTF-8 casefold can fetch the UTF-8 version from the filesystem metadata stored on disk. They can get the data stored and directly match it to a integer, so they can skip the string parsing step, which motivated the removal of this function in the first place. However, for tmpfs, the only way to tell the kernel which UTF-8 version we are about to use is via mount options, using a string. Re-introduce utf8_parse_version() to be used by tmpfs. This version differs from the original by skipping the intermediate step of copying the version string to an auxiliary string before calling match_token(). This versions calls match_token() in the argument string. The paramenters are simpler now as well. utf8_parse_version() was created by `9d53690f0d` ("unicode: implement higher level API for string handling") and later removed by `49bd03cc7e` ("unicode: pass a UNICODE_AGE() tripple to utf8_load"). Signed-off-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/20241021-tonyk-tmpfs-v8-4-f443d5814194@igalia.com Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-10-28 13:36:54 +01:00
Nick Alcock	573858e85d	unicode: remove MODULE_LICENSE in non-modules Since commit `8b41fc4454` ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Gabriel Krisman Bertazi <krisman@suse.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>	2023-04-13 13:13:54 -07:00
Christoph Hellwig	2b3d047870	unicode: Add utf8-data module utf8data.h contains a large database table which is an auto-generated decodification trie for the unicode normalization functions. Allow building it into a separate module. Based on a patch from Shreeya Patel <shreeya.patel@collabora.com>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2021-10-12 11:41:39 -03:00
Christoph Hellwig	6ca99ce756	unicode: cache the normalization tables in struct unicode_map Instead of repeatedly looking up the version add pointers to the NFD and NFD+CF tables to struct unicode_map, and pass a unicode_map plus index to the functions using the normalization tables. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2021-10-11 17:02:02 -03:00
Christoph Hellwig	49bd03cc7e	unicode: pass a UNICODE_AGE() tripple to utf8_load Don't bother with pointless string parsing when the caller can just pass the version in the format that the core expects. Also remove the fallback to the latest version that none of the callers actually uses. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2021-10-11 17:01:46 -03:00
Christoph Hellwig	a440943e68	unicode: remove the charset field from struct unicode_map It is hardcoded and only used for a f2fs sysfs file where it can be hardcoded just as easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2021-10-11 17:01:35 -03:00
Daniel Rosenberg	3d7bfea8b8	unicode: Add utf8_casefold_hash This adds a case insensitive hash function to allow taking the hash without needing to allocate a casefolded copy of the string. The existing d_hash implementations for casefolding allocate memory within rcu-walk, by avoiding it we can be more efficient and avoid worrying about a failed allocation. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2020-09-10 14:03:31 -07:00
Colin Ian King	aa28b98d6d	unicode: make array 'token' static const, makes object smaller Don't populate the array 'token' on the stack but instead make it static const. Makes the object code smaller by 234 bytes. Before: text data bss dec hex filename 5371 272 0 5643 160b fs/unicode/utf8-core.o After: text data bss dec hex filename 5041 368 0 5409 1521 fs/unicode/utf8-core.o (gcc version 9.2.1, amd64) Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>	2019-09-17 11:48:24 -04:00
Gabriel Krisman Bertazi	3ae72562ad	ext4: optimize case-insensitive lookups Temporarily cache a casefolded version of the file name under lookup in ext4_filename, to avoid repeatedly casefolding it. I got up to 30% speedup on lookups of large directories (>100k entries), depending on the length of the string under lookup. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2019-06-19 23:45:09 -04:00
Gabriel Krisman Bertazi	9d53690f0d	unicode: implement higher level API for string handling This patch integrates the utf8n patches with some higher level API to perform UTF-8 string comparison, normalization and casefolding operations. Implemented is a variation of NFD, and casefold is performed by doing full casefold on top of NFD. These algorithms are based on the core implemented by Olaf Weber from SGI. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2019-04-25 13:51:22 -04:00

10 Commits