A mirror of the official Linux kernel repository just in case
Petr Machata b72a6a7ab9 net: nexthop: Increase weight to u16
In CLOS networks, as link failures occur at various points in the network,
ECMP weights of the involved nodes are adjusted to compensate. With high
fan-out of the involved nodes, and overall high number of nodes,
a (non-)ECMP weight ratio that we would like to configure does not fit into
8 bits. Instead of, say, 255:254, we might like to configure something like
1000:999. For these deployments, the 8-bit weight may not be enough.

To that end, in this patch increase the next hop weight from u8 to u16.

Increasing the width of an integral type can be tricky, because while the
code still compiles, the types may not check out anymore, and numerical
errors come up. To prevent this, the conversion was done in two steps.
First the type was changed from u8 to a single-member structure, which
invalidated all uses of the field. This allowed going through them one by
one and auditing each for type correctness. Then the structure was replaced
with a vanilla u16 again. This should ensure that no place was missed.
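
The two-step conversion described above can be sketched roughly as follows.
This is only an illustration under stated assumptions: the sketch_* names are
hypothetical and do not appear in the kernel, and the real in-kernel
structure has more members than shown here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Step 1: wrap the widened value in a single-member structure. Any
 * leftover use such as "entry->weight = 5" or "uint8_t w = entry->weight"
 * now fails to compile, so each one has to be visited and audited.
 */
struct sketch_weight {
	uint16_t v;
};

/* Hypothetical stand-in for the kernel-internal group entry. */
struct sketch_entry_step1 {
	uint32_t id;
	struct sketch_weight weight;
};

/* Audited accessors: the only places that touch the raw value. */
static inline void sketch_set_weight(struct sketch_entry_step1 *e, uint16_t w)
{
	e->weight.v = w;
}

static inline uint16_t sketch_get_weight(const struct sketch_entry_step1 *e)
{
	return e->weight.v;
}

/* Step 2: once every use has been audited, drop the wrapper again. */
struct sketch_entry_step2 {
	uint32_t id;
	uint16_t weight;
};

/* The wrapper adds no storage, so both steps have the same size. */
static_assert(sizeof(struct sketch_entry_step1) ==
	      sizeof(struct sketch_entry_step2),
	      "wrapper must not change the size");
```

The point of the intermediate step is that the compiler, not the reviewer,
enumerates the uses: a struct cannot be implicitly converted to or from an
integer, so nothing is silently truncated to 8 bits.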

The UAPI for configuring nexthop group members is that an attribute
NHA_GROUP carries an array of struct nexthop_grp entries:

	struct nexthop_grp {
		__u32	id;	  /* nexthop id - must exist */
		__u8	weight;   /* weight of this nexthop */
		__u8	resvd1;
		__u16	resvd2;
	};

The field resvd1 is currently validated and required to be zero. We can
lift this requirement and carry high-order bits of the weight in the
reserved field:

	struct nexthop_grp {
		__u32	id;	  /* nexthop id - must exist */
		__u8	weight;   /* weight of this nexthop */
		__u8	weight_high;
		__u16	resvd2;
	};

Keeping the fields split this way was chosen in case an existing userspace
makes assumptions about the width of the weight field, and to sidestep any
endianness issues.

The weight field is currently encoded as the weight value minus one,
because weight of 0 is invalid. This same trick is impossible for the new
weight_high field, because zero must mean actual zero. With this in place:

- Old userspace is guaranteed to carry weight_high of 0, therefore
  configuring 8-bit weights as appropriate. When dumping nexthops with
  16-bit weight, it would only show the lower 8 bits. But configuring such
  nexthops implies existence of userspace aware of the extension in the
  first place.

- New userspace talking to an old kernel will work as long as it only
  attempts to configure 8-bit weights, where the high-order bits are zero.
  Old kernel will bounce attempts at configuring >8-bit weights.
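
The two compatibility cases above can be modelled with a small sketch. It
uses a local copy of the extended layout and hypothetical sketch_* helpers;
the old-kernel check is a simplified model of "reserved fields must be
zero", not the kernel's actual validation code. The upper bound of 65535 is
an assumption made so that the decoded weight fits in 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the extended layout from the commit message. */
struct sketch_nexthop_grp {
	uint32_t id;
	uint8_t  weight;	/* low 8 bits of (weight - 1) */
	uint8_t  weight_high;	/* high 8 bits of (weight - 1), was resvd1 */
	uint16_t resvd2;
};

static_assert(sizeof(struct sketch_nexthop_grp) == 8,
	      "the extension must not change the entry size");

/* Hypothetical userspace helper: encode a weight in 1..65535 (assumed). */
static bool sketch_encode(struct sketch_nexthop_grp *g, uint32_t id,
			  uint32_t weight)
{
	if (weight < 1 || weight > 65535)
		return false;

	uint16_t raw = (uint16_t)(weight - 1);	/* 0 is invalid, so store w-1 */

	g->id = id;
	g->weight = raw & 0xff;
	g->weight_high = raw >> 8;
	g->resvd2 = 0;
	return true;
}

/*
 * Simplified model of an old kernel: the byte now called weight_high is
 * still resvd1 there and is required to be zero.
 */
static bool sketch_old_kernel_accepts(const struct sketch_nexthop_grp *g)
{
	return g->weight_high == 0 && g->resvd2 == 0;
}
```

With this model, a weight of 254 encodes with weight_high of 0 and passes
the old check, while a weight of 1000 sets weight_high to 3 and is bounced.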

Renaming reserved fields as they are allocated for some purpose is commonly
done in Linux. Whoever touches a reserved field is doing so at their own
risk. nexthop_grp::resvd1 in particular is currently used by at least
strace, however it carries its own copy of UAPI headers, and the conversion
should be trivial. A helper is provided for decoding the weight out of the
two fields. Forcing a conversion seems preferable to bending over backwards
and introducing anonymous unions or whatever.
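
A decoding helper of the kind mentioned above might look like the sketch
below. The names are hypothetical and the structure is a local copy, so
this should be read as an illustration of the encoding rather than as the
helper shipped in the UAPI header. The result is returned as a 32-bit value
here purely to keep the arithmetic free of overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the extended layout from the commit message. */
struct sketch_grp {
	uint32_t id;
	uint8_t  weight;	/* low 8 bits of (weight - 1) */
	uint8_t  weight_high;	/* high 8 bits of (weight - 1) */
	uint16_t resvd2;
};

static_assert(sizeof(struct sketch_grp) == 8, "entry size is unchanged");

/* Combine both fields, then undo the minus-one encoding. */
static inline uint32_t sketch_grp_weight(const struct sketch_grp *g)
{
	return (((uint32_t)g->weight_high << 8) | g->weight) + 1;
}

/* What a decoder unaware of weight_high would report for the same entry. */
static inline uint32_t sketch_grp_weight_legacy(const struct sketch_grp *g)
{
	return (uint32_t)g->weight + 1;
}
```

For an entry with weight of 0xe7 and weight_high of 3, the first helper
yields 1000, whereas the legacy decoding sees only the low byte and reports
232, which is the truncated dump that old userspace would show.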

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://patch.msgid.link/483e2fcf4beb0d9135d62e7d27b46fa2685479d4.1723036486.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12 17:50:34 -07:00
arch LoongArch: KVM: Remove undefined a6 argument comment for kvm_hypercall() 2024-08-07 17:37:14 +08:00
block block: fix deadlock between sd_remove & sd_release 2024-07-24 09:51:21 -06:00
certs kbuild: use $(src) instead of $(srctree)/$(src) for source directory 2024-05-10 04:34:52 +09:00
crypto crypto: testmgr - generate power-of-2 lengths more often 2024-07-13 11:50:28 +12:00
Documentation netlink: specs: decode indirection table as u32 array 2024-08-12 14:16:24 +01:00
drivers eth: fbnic: add support for basic qstats 2024-08-12 15:44:23 -07:00
fs tracing fixes for v6.11: 2024-08-08 13:32:59 -07:00
include net: nexthop: Increase weight to u16 2024-08-12 17:50:34 -07:00
init rust: SHADOW_CALL_STACK is incompatible with Rust 2024-08-01 16:15:03 +01:00
io_uring io_uring: remove unused local list heads in NAPI functions 2024-07-30 06:20:20 -06:00
ipc sysctl: treewide: constify the ctl_table argument of proc_handlers 2024-07-24 20:59:29 +02:00
kernel tracing fixes for v6.11: 2024-08-08 13:32:59 -07:00
lib minmax: don't use max() in situations that want a C constant expression 2024-07-28 20:23:27 -07:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm 9 hotfixes. 5 are cc:stable, 4 either pertain to post-6.10 material or 2024-08-08 07:32:20 -07:00
net net: nexthop: Increase weight to u16 2024-08-12 17:50:34 -07:00
rust Rust changes for v6.11 2024-07-27 13:44:54 -07:00
samples Driver core changes for 6.11-rc1 2024-07-25 10:42:22 -07:00
scripts syscalls: fix syscall macros for newfstat/newfstatat 2024-08-02 15:20:47 +02:00
security apparmor-pr-2024-07-24 PR 2024-07-25 2024-07-27 13:28:39 -07:00
sound sound fixes for 6.11-rc2 2024-08-02 09:04:57 -07:00
tools selftests: drv-net: rss_ctx: test dumping RSS contexts 2024-08-12 14:16:25 +01:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt KVM: guest_memfd: abstract how prepared folios are recorded 2024-07-26 14:46:15 -04:00
.clang-format Docs: Move clang-format from process/ to dev-tools/ 2024-06-26 16:36:00 -06:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: add .gcda files 2024-08-09 13:18:46 +01:00
.mailmap mailmap: update entry for David Heidelberg 2024-08-07 18:33:56 -07:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS tracing: Update of MAINTAINERS and CREDITS file 2024-07-18 14:08:42 -07:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS selftests: rds: add testing infrastructure 2024-08-09 13:18:46 +01:00
Makefile Linux 6.11-rc2 2024-08-04 13:50:53 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.