linux

Author	SHA1	Message	Date
Michael Guralnik	2233c6609c	RDMA/uverbs: Add new relaxed ordering memory region access flag Add a new relaxed ordering access flag for memory regions. Using memory regions with relaxed ordeing set can enhance performance. This access flag is handled in a best-effort manner, drivers should ignore if they don't support setting relaxed ordering. Link: https://lore.kernel.org/r/1578506740-22188-9-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-16 15:55:46 -04:00
Michael Guralnik	68d384b906	RDMA/core: Add optional access flags range Define a range of access flags that are defined to be optional, both uverbs and drivers should enable getting them and use if they are applicable This will be used, for example, for the relaxed ordering access flag which unsupporting drivers can ignore. Link: https://lore.kernel.org/r/1578506740-22188-7-git-send-email-yishaih@mellanox.com Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-16 15:55:46 -04:00
Jason Gunthorpe	a1123418ba	RDMA/uverbs: Add ioctl command to get a device context Allow future extensions of the get context command through the uverbs ioctl kabi. Unlike the uverbs version this does not return an async_fd as well, that has to be done with another command. Link: https://lore.kernel.org/r/1578506740-22188-5-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-16 15:55:45 -04:00
Jason Gunthorpe	d680e88e20	RDMA/core: Add UVERBS_METHOD_ASYNC_EVENT_ALLOC Allow the async FD to be allocated separately from the context. This is necessary to introduce the ioctl to create a context, as an ioctl should only ever create a single uobject at a time. If multiple async FDs are created then the first one is used to deliver affiliated events from any ib_uevent_object, with all subsequent ones will receive only unaffiliated events. Link: https://lore.kernel.org/r/1578506740-22188-3-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-16 15:55:45 -04:00
Jeremy Sowden	567d746b55	netfilter: bitwise: add support for shifts. Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR and XOR. Extend it to do shifts as well. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-16 15:52:02 +01:00
Jeremy Sowden	779f725e14	netfilter: bitwise: add NFTA_BITWISE_DATA attribute. Add a new bitwise netlink attribute that will be used by shift operations to store the size of the shift. It is not used by boolean operations. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-16 15:52:02 +01:00
Jeremy Sowden	9d1f979986	netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute. Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a value of a new enum, nft_bitwise_ops. It describes the type of operation an expression contains. Currently, it only has one value: NFT_BITWISE_BOOL. More values will be added later to implement shifts. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-16 15:51:57 +01:00
Jeremy Sowden	4a7faaf4ad	netfilter: nft_bitwise: correct uapi header comment. The comment documenting how bitwise expressions work includes a table which summarizes the mask and xor arguments combined to express the supported boolean operations. However, the row for OR: mask xor 0 x is incorrect. dreg = (sreg & 0) ^ x is not equivalent to: dreg = sreg \| x What the code actually does is: dreg = (sreg & ~x) ^ x Update the documentation to match. Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-01-16 15:51:14 +01:00
David S. Miller	8fec380ac0	Merge tag 'batadv-next-for-davem-20200114' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - fix typo and kerneldocs, by Sven Eckelmann - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer - silence some endian sparse warnings by adding annotations, by Sven Eckelmann - Update copyright years to 2020, by Sven Eckelmann - Disable deprecated sysfs configuration by default, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 23:04:04 +01:00
Yonghong Song	057996380a	bpf: Add batch ops to all htab bpf map htab can't use generic batch support due some problematic behaviours inherent to the data structre, i.e. while iterating the bpf map a concurrent program might delete the next entry that batch was about to use, in that case there's no easy solution to retrieve the next entry, the issue has been discussed multiple times (see [1] and [2]). The only way hmap can be traversed without the problem previously exposed is by making sure that the map is traversing entire buckets. This commit implements those strict requirements for hmap, the implementation follows the same interaction that generic support with some exceptions: - If keys/values buffer are not big enough to traverse a bucket, ENOSPC will be returned. - out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for other than bpf batch syscalls. This commits implements BPF_MAP_LOOKUP_BATCH and adds support for new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations. [1] https://lore.kernel.org/bpf/20190724165803.87470-1-brianvv@google.com/ [2] https://lore.kernel.org/bpf/20190906225434.3635421-1-yhs@fb.com/ Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-6-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	aa2e93b8e5	bpf: Add generic support for update and delete batch ops This commit adds generic support for update and delete batch ops that can be used for almost all the bpf maps. These commands share the same UAPI attr that lookup and lookup_and_delete batch ops use and the syscall commands are: BPF_MAP_UPDATE_BATCH BPF_MAP_DELETE_BATCH The main difference between update/delete and lookup batch ops is that for update/delete keys/values must be specified for userspace and because of that, neither in_batch nor out_batch are used. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-4-brianvv@google.com	2020-01-15 14:00:35 -08:00
Brian Vazquez	cb4d03ab49	bpf: Add generic support for lookup batch op This commit introduces generic support for the bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation is relying on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is: BPF_MAP_LOOKUP_BATCH The UAPI attribute is: struct { /* struct used by BPF_MAP__BATCH commands / __aligned_u64 in_batch; /* start batch, * NULL to start from beginning / __aligned_u64 out_batch; / output: next start batch / __aligned_u64 keys; __aligned_u64 values; __u32 count; / input/output: * input: # of key/value * elements * output: # of filled elements / __u32 map_fd; __u64 elem_flags; __u64 flags; } batch; in_batch/out_batch are opaque values use to communicate between user/kernel space, in_batch/out_batch must be of key_size length. To start iterating from the beginning in_batch must be null, count is the # of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call. If there's no more entries to retrieve, ENOENT will be returned. If error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed. Suggested-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Brian Vazquez <brianvv@google.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115184308.162644-3-brianvv@google.com	2020-01-15 14:00:35 -08:00
Yonghong Song	8482941f09	bpf: Add bpf_send_signal_thread() helper Commit `8b401f9ed2` ("bpf: implement bpf_send_signal() helper") added helper bpf_send_signal() which permits bpf program to send a signal to the current process. The signal may be delivered to any threads in the process. We found a use case where sending the signal to the current thread is more preferable. - A bpf program will collect the stack trace and then send signal to the user application. - The user application will add some thread specific information to the just collected stack trace for later analysis. If bpf_send_signal() is used, user application will need to check whether the thread receiving the signal matches the thread collecting the stack by checking thread id. If not, it will need to send signal to another thread through pthread_kill(). This patch proposed a new helper bpf_send_signal_thread(), which sends the signal to the thread corresponding to the current kernel task. This way, user space is guaranteed that bpf_program execution context and user space signal handling context are the same thread. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200115035002.602336-1-yhs@fb.com	2020-01-15 11:44:51 -08:00
Kelvin Cao	4efa1d2e36	PCI/switchtec: Add Gen4 flash information interface support Add the new flash_info registers struct and the implementation of ioctl_flash_part_info() for the new Gen4 hardware. [logang@deltatee.com: rewrote commit message] Link: https://lore.kernel.org/r/20200115035648.2578-7-logang@deltatee.com Signed-off-by: Kelvin Cao <kelvin.cao@microchip.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2020-01-15 11:00:39 -06:00
Logan Gunthorpe	fcccd282b6	PCI/switchtec: Rename generation-specific constants Gen4 hardware will have different values for the SWITCHTEC_X_RUNNING and SWITCHTEC_IOCTL_NUM_PARTITIONS, so rename them with GEN3 in their name. No functional changes intended. Link: https://lore.kernel.org/r/20200115035648.2578-2-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2020-01-15 11:00:37 -06:00
Logan Gunthorpe	a6b0ef9a7d	PCI/switchtec: Add support for Intercomm Notify and Upstream Error Containment Add support for the Inter Fabric Manager Communication (Intercomm) Notify event in PAX variants of Switchtec hardware and the Upstream Error Containment port in the MR1 release of Gen3 firmware. Link: https://lore.kernel.org/r/20200106190337.2428-4-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2020-01-15 11:00:27 -06:00
Nikolay Aleksandrov	cf5bddb95c	net: bridge: vlan: add rtnetlink group and notify support Add a new rtnetlink group for bridge vlan notifications - RTNLGRP_BRVLAN and add support for sending vlan notifications (both single and ranges). No functional changes intended, the notification support will be used by later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov	0ab5587951	net: bridge: vlan: add rtm range support Add a new vlandb nl attribute - BRIDGE_VLANDB_ENTRY_RANGE which causes RTM_NEWVLAN/DELVAN to act on a range. Dumps now automatically compress similar vlans into ranges. This will be also used when per-vlan options are introduced and vlans' options match, they will be put into a single range which is encapsulated in one netlink attribute. We need to run similar checks as br_process_vlan_info() does because these ranges will be used for options setting and they'll be able to skip br_process_vlan_info(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:18 +01:00
Nikolay Aleksandrov	8dcea18708	net: bridge: vlan: add rtm definitions and dump support This patch adds vlan rtm definitions: - NEWVLAN: to be used for creating vlans, setting options and notifications - DELVLAN: to be used for deleting vlans - GETVLAN: used for dumping vlan information Dumping vlans which can span multiple messages is added now with basic information (vid and flags). We use nlmsg_parse() to validate the header length in order to be able to extend the message with filtering attributes later. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-15 13:48:17 +01:00
Roland Scheidegger	cb92a32359	drm/vmwgfx: add ioctl for messaging from/to guest userspace to/from host Up to now, guest userspace does logging directly to host using essentially the same rather complex port assembly stuff as the kernel. We'd rather use the same mechanism than duplicate it (it may also change in the future), hence add a new ioctl for relaying guest/host messaging (logging is just one application of it). Signed-off-by: Roland Scheidegger <sroland@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>	2020-01-15 11:54:16 +01:00
John Crispin	5c5e52d1bb	nl80211: add handling for BSS color This patch adds the attributes, policy and parsing code to allow userland to send the info about the BSS coloring settings to the kernel. Signed-off-by: John Crispin <john@phrozen.org> Link: https://lore.kernel.org/r/20191217141921.8114-1-john@phrozen.org [johannes: remove the strict policy parsing, that was a misunderstanding] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2020-01-15 11:14:24 +01:00
Ido Schimmel	90b93f1b31	ipv4: Add "offload" and "trap" indications to routes When performing L3 offload, routes and nexthops are usually programmed into two different tables in the underlying device. Therefore, the fact that a nexthop resides in hardware does not necessarily mean that all the associated routes also reside in hardware and vice-versa. While the kernel can signal to user space the presence of a nexthop in hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag for routes. In addition, the fact that a route resides in hardware does not necessarily mean that the traffic is offloaded. For example, unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap packets to the CPU so that the kernel will be able to generate the appropriate ICMP error packet. This patch adds an "offload" and "trap" indications to IPv4 routes, so that users will have better visibility into the offload process. 'struct fib_alias' is extended with two new fields that indicate if the route resides in hardware or not and if it is offloading traffic from the kernel or trapping packets to it. Note that the new fields are added in the 6 bytes hole and therefore the struct still fits in a single cache line [1]. Capable drivers are expected to invoke fib_alias_hw_flags_set() with the route's key in order to set the flags. The indications are dumped to user space via a new flags (i.e., 'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the ancillary header. v2: * Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set() [1] struct fib_alias { struct hlist_node fa_list; /* 0 16 / struct fib_info fa_info; /* 16 8 / u8 fa_tos; / 24 1 / u8 fa_type; / 25 1 / u8 fa_state; / 26 1 / u8 fa_slen; / 27 1 / u32 tb_id; / 28 4 / s16 fa_default; / 32 2 / u8 offload:1; / 34: 0 1 / u8 trap:1; / 34: 1 1 / u8 unused:6; / 34: 2 1 / / XXX 5 bytes hole, try to pack / struct callback_head rcu __attribute__((__aligned__(8))); / 40 16 / / size: 56, cachelines: 1, members: 12 / / sum members: 50, holes: 1, sum holes: 5 / / sum bitfield members: 8 bits (1 bytes) / / forced alignments: 1, forced holes: 1, sum forced holes: 5 / / last cacheline: 56 bytes */ } __attribute__((__aligned__(8))); Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-14 18:53:35 -08:00
Daniel Stone	455e00f141	drm: Add getfb2 ioctl getfb2 allows us to pass multiple planes and modifiers, just like addfb2 over addfb. Changes since v2: - add privilege checks from getfb1 since handles should only be returned to master/root Changes since v1: - unused modifiers set to 0 instead of DRM_FORMAT_MOD_INVALID - update ioctl number Signed-off-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Juston Li <juston.li@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191217034642.3814-1-juston.li@intel.com	2020-01-14 16:22:17 -05:00
Antoine Tenart	dcb780fb27	net: macsec: add nla support for changing the offloading selection MACsec offloading to underlying hardware devices is disabled by default (the software implementation is used). This patch adds support for changing this setting through the MACsec netlink interface. Many checks are done when enabling offloading on a given MACsec interface as there are limitations (it must be supported by the hardware, only a single interface can be offloaded on a given physical device at a time, rules can't be moved for now). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-14 11:31:41 -08:00
Antoine Tenart	76564261a7	net: macsec: introduce the macsec_context structure This patch introduces the macsec_context structure. It will be used in the kernel to exchange information between the common MACsec implementation (macsec.c) and the MACsec hardware offloading implementations. This structure contains pointers to MACsec specific structures which contain the actual MACsec configuration, and to the underlying device (phydev for now). Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-14 11:31:41 -08:00
zhenwei pi	191941692a	misc: pvpanic: add crash loaded event Some users prefer kdump tools to generate guest kernel dumpfile, at the same time, need a out-of-band kernel panic event. Currently if booting guest kernel with 'crash_kexec_post_notifiers', QEMU will receive PVPANIC_PANICKED event and stop VM. If booting guest kernel without 'crash_kexec_post_notifiers', guest will not call notifier chain. Add PVPANIC_CRASH_LOADED bit for pvpanic event, it means that guest kernel actually hit a kernel panic, but the guest kernel wants to handle by itself. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-3-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-01-14 15:07:37 +01:00
zhenwei pi	e0b9a42735	misc: pvpanic: move bit definition to uapi header file Some processes outside of the kernel(Ex, QEMU) should know what the value really is for, so move the bit definition to a uapi file. Suggested-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Link: https://lore.kernel.org/r/20200102023513.318836-2-pizhenwei@bytedance.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-01-14 15:07:37 +01:00
Andrei Vagin	769071ac9f	ns: Introduce Time Namespace Time Namespace isolates clock values. The kernel provides access to several clocks CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc. CLOCK_REALTIME System-wide clock that measures real (i.e., wall-clock) time. CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. CLOCK_BOOTTIME Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. For many users, the time namespace means the ability to changes date and time in a container (CLOCK_REALTIME). Providing per namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value. But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks. A time namespace is similar to a pid namespace in the way how it is created: unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it to the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace, before any processes appear in it. All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls. [ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ] Co-developed-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Andrei Vagin <avagin@gmail.com> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://criu.org/Time_namespace Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com	2020-01-14 12:20:48 +01:00
Sargun Dhillon	9a2cef09c8	arch: wire up pidfd_getfd syscall This wires up the pidfd_getfd syscall for all architectures. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2020-01-13 21:49:47 +01:00
Jason Gunthorpe	3e032c0e92	RDMA/core: Make ib_uverbs_async_event_file into a uobject This makes async events aligned with completion events as both are full uobjects of FD type and use the same uobject lifecycle. A bunch of duplicate code is consolidated and the general flow between the two FDs is now very similar. Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-13 16:20:16 -04:00
Greg Kroah-Hartman	d40310f657	Merge 5.5-rc6 into staging-next We need the staging fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-01-13 07:52:17 +01:00
Yishai Hadas	7be76bef32	IB/mlx5: Introduce VAR object and its alloc/destroy methods Introduce VAR object and its alloc/destroy KABI methods. The internal implementation uses the IB core API to manage mmap/munamp calls. Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-01-12 19:49:13 -04:00
Alexei Starovoitov	51c39bb1d5	bpf: Introduce function-by-function verification New llvm and old llvm with libbpf help produce BTF that distinguish global and static functions. Unlike arguments of static function the arguments of global functions cannot be removed or optimized away by llvm. The compiler has to use exactly the arguments specified in a function prototype. The argument type information allows the verifier validate each global function independently. For now only supported argument types are pointer to context and scalars. In the future pointers to structures, sizes, pointer to packet data can be supported as well. Consider the following example: static int f1(int ...) { ... } int f3(int b); int f2(int a) { f1(a) + f3(a); } int f3(int b) { ... } int main(...) { f1(...) + f2(...) + f3(...); } The verifier will start its safety checks from the first global function f2(). It will recursively descend into f1() because it's static. Then it will check that arguments match for the f3() invocation inside f2(). It will not descend into f3(). It will finish f2() that has to be successfully verified for all possible values of 'a'. Then it will proceed with f3(). That function also has to be safe for all possible values of 'b'. Then it will start subprog 0 (which is main() function). It will recursively descend into f1() and will skip full check of f2() and f3(), since they are global. The order of processing global functions doesn't affect safety, since all global functions must be proven safe based on their arguments only. Such function by function verification can drastically improve speed of the verification and reduce complexity. Note that the stack limit of 512 still applies to the call chain regardless whether functions were static or global. The nested level of 8 also still applies. The same recursion prevention checks are in place as well. The type information and static/global kind is preserved after the verification hence in the above example global function f2() and f3() can be replaced later by equivalent functions with the same types that are loaded and verified later without affecting safety of this main() program. Such replacement (re-linking) of global functions is a subject of future patches. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org	2020-01-10 17:20:07 +01:00
Martin K. Petersen	1c46a2cf2d	Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue Pull compat_ioctl cleanup from Arnd. Here's his description: This series concludes the work I did for linux-5.5 on the compat_ioctl() cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving everything into drivers. Overall this would be a reduction both in complexity and line count, but as I'm also adding documentation the overall number of lines increases in the end. My plan was originally to keep the SCSI and block parts separate. This did not work easily because of interdependencies: I cannot do the final SCSI cleanup in a good way without first addressing the CDROM ioctls, so this is one series that I hope could be merged through either the block or the scsi git trees, or possibly both if you can pull in the same branch. The series comes in these steps: 1. clean up the sg v3 interface as suggested by Linus. I have talked about this with Doug Gilbert as well, and he would rebase his sg v4 patches on top of "compat: scsi: sg: fix v3 compat read/write interface" 2. Actually moving handlers out of block/compat_ioctl.c and block/scsi_ioctl.c into drivers, mixed in with cleanup patches 3. Document how to do this right. I keep getting asked about this, and it helps to point to some documentation file. The branch is based on another one that fixes a couple of bugs found during the creation of this series. Changes since v3: https://lore.kernel.org/lkml/20200102145552.1853992-1-arnd@arndb.de/ - Move sr_compat_ioctl fixup to correct patch (Ben Hutchings) - Add Reviewed-by tags Changes since v2: https://lore.kernel.org/lkml/20191217221708.3730997-1-arnd@arndb.de/ - Rebase to v5.5-rc4, which contains the earlier bugfixes - Fix sr_block_compat_ioctl() error handling bug found by Ben Hutchings - Fix idecd_locked_compat_ioctl() compat_ptr() bug - Don't try to handle HDIO_DRIVE_TASKFILE in drivers/ide - More documentation improvements Changes since v1: https://lore.kernel.org/lkml/20191211204306.1207817-1-arnd@arndb.de/ - move out the bugfixes into a branch for itself - clean up scsi sg driver further as suggested by Christoph Hellwig - avoid some ifdefs by moving compat_ptr() out of asm/compat.h - split out the blkdev_compat_ptr_ioctl function; bug spotted by Ben Hutchings - Improve formatting of documentation Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-01-10 00:14:46 -05:00
Mat Martineau	faf391c382	tcp: Define IPPROTO_MPTCP To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP), IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing IPPROTO numbers mostly map directly to IANA-specified protocol numbers. MPTCP does not have a protocol number allocated because MPTCP packets use the TCP protocol number. Use private number not used OTA. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-09 18:41:41 -08:00
Linus Torvalds	b5b3159cff	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a few small fixups here" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: imx_sc_key - only take the valid data from SCU firmware as key state Input: add safety guards to input_set_keycode() Input: input_event - fix struct padding on sparc64 Input: uinput - always report EPOLLOUT	2020-01-09 15:37:40 -08:00
David S. Miller	a2d6d7ae59	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The ungrafting from PRIO bug fixes in net, when merged into net-next, merge cleanly but create a build failure. The resolution used here is from Petr Machata. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-09 12:13:43 -08:00
Andrey Ignatov	f5bfcd953d	bpf: Document BPF_F_QUERY_EFFECTIVE flag Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects attach_flags what may not be obvious and what may lead to confision. Specifically attach_flags is returned only for target_fd but if programs are inherited from an ancestor cgroup then returned attach_flags for current cgroup may be confusing. For example, two effective programs of same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in attach_flags. Simple repro: # bpftool c s /sys/fs/cgroup/path/to/task ID AttachType AttachFlags Name # bpftool c s /sys/fs/cgroup/path/to/task effective ID AttachType AttachFlags Name 95043 ingress tw_ipt_ingress 95048 ingress tw_ingress Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com	2020-01-09 09:40:06 -08:00
Martin KaFai Lau	206057fe02	bpf: Add BPF_FUNC_tcp_send_ack helper Add a helper to send out a tcp-ack. It will be used in the later bpf_dctcp implementation that requires to send out an ack when the CE state changed. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Martin KaFai Lau	85d33df357	bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS The patch introduces BPF_MAP_TYPE_STRUCT_OPS. The map value is a kernel struct with its func ptr implemented in bpf prog. This new map is the interface to register/unregister/introspect a bpf implemented kernel struct. The kernel struct is actually embedded inside another new struct (or called the "value" struct in the code). For example, "struct tcp_congestion_ops" is embbeded in: struct bpf_struct_ops_tcp_congestion_ops { refcount_t refcnt; enum bpf_struct_ops_state state; struct tcp_congestion_ops data; /* <-- kernel subsystem struct here / } The map value is "struct bpf_struct_ops_tcp_congestion_ops". The "bpftool map dump" will then be able to show the state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g. number of tcp_sock in the tcp_congestion_ops case). This "value" struct is created automatically by a macro. Having a separate "value" struct will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding "void (init)(void)" to "struct bpf_struct_ops_XYZ" to do some initialization works before registering the struct_ops to the kernel subsystem). The libbpf will take care of finding and populating the "struct bpf_struct_ops_XYZ" from "struct XYZ". Register a struct_ops to a kernel subsystem: 1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s) 2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the running kernel. Instead of reusing the attr->btf_value_type_id, btf_vmlinux_value_type_id s added such that attr->btf_fd can still be used as the "user" btf which could store other useful sysadmin/debug info that may be introduced in the furture, e.g. creation-date/compiler-details/map-creator...etc. 3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described in the running kernel btf. Populate the value of this object. The function ptr should be populated with the prog fds. 4. Call BPF_MAP_UPDATE with the object created in (3) as the map value. The key is always "0". During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's args as an array of u64 is generated. BPF_MAP_UPDATE also allows the specific struct_ops to do some final checks in "st_ops->init_member()" (e.g. ensure all mandatory func ptrs are implemented). If everything looks good, it will register this kernel struct to the kernel subsystem. The map will not allow further update from this point. Unregister a struct_ops from the kernel subsystem: BPF_MAP_DELETE with key "0". Introspect a struct_ops: BPF_MAP_LOOKUP_ELEM with key "0". The map value returned will have the prog _id_ populated as the func ptr. The map value state (enum bpf_struct_ops_state) will transit from: INIT (map created) => INUSE (map updated, i.e. reg) => TOBEFREE (map value deleted, i.e. unreg) The kernel subsystem needs to call bpf_struct_ops_get() and bpf_struct_ops_put() to manage the "refcnt" in the "struct bpf_struct_ops_XYZ". This patch uses a separate refcnt for the purose of tracking the subsystem usage. Another approach is to reuse the map->refcnt and then "show" (i.e. during map_lookup) the subsystem's usage by doing map->refcnt - map->usercnt to filter out the map-fd/pinned-map usage. However, that will also tie down the future semantics of map->refcnt and map->usercnt. The very first subsystem's refcnt (during reg()) holds one count to map->refcnt. When the very last subsystem's refcnt is gone, it will also release the map->refcnt. All bpf_prog will be freed when the map->refcnt reaches 0 (i.e. during map_free()). Here is how the bpftool map command will look like: [root@arch-fb-vm1 bpf]# bpftool map show 6: struct_ops name dctcp flags 0x0 key 4B value 256B max_entries 1 memlock 4096B btf_id 6 [root@arch-fb-vm1 bpf]# bpftool map dump id 6 [{ "value": { "refcnt": { "refs": { "counter": 1 } }, "state": 1, "data": { "list": { "next": 0, "prev": 0 }, "key": 0, "flags": 2, "init": 24, "release": 0, "ssthresh": 25, "cong_avoid": 30, "set_state": 27, "cwnd_event": 28, "in_ack_event": 26, "undo_cwnd": 29, "pkts_acked": 0, "min_tso_segs": 0, "sndbuf_expand": 0, "cong_control": 0, "get_info": 0, "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0 ], "owner": 0 } } } ] Misc Notes: * bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup. It does an inplace update on "value" instead returning a pointer to syscall.c. Otherwise, it needs a separate copy of "zero" value for the BPF_STRUCT_OPS_STATE_INIT to avoid races. The bpf_struct_ops_map_delete_elem() is also called without preempt_disable() from map_delete_elem(). It is because the "->unreg()" may requires sleepable context, e.g. the "tcp_unregister_congestion_control()". * "const" is added to some of the existing "struct btf_func_model *" function arg to avoid a compiler warning caused by this patch. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Martin KaFai Lau	27ae7997a6	bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS This patch allows the kernel's struct ops (i.e. func ptr) to be implemented in BPF. The first use case in this series is the "struct tcp_congestion_ops" which will be introduced in a latter patch. This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS. The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular func ptr of a kernel struct. The attr->attach_btf_id is the btf id of a kernel struct. The attr->expected_attach_type is the member "index" of that kernel struct. The first member of a struct starts with member index 0. That will avoid ambiguity when a kernel struct has multiple func ptrs with the same func signature. For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written to implement the "init" func ptr of the "struct tcp_congestion_ops". The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops" of the _running_ kernel. The attr->expected_attach_type is 3. The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved by arch_prepare_bpf_trampoline that will be done in the next patch when introducing BPF_MAP_TYPE_STRUCT_OPS. "struct bpf_struct_ops" is introduced as a common interface for the kernel struct that supports BPF_PROG_TYPE_STRUCT_OPS prog. The supporting kernel struct will need to implement an instance of the "struct bpf_struct_ops". The supporting kernel struct also needs to implement a bpf_verifier_ops. During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right bpf_verifier_ops by searching the attr->attach_btf_id. A new "btf_struct_access" is also added to the bpf_verifier_ops such that the supporting kernel struct can optionally provide its own specific check on accessing the func arg (e.g. provide limited write access). After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called to initialize some values (e.g. the btf id of the supporting kernel struct) and it can only be done once the btf_vmlinux is available. The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog if the return type of the prog->aux->attach_func_proto is "void". Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com	2020-01-09 08:46:18 -08:00
Jani Nikula	ec027b33c8	Merge drm/drm-next into drm-intel-next-queued Sync with drm-next to get the new logging macros, among other things. Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2020-01-09 17:19:12 +02:00
Dilip Kota	ed22aaaede	PCI: dwc: intel: PCIe RC controller driver Add support to PCIe RC controller on Intel Gateway SoCs. PCIe controller is based of Synopsys DesignWare PCIe core. Intel PCIe driver requires Upconfigure support, Fast Training Sequence and link speed configurations. So adding the respective helper functions in the PCIe DesignWare framework. It also programs hardware autonomous speed during speed configuration so defining it in pci_regs.h. Also, mark Intel PCIe driver depends on MSI IRQ Domain as Synopsys DesignWare framework depends on the PCI_MSI_IRQ_DOMAIN. Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>	2020-01-09 11:57:18 +00:00
Mauro Carvalho Chehab	8821e92879	Merge tag 'v5.5-rc5' into patchwork Linux 5.5-rc5 * tag 'v5.5-rc5': (1006 commits) Linux 5.5-rc5 Documentation: riscv: add patch acceptance guidelines riscv: prefix IRQ_ macro names with an RV_ namespace clocksource: riscv: add notrace to riscv_sched_clock apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock hexagon: define ioremap_uc ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less ocfs2: call journal flush to mark journal as empty after journal recovery when mount mm/hugetlb: defer freeing of huge pages if in non-task context mm/gup: fix memory leak in __gup_benchmark_ioctl mm/oom: fix pgtables units mismatch in Killed process message fs/posix_acl.c: fix kernel-doc warnings hexagon: work around compiler crash hexagon: parenthesize registers in asm predicates fs/namespace.c: make to_mnt_ns() static fs/nsfs.c: include headers for missing declarations fs/direct-io.c: include fs/internal.h for missing prototype mm: move_pages: return valid node id in status if the page is already on the target node memcg: account security cred as well to kmemcg kcov: fix struct layout for kcov_remote_arg ...	2020-01-08 11:13:25 +01:00
Andy Lutomirski	48446f198f	random: ignore GRND_RANDOM in getentropy(2) The separate blocking pool is going away. Start by ignoring GRND_RANDOM in getentropy(2). This should not materially break any API. Any code that worked without this change should work at least as well with this change. Signed-off-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/705c5a091b63cc5da70c99304bb97e0109be0a26.1577088521.git.luto@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-01-07 16:07:01 -05:00
Andy Lutomirski	75551dbf11	random: add GRND_INSECURE to return best-effort non-cryptographic bytes Signed-off-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/d5473b56cf1fa900ca4bd2b3fc1e5b8874399919.1577088521.git.luto@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2020-01-07 16:07:00 -05:00
Dhinakaran Pandiyan	0d3d29d0f8	drm/framebuffer: Format modifier for Intel Gen-12 media compression Gen-12 display can decompress surfaces compressed by the media engine, add a new modifier as the driver needs to know the surface was compressed by the media or render engine. v2: Update code comment describing the color plane order for YUV semiplanar formats. Cc: Nanley G Chery <nanley.g.chery@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Mika Kahola <mika.kahola@intel.com> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191231233756.18753-6-imre.deak@intel.com	2020-01-07 13:15:48 +02:00
Vladimir Oltean	6c93099450	mii: Add helpers for parsing SGMII auto-negotiation Typically a MAC PCS auto-configures itself after it receives the negotiated copper-side link settings from the PHY, but some MAC devices are more special and need manual interpretation of the SGMII AN result. In other cases, the PCS exposes the entire tx_config_reg base page as it is transmitted on the wire during auto-negotiation, so it makes sense to be able to decode the equivalent lp_advertised bit mask from the raw u16 (of course, "lp" considering the PCS to be the local PHY). Therefore, add the bit definitions for the SGMII registers 4 and 5 (local device ability, link partner ability), as well as a link_mode conversion helper that can be used to feed the AN results into phy_resolve_aneg_linkmode. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-05 23:22:32 -08:00
Andrey Konovalov	a69b83e1ae	kcov: fix struct layout for kcov_remote_arg Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code. This makes it more convenient to write userspace apps that can be compiled into 32-bit or 64-bit binaries and still work with the same 64-bit kernel. Also use proper __u32 types in uapi headers instead of unsigned ints. Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com Fixes: `eec028c938` ("kcov: remote coverage support") Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Felipe Balbi <balbi@kernel.org> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-01-04 13:55:09 -08:00
Rijo Thomas	757cc3e9ff	tee: add AMD-TEE driver Adds AMD-TEE driver. * targets AMD APUs which has AMD Secure Processor with software-based Trusted Execution Environment (TEE) support * registers with TEE subsystem * defines tee_driver_ops function callbacks * kernel allocated memory is used as shared memory between normal world and secure world. * acts as REE (Rich Execution Environment) communication agent, which uses the services of AMD Secure Processor driver to submit commands for processing in TEE environment Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Tom Lendacky <thomas.lendacky@amd.com> Acked-by: Jens Wiklander <jens.wiklander@linaro.org> Co-developed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com> Signed-off-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com> Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com> Reviewed-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-01-04 13:49:51 +08:00

... 6 7 8 9 10 ...

8711 Commits