Commit Graph

8490 Commits

Author SHA1 Message Date
John Crispin
5c5e52d1bb nl80211: add handling for BSS color
This patch adds the attributes, policy and parsing code to allow userland
to send the info about the BSS coloring settings to the kernel.

Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20191217141921.8114-1-john@phrozen.org
[johannes: remove the strict policy parsing, that was a misunderstanding]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-15 11:14:24 +01:00
Ido Schimmel
90b93f1b31 ipv4: Add "offload" and "trap" indications to routes
When performing L3 offload, routes and nexthops are usually programmed
into two different tables in the underlying device. Therefore, the fact
that a nexthop resides in hardware does not necessarily mean that all
the associated routes also reside in hardware and vice-versa.

While the kernel can signal to user space the presence of a nexthop in
hardware (via 'RTNH_F_OFFLOAD'), it does not have a corresponding flag
for routes. In addition, the fact that a route resides in hardware does
not necessarily mean that the traffic is offloaded. For example,
unreachable routes (i.e., 'RTN_UNREACHABLE') are programmed to trap
packets to the CPU so that the kernel will be able to generate the
appropriate ICMP error packet.

This patch adds an "offload" and "trap" indications to IPv4 routes, so
that users will have better visibility into the offload process.

'struct fib_alias' is extended with two new fields that indicate if the
route resides in hardware or not and if it is offloading traffic from
the kernel or trapping packets to it. Note that the new fields are added
in the 6 bytes hole and therefore the struct still fits in a single
cache line [1].

Capable drivers are expected to invoke fib_alias_hw_flags_set() with the
route's key in order to set the flags.

The indications are dumped to user space via a new flags (i.e.,
'RTM_F_OFFLOAD' and 'RTM_F_TRAP') in the 'rtm_flags' field in the
ancillary header.

v2:
* Make use of 'struct fib_rt_info' in fib_alias_hw_flags_set()

[1]
struct fib_alias {
        struct hlist_node  fa_list;                      /*     0    16 */
        struct fib_info *          fa_info;              /*    16     8 */
        u8                         fa_tos;               /*    24     1 */
        u8                         fa_type;              /*    25     1 */
        u8                         fa_state;             /*    26     1 */
        u8                         fa_slen;              /*    27     1 */
        u32                        tb_id;                /*    28     4 */
        s16                        fa_default;           /*    32     2 */
        u8                         offload:1;            /*    34: 0  1 */
        u8                         trap:1;               /*    34: 1  1 */
        u8                         unused:6;             /*    34: 2  1 */

        /* XXX 5 bytes hole, try to pack */

        struct callback_head rcu __attribute__((__aligned__(8))); /*    40    16 */

        /* size: 56, cachelines: 1, members: 12 */
        /* sum members: 50, holes: 1, sum holes: 5 */
        /* sum bitfield members: 8 bits (1 bytes) */
        /* forced alignments: 1, forced holes: 1, sum forced holes: 5 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 18:53:35 -08:00
Antoine Tenart
dcb780fb27 net: macsec: add nla support for changing the offloading selection
MACsec offloading to underlying hardware devices is disabled by default
(the software implementation is used). This patch adds support for
changing this setting through the MACsec netlink interface. Many checks
are done when enabling offloading on a given MACsec interface as there
are limitations (it must be supported by the hardware, only a single
interface can be offloaded on a given physical device at a time, rules
can't be moved for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:31:41 -08:00
Antoine Tenart
76564261a7 net: macsec: introduce the macsec_context structure
This patch introduces the macsec_context structure. It will be used
in the kernel to exchange information between the common MACsec
implementation (macsec.c) and the MACsec hardware offloading
implementations. This structure contains pointers to MACsec specific
structures which contain the actual MACsec configuration, and to the
underlying device (phydev for now).

Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 11:31:41 -08:00
zhenwei pi
191941692a misc: pvpanic: add crash loaded event
Some users prefer kdump tools to generate guest kernel dumpfile,
at the same time, need a out-of-band kernel panic event.

Currently if booting guest kernel with 'crash_kexec_post_notifiers',
QEMU will receive PVPANIC_PANICKED event and stop VM. If booting
guest kernel without 'crash_kexec_post_notifiers', guest will not
call notifier chain.

Add PVPANIC_CRASH_LOADED bit for pvpanic event, it means that guest
kernel actually hit a kernel panic, but the guest kernel wants to
handle by itself.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Link: https://lore.kernel.org/r/20200102023513.318836-3-pizhenwei@bytedance.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 15:07:37 +01:00
zhenwei pi
e0b9a42735 misc: pvpanic: move bit definition to uapi header file
Some processes outside of the kernel(Ex, QEMU) should know what the
value really is for, so move the bit definition to a uapi file.

Suggested-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Link: https://lore.kernel.org/r/20200102023513.318836-2-pizhenwei@bytedance.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 15:07:37 +01:00
Andrei Vagin
769071ac9f ns: Introduce Time Namespace
Time Namespace isolates clock values.

The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
      System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
      Clock that cannot be set and represents monotonic time since
      some unspecified starting point.

CLOCK_BOOTTIME
      Identical to CLOCK_MONOTONIC, except it also includes any time
      that the system is suspended.

For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.

But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.

A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.

This scheme allows setting clock offsets for a namespace, before any
processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the
  	changelog a bit. ]

Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
2020-01-14 12:20:48 +01:00
Sargun Dhillon
9a2cef09c8 arch: wire up pidfd_getfd syscall
This wires up the pidfd_getfd syscall for all architectures.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13 21:49:47 +01:00
Jason Gunthorpe
3e032c0e92 RDMA/core: Make ib_uverbs_async_event_file into a uobject
This makes async events aligned with completion events as both are full
uobjects of FD type and use the same uobject lifecycle.

A bunch of duplicate code is consolidated and the general flow between the
two FDs is now very similar.

Link: https://lore.kernel.org/r/1578504126-9400-14-git-send-email-yishaih@mellanox.com
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-13 16:20:16 -04:00
Greg Kroah-Hartman
d40310f657 Merge 5.5-rc6 into staging-next
We need the staging fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-13 07:52:17 +01:00
Yishai Hadas
7be76bef32 IB/mlx5: Introduce VAR object and its alloc/destroy methods
Introduce VAR object and its alloc/destroy KABI methods. The internal
implementation uses the IB core API to manage mmap/munamp calls.

Link: https://lore.kernel.org/r/20191212110928.334995-5-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-12 19:49:13 -04:00
Alexei Starovoitov
51c39bb1d5 bpf: Introduce function-by-function verification
New llvm and old llvm with libbpf help produce BTF that distinguish global and
static functions. Unlike arguments of static function the arguments of global
functions cannot be removed or optimized away by llvm. The compiler has to use
exactly the arguments specified in a function prototype. The argument type
information allows the verifier validate each global function independently.
For now only supported argument types are pointer to context and scalars. In
the future pointers to structures, sizes, pointer to packet data can be
supported as well. Consider the following example:

static int f1(int ...)
{
  ...
}

int f3(int b);

int f2(int a)
{
  f1(a) + f3(a);
}

int f3(int b)
{
  ...
}

int main(...)
{
  f1(...) + f2(...) + f3(...);
}

The verifier will start its safety checks from the first global function f2().
It will recursively descend into f1() because it's static. Then it will check
that arguments match for the f3() invocation inside f2(). It will not descend
into f3(). It will finish f2() that has to be successfully verified for all
possible values of 'a'. Then it will proceed with f3(). That function also has
to be safe for all possible values of 'b'. Then it will start subprog 0 (which
is main() function). It will recursively descend into f1() and will skip full
check of f2() and f3(), since they are global. The order of processing global
functions doesn't affect safety, since all global functions must be proven safe
based on their arguments only.

Such function by function verification can drastically improve speed of the
verification and reduce complexity.

Note that the stack limit of 512 still applies to the call chain regardless whether
functions were static or global. The nested level of 8 also still applies. The
same recursion prevention checks are in place as well.

The type information and static/global kind is preserved after the verification
hence in the above example global function f2() and f3() can be replaced later
by equivalent functions with the same types that are loaded and verified later
without affecting safety of this main() program. Such replacement (re-linking)
of global functions is a subject of future patches.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200110064124.1760511-3-ast@kernel.org
2020-01-10 17:20:07 +01:00
Martin K. Petersen
1c46a2cf2d Merge tag 'block-ioctl-cleanup-5.6' into 5.6/scsi-queue
Pull compat_ioctl cleanup from Arnd. Here's his description:

This series concludes the work I did for linux-5.5 on the compat_ioctl()
cleanup, killing off fs/compat_ioctl.c and block/compat_ioctl.c by moving
everything into drivers.

Overall this would be a reduction both in complexity and line count, but
as I'm also adding documentation the overall number of lines increases
in the end.

My plan was originally to keep the SCSI and block parts separate.
This did not work easily because of interdependencies: I cannot
do the final SCSI cleanup in a good way without first addressing the
CDROM ioctls, so this is one series that I hope could be merged through
either the block or the scsi git trees, or possibly both if you can
pull in the same branch.

The series comes in these steps:

1. clean up the sg v3 interface as suggested by Linus. I have
   talked about this with Doug Gilbert as well, and he would
   rebase his sg v4 patches on top of "compat: scsi: sg: fix v3
   compat read/write interface"

2. Actually moving handlers out of block/compat_ioctl.c and
   block/scsi_ioctl.c into drivers, mixed in with cleanup
   patches

3. Document how to do this right. I keep getting asked about this,
   and it helps to point to some documentation file.

The branch is based on another one that fixes a couple of bugs found
during the creation of this series.

Changes since v3:
  https://lore.kernel.org/lkml/20200102145552.1853992-1-arnd@arndb.de/

- Move sr_compat_ioctl fixup to correct patch (Ben Hutchings)
- Add Reviewed-by tags

Changes since v2:
  https://lore.kernel.org/lkml/20191217221708.3730997-1-arnd@arndb.de/

- Rebase to v5.5-rc4, which contains the earlier bugfixes
- Fix sr_block_compat_ioctl() error handling bug found by
  Ben Hutchings
- Fix idecd_locked_compat_ioctl() compat_ptr() bug
- Don't try to handle HDIO_DRIVE_TASKFILE in drivers/ide
- More documentation improvements

Changes since v1:
  https://lore.kernel.org/lkml/20191211204306.1207817-1-arnd@arndb.de/

- move out the bugfixes into a branch for itself
- clean up scsi sg driver further as suggested by Christoph Hellwig
- avoid some ifdefs by moving compat_ptr() out of asm/compat.h
- split out the blkdev_compat_ptr_ioctl function; bug spotted by
  Ben Hutchings
- Improve formatting of documentation

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-01-10 00:14:46 -05:00
Mat Martineau
faf391c382 tcp: Define IPPROTO_MPTCP
To open a MPTCP socket with socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP),
IPPROTO_MPTCP needs a value that differs from IPPROTO_TCP. The existing
IPPROTO numbers mostly map directly to IANA-specified protocol numbers.
MPTCP does not have a protocol number allocated because MPTCP packets
use the TCP protocol number. Use private number not used OTA.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09 18:41:41 -08:00
Linus Torvalds
b5b3159cff Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
 "Just a few small fixups here"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: imx_sc_key - only take the valid data from SCU firmware as key state
  Input: add safety guards to input_set_keycode()
  Input: input_event - fix struct padding on sparc64
  Input: uinput - always report EPOLLOUT
2020-01-09 15:37:40 -08:00
David S. Miller
a2d6d7ae59 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The ungrafting from PRIO bug fixes in net, when merged into net-next,
merge cleanly but create a build failure.  The resolution used here is
from Petr Machata.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-09 12:13:43 -08:00
Andrey Ignatov
f5bfcd953d bpf: Document BPF_F_QUERY_EFFECTIVE flag
Document BPF_F_QUERY_EFFECTIVE flag, mostly to clarify how it affects
attach_flags what may not be obvious and what may lead to confision.

Specifically attach_flags is returned only for target_fd but if programs
are inherited from an ancestor cgroup then returned attach_flags for
current cgroup may be confusing. For example, two effective programs of
same attach_type can be returned but w/o BPF_F_ALLOW_MULTI in
attach_flags.

Simple repro:
  # bpftool c s /sys/fs/cgroup/path/to/task
  ID       AttachType      AttachFlags     Name
  # bpftool c s /sys/fs/cgroup/path/to/task effective
  ID       AttachType      AttachFlags     Name
  95043    ingress                         tw_ipt_ingress
  95048    ingress                         tw_ingress

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200108014006.938363-1-rdna@fb.com
2020-01-09 09:40:06 -08:00
Martin KaFai Lau
206057fe02 bpf: Add BPF_FUNC_tcp_send_ack helper
Add a helper to send out a tcp-ack.  It will be used in the later
bpf_dctcp implementation that requires to send out an ack
when the CE state changed.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109004551.3900448-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Martin KaFai Lau
85d33df357 bpf: Introduce BPF_MAP_TYPE_STRUCT_OPS
The patch introduces BPF_MAP_TYPE_STRUCT_OPS.  The map value
is a kernel struct with its func ptr implemented in bpf prog.
This new map is the interface to register/unregister/introspect
a bpf implemented kernel struct.

The kernel struct is actually embedded inside another new struct
(or called the "value" struct in the code).  For example,
"struct tcp_congestion_ops" is embbeded in:
struct bpf_struct_ops_tcp_congestion_ops {
	refcount_t refcnt;
	enum bpf_struct_ops_state state;
	struct tcp_congestion_ops data;  /* <-- kernel subsystem struct here */
}
The map value is "struct bpf_struct_ops_tcp_congestion_ops".
The "bpftool map dump" will then be able to show the
state ("inuse"/"tobefree") and the number of subsystem's refcnt (e.g.
number of tcp_sock in the tcp_congestion_ops case).  This "value" struct
is created automatically by a macro.  Having a separate "value" struct
will also make extending "struct bpf_struct_ops_XYZ" easier (e.g. adding
"void (*init)(void)" to "struct bpf_struct_ops_XYZ" to do some
initialization works before registering the struct_ops to the kernel
subsystem).  The libbpf will take care of finding and populating the
"struct bpf_struct_ops_XYZ" from "struct XYZ".

Register a struct_ops to a kernel subsystem:
1. Load all needed BPF_PROG_TYPE_STRUCT_OPS prog(s)
2. Create a BPF_MAP_TYPE_STRUCT_OPS with attr->btf_vmlinux_value_type_id
   set to the btf id "struct bpf_struct_ops_tcp_congestion_ops" of the
   running kernel.
   Instead of reusing the attr->btf_value_type_id,
   btf_vmlinux_value_type_id s added such that attr->btf_fd can still be
   used as the "user" btf which could store other useful sysadmin/debug
   info that may be introduced in the furture,
   e.g. creation-date/compiler-details/map-creator...etc.
3. Create a "struct bpf_struct_ops_tcp_congestion_ops" object as described
   in the running kernel btf.  Populate the value of this object.
   The function ptr should be populated with the prog fds.
4. Call BPF_MAP_UPDATE with the object created in (3) as
   the map value.  The key is always "0".

During BPF_MAP_UPDATE, the code that saves the kernel-func-ptr's
args as an array of u64 is generated.  BPF_MAP_UPDATE also allows
the specific struct_ops to do some final checks in "st_ops->init_member()"
(e.g. ensure all mandatory func ptrs are implemented).
If everything looks good, it will register this kernel struct
to the kernel subsystem.  The map will not allow further update
from this point.

Unregister a struct_ops from the kernel subsystem:
BPF_MAP_DELETE with key "0".

Introspect a struct_ops:
BPF_MAP_LOOKUP_ELEM with key "0".  The map value returned will
have the prog _id_ populated as the func ptr.

The map value state (enum bpf_struct_ops_state) will transit from:
INIT (map created) =>
INUSE (map updated, i.e. reg) =>
TOBEFREE (map value deleted, i.e. unreg)

The kernel subsystem needs to call bpf_struct_ops_get() and
bpf_struct_ops_put() to manage the "refcnt" in the
"struct bpf_struct_ops_XYZ".  This patch uses a separate refcnt
for the purose of tracking the subsystem usage.  Another approach
is to reuse the map->refcnt and then "show" (i.e. during map_lookup)
the subsystem's usage by doing map->refcnt - map->usercnt to filter out
the map-fd/pinned-map usage.  However, that will also tie down the
future semantics of map->refcnt and map->usercnt.

The very first subsystem's refcnt (during reg()) holds one
count to map->refcnt.  When the very last subsystem's refcnt
is gone, it will also release the map->refcnt.  All bpf_prog will be
freed when the map->refcnt reaches 0 (i.e. during map_free()).

Here is how the bpftool map command will look like:
[root@arch-fb-vm1 bpf]# bpftool map show
6: struct_ops  name dctcp  flags 0x0
	key 4B  value 256B  max_entries 1  memlock 4096B
	btf_id 6
[root@arch-fb-vm1 bpf]# bpftool map dump id 6
[{
        "value": {
            "refcnt": {
                "refs": {
                    "counter": 1
                }
            },
            "state": 1,
            "data": {
                "list": {
                    "next": 0,
                    "prev": 0
                },
                "key": 0,
                "flags": 2,
                "init": 24,
                "release": 0,
                "ssthresh": 25,
                "cong_avoid": 30,
                "set_state": 27,
                "cwnd_event": 28,
                "in_ack_event": 26,
                "undo_cwnd": 29,
                "pkts_acked": 0,
                "min_tso_segs": 0,
                "sndbuf_expand": 0,
                "cong_control": 0,
                "get_info": 0,
                "name": [98,112,102,95,100,99,116,99,112,0,0,0,0,0,0,0
                ],
                "owner": 0
            }
        }
    }
]

Misc Notes:
* bpf_struct_ops_map_sys_lookup_elem() is added for syscall lookup.
  It does an inplace update on "*value" instead returning a pointer
  to syscall.c.  Otherwise, it needs a separate copy of "zero" value
  for the BPF_STRUCT_OPS_STATE_INIT to avoid races.

* The bpf_struct_ops_map_delete_elem() is also called without
  preempt_disable() from map_delete_elem().  It is because
  the "->unreg()" may requires sleepable context, e.g.
  the "tcp_unregister_congestion_control()".

* "const" is added to some of the existing "struct btf_func_model *"
  function arg to avoid a compiler warning caused by this patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003505.3855919-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Martin KaFai Lau
27ae7997a6 bpf: Introduce BPF_PROG_TYPE_STRUCT_OPS
This patch allows the kernel's struct ops (i.e. func ptr) to be
implemented in BPF.  The first use case in this series is the
"struct tcp_congestion_ops" which will be introduced in a
latter patch.

This patch introduces a new prog type BPF_PROG_TYPE_STRUCT_OPS.
The BPF_PROG_TYPE_STRUCT_OPS prog is verified against a particular
func ptr of a kernel struct.  The attr->attach_btf_id is the btf id
of a kernel struct.  The attr->expected_attach_type is the member
"index" of that kernel struct.  The first member of a struct starts
with member index 0.  That will avoid ambiguity when a kernel struct
has multiple func ptrs with the same func signature.

For example, a BPF_PROG_TYPE_STRUCT_OPS prog is written
to implement the "init" func ptr of the "struct tcp_congestion_ops".
The attr->attach_btf_id is the btf id of the "struct tcp_congestion_ops"
of the _running_ kernel.  The attr->expected_attach_type is 3.

The ctx of BPF_PROG_TYPE_STRUCT_OPS is an array of u64 args saved
by arch_prepare_bpf_trampoline that will be done in the next
patch when introducing BPF_MAP_TYPE_STRUCT_OPS.

"struct bpf_struct_ops" is introduced as a common interface for the kernel
struct that supports BPF_PROG_TYPE_STRUCT_OPS prog.  The supporting kernel
struct will need to implement an instance of the "struct bpf_struct_ops".

The supporting kernel struct also needs to implement a bpf_verifier_ops.
During BPF_PROG_LOAD, bpf_struct_ops_find() will find the right
bpf_verifier_ops by searching the attr->attach_btf_id.

A new "btf_struct_access" is also added to the bpf_verifier_ops such
that the supporting kernel struct can optionally provide its own specific
check on accessing the func arg (e.g. provide limited write access).

After btf_vmlinux is parsed, the new bpf_struct_ops_init() is called
to initialize some values (e.g. the btf id of the supporting kernel
struct) and it can only be done once the btf_vmlinux is available.

The R0 checks at BPF_EXIT is excluded for the BPF_PROG_TYPE_STRUCT_OPS prog
if the return type of the prog->aux->attach_func_proto is "void".

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200109003503.3855825-1-kafai@fb.com
2020-01-09 08:46:18 -08:00
Jani Nikula
ec027b33c8 Merge drm/drm-next into drm-intel-next-queued
Sync with drm-next to get the new logging macros, among other things.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2020-01-09 17:19:12 +02:00
Dilip Kota
ed22aaaede PCI: dwc: intel: PCIe RC controller driver
Add support to PCIe RC controller on Intel Gateway SoCs.
PCIe controller is based of Synopsys DesignWare PCIe core.

Intel PCIe driver requires Upconfigure support, Fast Training
Sequence and link speed configurations. So adding the respective
helper functions in the PCIe DesignWare framework.
It also programs hardware autonomous speed during speed
configuration so defining it in pci_regs.h.

Also, mark Intel PCIe driver depends on MSI IRQ Domain
as Synopsys DesignWare framework depends on the
PCI_MSI_IRQ_DOMAIN.

Signed-off-by: Dilip Kota <eswara.kota@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com>
2020-01-09 11:57:18 +00:00
Mauro Carvalho Chehab
8821e92879 Merge tag 'v5.5-rc5' into patchwork
Linux 5.5-rc5

* tag 'v5.5-rc5': (1006 commits)
  Linux 5.5-rc5
  Documentation: riscv: add patch acceptance guidelines
  riscv: prefix IRQ_ macro names with an RV_ namespace
  clocksource: riscv: add notrace to riscv_sched_clock
  apparmor: fix aa_xattrs_match() may sleep while holding a RCU lock
  hexagon: define ioremap_uc
  ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less
  ocfs2: call journal flush to mark journal as empty after journal recovery when mount
  mm/hugetlb: defer freeing of huge pages if in non-task context
  mm/gup: fix memory leak in __gup_benchmark_ioctl
  mm/oom: fix pgtables units mismatch in Killed process message
  fs/posix_acl.c: fix kernel-doc warnings
  hexagon: work around compiler crash
  hexagon: parenthesize registers in asm predicates
  fs/namespace.c: make to_mnt_ns() static
  fs/nsfs.c: include headers for missing declarations
  fs/direct-io.c: include fs/internal.h for missing prototype
  mm: move_pages: return valid node id in status if the page is already on the target node
  memcg: account security cred as well to kmemcg
  kcov: fix struct layout for kcov_remote_arg
  ...
2020-01-08 11:13:25 +01:00
Andy Lutomirski
48446f198f random: ignore GRND_RANDOM in getentropy(2)
The separate blocking pool is going away.  Start by ignoring
GRND_RANDOM in getentropy(2).

This should not materially break any API.  Any code that worked
without this change should work at least as well with this change.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/705c5a091b63cc5da70c99304bb97e0109be0a26.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-07 16:07:01 -05:00
Andy Lutomirski
75551dbf11 random: add GRND_INSECURE to return best-effort non-cryptographic bytes
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/d5473b56cf1fa900ca4bd2b3fc1e5b8874399919.1577088521.git.luto@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-01-07 16:07:00 -05:00
Dhinakaran Pandiyan
0d3d29d0f8 drm/framebuffer: Format modifier for Intel Gen-12 media compression
Gen-12 display can decompress surfaces compressed by the media engine, add
a new modifier as the driver needs to know the surface was compressed by
the media or render engine.

v2: Update code comment describing the color plane order for YUV
    semiplanar formats.

Cc: Nanley G Chery <nanley.g.chery@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191231233756.18753-6-imre.deak@intel.com
2020-01-07 13:15:48 +02:00
Vladimir Oltean
6c93099450 mii: Add helpers for parsing SGMII auto-negotiation
Typically a MAC PCS auto-configures itself after it receives the
negotiated copper-side link settings from the PHY, but some MAC devices
are more special and need manual interpretation of the SGMII AN result.

In other cases, the PCS exposes the entire tx_config_reg base page as it
is transmitted on the wire during auto-negotiation, so it makes sense to
be able to decode the equivalent lp_advertised bit mask from the raw u16
(of course, "lp" considering the PCS to be the local PHY).

Therefore, add the bit definitions for the SGMII registers 4 and 5
(local device ability, link partner ability), as well as a link_mode
conversion helper that can be used to feed the AN results into
phy_resolve_aneg_linkmode.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-05 23:22:32 -08:00
Andrey Konovalov
a69b83e1ae kcov: fix struct layout for kcov_remote_arg
Make the layout of kcov_remote_arg the same for 32-bit and 64-bit code.
This makes it more convenient to write userspace apps that can be
compiled into 32-bit or 64-bit binaries and still work with the same
64-bit kernel.

Also use proper __u32 types in uapi headers instead of unsigned ints.

Link: http://lkml.kernel.org/r/9e91020876029cfefc9211ff747685eba9536426.1575638983.git.andreyknvl@google.com
Fixes: eec028c938 ("kcov: remote coverage support")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Chunfeng Yun <chunfeng.yun@mediatek.com>
Cc: "Jacky . Cao @ sony . com" <Jacky.Cao@sony.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-01-04 13:55:09 -08:00
Rijo Thomas
757cc3e9ff tee: add AMD-TEE driver
Adds AMD-TEE driver.
* targets AMD APUs which has AMD Secure Processor with software-based
  Trusted Execution Environment (TEE) support
* registers with TEE subsystem
* defines tee_driver_ops function callbacks
* kernel allocated memory is used as shared memory between normal
  world and secure world.
* acts as REE (Rich Execution Environment) communication agent, which
  uses the services of AMD Secure Processor driver to submit commands
  for processing in TEE environment

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Jens Wiklander <jens.wiklander@linaro.org>
Co-developed-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Devaraj Rangasamy <Devaraj.Rangasamy@amd.com>
Signed-off-by: Rijo Thomas <Rijo-john.Thomas@amd.com>
Reviewed-by: Gary R Hook <gary.hook@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-01-04 13:49:51 +08:00
Michal Kalderon
93a3d05f9d RDMA/qedr: Add kernel capability flags for dpm enabled mode
HW/FW support two types of latency enhancement features.  Until now
user-space implemented only edpm (enhanced dpm).  We add kernel capability
flags to differentiate between current FW in kernel that supports both
ldpm and edpm.  Since edpm is not yet supported for iWARP we add different
flags for iWARP + RoCE.  We also fix bad practice of defining sizes in
rdma-core and pass initialization to kernel, for forward compatibility.

The capability flags are added for backward-forward compatibility between
kernel and rdma-core for qedr.

Before this change there was a field called dpm_enabled which could hold
either 0 or 1 value, this indicated whether RoCE edpm was enabled or
not. We modified this field to be dpm_flags, and bit 1 still holds the
same meaning of RoCE edpm being enabled or not.

Link: https://lore.kernel.org/r/20191121112957.25162-1-michal.kalderon@marvell.com
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-01-03 12:37:00 -04:00
Arnd Bergmann
577c89b0ce media: v4l2-core: fix v4l2_buffer handling for time64 ABI
The v4l2_buffer structure contains a 'struct timeval' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.

As in v4l2_event, handle this with a special case in video_put_user()
and video_get_user() to replace the memcpy there.

Since the structure also contains a pointer, there are now two
native versions (on 32-bit systems) as well as two compat versions
(on 64-bit systems), which unfortunately complicates the compat
handler quite a bit.

Duplicating the existing handlers for the new types is a safe
conversion for now, but unfortunately this may turn into a
maintenance burden later. A larger-scale rework of the
compat code might be a better alternative, but is out of scope
of the y2038 work.

Sparc64 needs a special case because of their special suseconds_t
definition.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:50:21 +01:00
Arnd Bergmann
1a6c0b36dd media: v4l2-core: fix VIDIOC_DQEVENT for time64 ABI
The v4l2_event structure contains a 'struct timespec' member that is
defined by the user space C library, creating an ABI incompatibility
when that gets updated to a 64-bit time_t.

While passing a 32-bit time_t here would be sufficient for CLOCK_MONOTONIC
timestamps, simply redefining the structure to use the kernel's
__kernel_old_timespec would not work for any library that uses a copy
of the linux/videodev2.h header file rather than including the copy from
the latest kernel headers.

This means the kernel has to be changed to handle both versions of the
structure layout on a 32-bit architecture. The easiest way to do this
is during the copy from/to user space.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:47:57 +01:00
Arnd Bergmann
77cdffcb0b media: v4l2: abstract timeval handling in v4l2_buffer
As a preparation for adding 64-bit time_t support in the uapi,
change the drivers to no longer care about the format of the
timestamp field in struct v4l2_buffer.

The v4l2_timeval_to_ns() function is no longer needed in the
kernel after this, but there is userspace code relying on
it to be part of the uapi header.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: replace spaces by tabs]
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2020-01-03 15:43:35 +01:00
Dave Airlie
f5c547efa1 Merge tag 'drm-misc-next-2020-01-02' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v5.6:

UAPI Changes:
- Commandline parser: Add support for panel orientation, and per-mode options.
- Fix IOCTL naming for dma-buf heaps.

Cross-subsystem Changes:
- Rename DMA_HEAP_IOC_ALLOC to DMA_HEAP_IOCTL_ALLOC before it becomes abi.
- Change DMA-BUF system-heap's name to system.
- Fix leak in error handling in dma_heap_ioctl(), and make a symbol static.
- Fix udma-buf cpu access.
- Fix ti devicetree bindings.

Core Changes:
- Add CTA-861-G modes with VIC >= 193.
- Change error handling and remove bug_on in *drm_dev_init.
- Export drm_panel_of_backlight() correctly once more.
- Add support for lvds decoders.
- Convert drm/client and drm/(gem-,)fb-helper to drm-device based logging and update logging todo.

Driver Changes:
- Add support for dsi/px30 to rockchip.
- Add fb damage support to virtio.
- Use dma_resv locking wrappers in vc4, msm, etnaviv.
- Make functions in virtio static, and perform some simplifications.
- Add suspend support to sun4i.
- Add A64 mipi dsi support to sun4i.
- Add runtime pm suspend to komeda.
- Associated driver fixes.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/efc11139-1653-86bc-1b0f-0aefde219850@linux.intel.com
2020-01-03 11:43:44 +10:00
David Ahern
6b102db50c net: Add device index to tcp_md5sig
Add support for userspace to specify a device index to limit the scope
of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad
is renamed to tcpm_ifindex and the new field is only checked if the new
TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index
must point to an L3 master device (e.g., VRF). The API and error
handling are setup to allow the constraint to be relaxed in the future
to any device index.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-02 15:51:22 -08:00
Johannes Berg
7d6aa9ba4f Merge remote-tracking branch 'net-next/master' into mac80211-next
Merging to get the mac80211 updates that have since propagated
into net-next.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-01-02 14:50:58 +01:00
Sven Eckelmann
68e039f966 batman-adv: Update copyright years for 2020
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-01-01 00:00:33 +01:00
Eric Biggers
e933adde6f fscrypt: include <linux/ioctl.h> in UAPI header
<linux/fscrypt.h> defines ioctl numbers using the macros like _IOWR()
which are defined in <linux/ioctl.h>, so <linux/ioctl.h> should be
included as a prerequisite, like it is in many other kernel headers.

In practice this doesn't really matter since anyone referencing these
ioctl numbers will almost certainly include <sys/ioctl.h> too in order
to actually call ioctl().  But we might as well fix this.

Link: https://lore.kernel.org/r/20191219185624.21251-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31 10:33:51 -06:00
Eric Biggers
93edd392ca fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY
Extend the FS_IOC_ADD_ENCRYPTION_KEY ioctl to allow the raw key to be
specified by a Linux keyring key, rather than specified directly.

This is useful because fscrypt keys belong to a particular filesystem
instance, so they are destroyed when that filesystem is unmounted.
Usually this is desired.  But in some cases, userspace may need to
unmount and re-mount the filesystem while keeping the keys, e.g. during
a system update.  This requires keeping the keys somewhere else too.

The keys could be kept in memory in a userspace daemon.  But depending
on the security architecture and assumptions, it can be preferable to
keep them only in kernel memory, where they are unreadable by userspace.

We also can't solve this by going back to the original fscrypt API
(where for each file, the master key was looked up in the process's
keyring hierarchy) because that caused lots of problems of its own.

Therefore, add the ability for FS_IOC_ADD_ENCRYPTION_KEY to accept a
Linux keyring key.  This solves the problem by allowing userspace to (if
needed) save the keys securely in a Linux keyring for re-provisioning,
while still using the new fscrypt key management ioctls.

This is analogous to how dm-crypt accepts a Linux keyring key, but the
key is then stored internally in the dm-crypt data structures rather
than being looked up again each time the dm-crypt device is accessed.

Use a custom key type "fscrypt-provisioning" rather than one of the
existing key types such as "logon".  This is strongly desired because it
enforces that these keys are only usable for a particular purpose: for
fscrypt as input to a particular KDF.  Otherwise, the keys could also be
passed to any kernel API that accepts a "logon" key with any service
prefix, e.g. dm-crypt, UBIFS, or (recently proposed) AF_ALG.  This would
risk leaking information about the raw key despite it ostensibly being
unreadable.  Of course, this mistake has already been made for multiple
kernel APIs; but since this is a new API, let's do it right.

This patch has been tested using an xfstest which I wrote to test it.

Link: https://lore.kernel.org/r/20191119222447.226853-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-12-31 10:33:49 -06:00
David S. Miller
ba4028105e Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner.

2) Document ingress hook in netdevice, also from Lukas.

3) Remove htons() in tunnel metadata port netlink attributes,
   from Xin Long.

4) Missing erspan netlink attribute validation also from Xin Long.

5) Missing erspan version in tunnel, from Xin Long.

6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN}
   Patch from Xin Long.

7) Missing nla_nest_cancel() in tunnel netlink dump path,
   from Xin Long.

8) Remove two exported conntrack symbols with no clients,
   from Florian Westphal.

9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian.

10) Add nft_meta_pkttype helper for loopback, also from Florian.

11) Add nft_meta_socket uid helper, from Florian Westphal.

12) Add nft_meta_cgroup helper, from Florian.

13) Add nft_meta_ifkind helper, from Florian.

14) Group all interface related meta selector, from Florian.

15) Add nft_prandom_u32() helper, from Florian.

16) Add nft_meta_rtclassid helper, from Florian.

17) Add support for matching on the slave device index,
    from Florian.

This batch, among other things, contains updates for the netfilter
tunnel netlink interface: This extension is still incomplete and lacking
proper userspace support which is actually my fault, I did not find the
time to go back and finish this. This update is breaking tunnel UAPI in
some aspects to fix it but do it better sooner than never.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-30 14:22:11 -08:00
Takashi Iwai
d06ed0c209 ALSA: uapi: Add linux/types.h include back (but carefully)
A few uapi/sound/*.h headers have been corrected for recovering from
the compile errors with the existing user-space code (alsa-lib) by the
recent commits.  OTOH, these introduced another regression, as now
linux/types.h inclusion became mandatory for the uapi header checks.

As a compromise, this patch re-adds linux/types.h inclusions again,
but conditionally not to break other non-standard user-space stuff
again.

Fixes: 2e46886763 ("ALSA: emu10k1: Make uapi/emu10k1.h compilable again")
Fixes: d63e63d421 ("ALSA: hdsp: Make uapi/hdsp.h compilable again")
Fixes: 4fa406caf9 ("ALSA: hdspm: Drop linux/types.h inclusion in uapi header")
Reported-by: kbuild test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20191230212742.28925-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-12-30 22:28:59 +01:00
Michal Kubecek
3d2b847fb9 ethtool: provide link state with LINKSTATE_GET request
Implement LINKSTATE_GET netlink request to get link state information.

At the moment, only link up flag as provided by ETHTOOL_GLINK ioctl command
is returned.

LINKSTATE_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
1b1b1847c8 ethtool: add LINKMODES_NTF notification
Send ETHTOOL_MSG_LINKMODES_NTF notification message whenever device link
settings or advertised modes are modified using ETHTOOL_MSG_LINKMODES_SET
netlink message or ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKMODES_GET
request. ETHTOOL_MSG_LINKMODES_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKMODES_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKMODES_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
bfbcfe2032 ethtool: set link modes related data with LINKMODES_SET request
Implement LINKMODES_SET netlink request to set advertised linkmodes and
related attributes as ETHTOOL_SLINKSETTINGS and ETHTOOL_SSET commands do.

The request allows setting autonegotiation flag, speed, duplex and
advertised link modes.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
f625aa9be8 ethtool: provide link mode information with LINKMODES_GET request
Implement LINKMODES_GET netlink request to get link modes related
information provided by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl
commands.

This request provides supported, advertised and peer advertised link modes,
autonegotiation flag, speed and duplex.

LINKMODES_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
73286734c1 ethtool: add LINKINFO_NTF notification
Send ETHTOOL_MSG_LINKINFO_NTF notification message whenever device link
settings are modified using ETHTOOL_MSG_LINKINFO_SET netlink message or
ETHTOOL_SLINKSETTINGS or ETHTOOL_SSET ioctl commands.

The notification message has the same format as reply to LINKINFO_GET
request. ETHTOOL_MSG_LINKINFO_SET netlink request only triggers the
notification if there is a change but the ioctl command handlers do not
check if there is an actual change and trigger the notification whenever
the commands are executed.

As all work is done by ethnl_default_notify() handler and callback
functions introduced to handle LINKINFO_GET requests, all that remains is
adding entries for ETHTOOL_MSG_LINKINFO_NTF into ethnl_notify_handlers and
ethnl_default_notify_ops lookup tables and calls to ethtool_notify() where
needed.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
a53f3d41e4 ethtool: set link settings with LINKINFO_SET request
Implement LINKINFO_SET netlink request to set link settings queried by
LINKINFO_GET message.

Only physical port, phy MDIO address and MDI(-X) control can be set,
attempt to modify MDI(-X) status and transceiver is rejected.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
459e0b81b3 ethtool: provide link settings with LINKINFO_GET request
Implement LINKINFO_GET netlink request to get basic link settings provided
by ETHTOOL_GLINKSETTINGS and ETHTOOL_GSET ioctl commands.

This request provides settings not directly related to autonegotiation and
link mode selection: physical port, phy MDIO address, MDI(-X) status,
MDI(-X) control and transceiver.

LINKINFO_GET request can be used with NLM_F_DUMP (without device
identification) to request the information for all devices in current
network namespace providing the data.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:02 -08:00
Michal Kubecek
71921690f9 ethtool: provide string sets with STRSET_GET request
Requests a contents of one or more string sets, i.e. indexed arrays of
strings; this information is provided by ETHTOOL_GSSET_INFO and
ETHTOOL_GSTRINGS commands of ioctl interface. Unlike ioctl interface, all
information can be retrieved with one request and mulitple string sets can
be requested at once.

There are three types of requests:

  - no NLM_F_DUMP, no device: get "global" stringsets
  - no NLM_F_DUMP, with device: get string sets related to the device
  - NLM_F_DUMP, no device: get device related string sets for all devices

Client can request either all string sets of given type (global or device
related) or only specific sets. With ETHTOOL_A_STRSET_COUNTS flag set, only
set sizes (numbers of strings) are returned.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00
Michal Kubecek
6b08d6c146 ethtool: support for netlink notifications
Add infrastructure for ethtool netlink notifications. There is only one
multicast group "monitor" which is used to notify userspace about changes
and actions performed. Notification messages (types using suffix _NTF)
share the format with replies to GET requests.

Notifications are supposed to be broadcasted on every configuration change,
whether it is done using the netlink interface or ioctl one. Netlink SET
requests only trigger a notification if some data is actually changed.

To trigger an ethtool notification, both ethtool netlink and external code
use ethtool_notify() helper. This helper requires RTNL to be held and may
sleep. Handlers sending messages for specific notification message types
are registered in ethnl_notify_handlers array. As notifications can be
triggered from other code, ethnl_ok flag is used to prevent an attempt to
send notification before genetlink family is registered.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-27 16:40:01 -08:00