Commit Graph

6197 Commits

Author SHA1 Message Date
David S. Miller
3efa70d78f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflict was an interaction between a bug fix in the
netvsc driver in 'net' and an optimization of the RX path
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-07 16:29:30 -05:00
Paolo Bonzini
d5b798c15f Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The big feature this time is support for POWER9 using the radix-tree
MMU for host and guest.  This required some changes to arch/powerpc
code, so I talked with Michael Ellerman and he created a topic branch
with this patchset, which I merged into kvm-ppc-next and which Michael
will pull into his tree.  Michael also put in some patches from Nick
Piggin which fix bugs in the interrupt vector code in relocatable
kernels when coming from a KVM guest.

Other notable changes include:

* Add the ability to change the size of the hashed page table,
  from David Gibson.

* XICS (interrupt controller) emulation fixes and improvements,
  from Li Zhong.

* Bug fixes from myself and Thomas Huth.

These patches define some new KVM capabilities and ioctls, but there
should be no conflicts with anything else currently upstream, as far
as I am aware.
2017-02-07 18:17:46 +01:00
Marcelo Tosatti
55dd00a73a KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall
Add a hypercall to retrieve the host realtime clock and the TSC value
used to calculate that clock read.

Used to implement clock synchronization between host and guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-07 18:16:45 +01:00
Scott Bauer
8c87fe7220 Fix SED-OPAL UAPI structs to prevent 32/64 bit size differences.
This patch is a quick fixup of the user structures that will prevent
the structures from being different sizes on 32 and 64 bit archs.
Taking this fix will allow us to *NOT* have to do compat ioctls for
the sed code.

Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Fixes: 19641f2d76 ("Include: Uapi: Add user ABI for Sed/Opal")
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-06 17:45:29 -07:00
Scott Bauer
19641f2d76 Include: Uapi: Add user ABI for Sed/Opal
Signed-off-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Rafael Antognolli <Rafael.Antognolli@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-06 09:44:17 -07:00
Kevin Cernekee
a963d710f3 netfilter: ctnetlink: Fix regression in CTA_STATUS processing
The libnetfilter_conntrack userland library always sets IPS_CONFIRMED
when building a CTA_STATUS attribute.  If this toggles the bit from
0->1, the parser will return an error.  On Linux 4.4+ this will cause any
NFQA_EXP attribute in the packet to be ignored.  This breaks conntrackd's
userland helpers because they operate on unconfirmed connections.

Instead of returning -EBUSY if the user program asks to modify an
unchangeable bit, simply ignore the change.

Also, fix the logic so that user programs are allowed to clear
the bits that they are allowed to change.

Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-06 12:48:26 +01:00
Greg Kroah-Hartman
a769f30c7b Merge 4.10-rc7 into staging-next
This resolves the merge errors that were reported in linux-next and it
picks up the staging and IIO fixes that we need/want in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-06 09:36:10 +01:00
David S. Miller
52e01b84a2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
   sk_buff so we only access one single cacheline in the conntrack
   hotpath. Patchset from Florian Westphal.

2) Don't leak pointer to internal structures when exporting x_tables
   ruleset back to userspace, from Willem DeBruijn. This includes new
   helper functions to copy data to userspace such as xt_data_to_user()
   as well as conversions of our ip_tables, ip6_tables and arp_tables
   clients to use it. Not surprinsingly, ebtables requires an ad-hoc
   update. There is also a new field in x_tables extensions to indicate
   the amount of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: This new knob allows you to enable
   logging via nf_log infrastructure for all existing netnamespaces.
   Given the effort to provide pernet syslog has been discontinued,
   let's provide a way to restore logging using netfilter kernel logging
   facilities in trusted environments. Patch from Michal Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
   a copy&paste from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables, one to remove unnecessary check from
   the netlink control plane path to add table, set and stateful objects
   and code consolidation when unregister chain hooks, from Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
    results in problems with the introduction of the new refcount_t
    type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
    from Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 16:58:20 -05:00
Roopa Prabhu
b3c7ef0ada bridge: uapi: add per vlan tunnel info
New nested netlink attribute to associate tunnel info per vlan.
This is used by bridge driver to send tunnel metadata to
bridge ports in vlan tunnel mode. This patch also adds new per
port flag IFLA_BRPORT_VLAN_TUNNEL to enable vlan tunnel mode.
off by default.

One example use for this is a vxlan bridging gateway or vtep
which maps vlans to vn-segments (or vnis). User can configure
per-vlan tunnel information which the bridge driver can use
to bridge vlan into the corresponding vn-segment.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 15:21:21 -05:00
Roopa Prabhu
3ad7a4b141 vxlan: support fdb and learning in COLLECT_METADATA mode
Vxlan COLLECT_METADATA mode today solves the per-vni netdev
scalability problem in l3 networks. It expects all forwarding
information to be present in dst_metadata. This patch series
enhances collect metadata mode to include the case where only
vni is present in dst_metadata, and the vxlan driver can then use
the rest of the forwarding information datbase to make forwarding
decisions. There is no change to default COLLECT_METADATA
behaviour. These changes only apply to COLLECT_METADATA when
used with the bridging use-case with a special dst_metadata
tunnel info flag (eg: where vxlan device is part of a bridge).
For all this to work, the vxlan driver will need to now support a
single fdb table hashed by mac + vni. This series essentially makes
this happen.

use-case and workflow:
vxlan collect metadata device participates in bridging vlan
to vn-segments. Bridge driver above the vxlan device,
sends the vni corresponding to the vlan in the dst_metadata.
vxlan driver will lookup forwarding database with (mac + vni)
for the required remote destination information to forward the
packet.

Changes introduced by this patch:
    - allow learning and forwarding database state in vxlan netdev in
      COLLECT_METADATA mode. Current behaviour is not changed
      by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
      to support the new bridge friendly mode.
    - A single fdb table hashed by (mac, vni) to allow fdb entries with
      multiple vnis in the same fdb table
    - rx path already has the vni
    - tx path expects a vni in the packet with dst_metadata
    - prior to this series, fdb remote_dsts carried remote vni and
      the vxlan device carrying the fdb table represented the
      source vni. With the vxlan device now representing multiple vnis,
      this patch adds a src vni attribute to the fdb entry. The remote
      vni already uses NDA_VNI attribute. This patch introduces
      NDA_SRC_VNI netlink attribute to represent the src vni in a multi
      vni fdb table.

iproute2 example (patched and pruned iproute2 output to just show
relevant fdb entries):
example shows same host mac learnt on two vni's.

before (netdev per vni):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self

after this patch with collect metadata in bridged mode (single netdev):
$bridge fdb show | grep "00:02:00:00:00:03"
00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 15:21:21 -05:00
Yotam Gigi
295a6e06d2 net/sched: act_ife: Change to use ife module
Use the encode/decode functionality from the ife module instead of using
implementation inside the act_ife.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 15:16:46 -05:00
Yotam Gigi
1ce8460496 net: Introduce ife encapsulation module
This module is responsible for the ife encapsulation protocol
encode/decode logics. That module can:
 - ife_encode: encode skb and reserve space for the ife meta header
 - ife_decode: decode skb and extract the meta header size
 - ife_tlv_meta_encode - encodes one tlv entry into the reserved ife
   header space.
 - ife_tlv_meta_decode - decodes one tlv entry from the packet
 - ife_tlv_meta_next - advance to the next tlv

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 15:16:45 -05:00
David Lebrun
013e816789 ipv6: sr: remove cleanup flag and fix HMAC computation
In the latest version of the IPv6 Segment Routing IETF draft [1] the
cleanup flag is removed and the flags field length is shrunk from 16 bits
to 8 bits. As a consequence, the input of the HMAC computation is modified
in a non-backward compatible way by covering the whole octet of flags
instead of only the cleanup bit. As such, if an implementation compatible
with the latest draft computes the HMAC of an SRH who has other flags set
to 1, then the HMAC result would differ from the current implementation.

This patch carries those modifications to prevent conflict with other
implementations of IPv6 SR.

[1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-03 11:05:23 -05:00
Eric Dumazet
8fe809a992 net: add LINUX_MIB_PFMEMALLOCDROP counter
Debugging issues caused by pfmemalloc is often tedious.

Add a new SNMP counter to more easily diagnose these problems.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 23:34:19 -05:00
Andrey Vagin
ba94f3088b unix: add ioctl to open a unix socket file with O_PATH
This ioctl opens a file to which a socket is bound and
returns a file descriptor. The caller has to have CAP_NET_ADMIN
in the socket network namespace.

Currently it is impossible to get a path and a mount point
for a socket file. socket_diag reports address, device ID and inode
number for unix sockets. An address can contain a relative path or
a file may be moved somewhere. And these properties say nothing about
a mount namespace and a mount point of a socket file.

With the introduced ioctl, we can get a path by reading
/proc/self/fd/X and get mnt_id from /proc/self/fdinfo/X.

In CRIU we are going to use this ioctl to dump and restore unix socket.

Here is an example how it can be used:

$ strace -e socket,bind,ioctl ./test /tmp/test_sock
socket(AF_UNIX, SOCK_STREAM, 0)         = 3
bind(3, {sa_family=AF_UNIX, sun_path="test_sock"}, 11) = 0
ioctl(3, SIOCUNIXFILE, 0)           = 4
^Z

$ ss -a | grep test_sock
u_str  LISTEN     0      1      test_sock 17798                 * 0

$ ls -l /proc/760/fd/{3,4}
lrwx------ 1 root root 64 Feb  1 09:41 3 -> 'socket:[17798]'
l--------- 1 root root 64 Feb  1 09:41 4 -> /tmp/test_sock

$ cat /proc/760/fdinfo/4
pos:	0
flags:	012000000
mnt_id:	40

$ cat /proc/self/mountinfo | grep "^40\s"
40 19 0:37 / /tmp rw shared:23 - tmpfs tmpfs rw

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 21:58:02 -05:00
Eric W. Biederman
015bb305b8 Merge branch 'nsfs-discovery'
Michael Kerrisk <<mtk.manpages@gmail.com> writes:

I would like to write code that discovers the namespace setup on a live
system.  The NS_GET_PARENT and NS_GET_USERNS ioctl() operations added in
Linux 4.9 provide much of what I want, but there are still a couple of
small pieces missing. Those pieces are added with this patch series.

Here's an example program that makes use of the new ioctl() operations.

8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---
/* ns_capable.c

   (C) 2016 Michael Kerrisk, <mtk.manpages@gmail.com>

   Licensed under the GNU General Public License v2 or later.

   Test whether a process (identified by PID) might (subject to LSM checks)
   have capabilities in a namespace (identified by a /proc/PID/ns/xxx file).
*/

			} while (0)

     			     exit(EXIT_FAILURE); } while (0)

/* Display capabilities sets of process with specified PID */

static void
show_cap(pid_t pid)
{
    cap_t caps;
    char *cap_string;

    caps = cap_get_pid(pid);
    if (caps == NULL)
	errExit("cap_get_proc");

    cap_string = cap_to_text(caps, NULL);
    if (cap_string == NULL)
	errExit("cap_to_text");

    printf("Capabilities: %s\n", cap_string);
}

/* Obtain the effective UID pf the process 'pid' by
   scanning its /proc/PID/file */

static uid_t
get_euid_of_process(pid_t pid)
{
    char path[PATH_MAX];
    char line[1024];
    int uid;

    snprintf(path, sizeof(path), "/proc/%ld/status", (long) pid);

    FILE *fp;
    fp = fopen(path, "r");
    if (fp == NULL)
	errExit("fopen-/proc/PID/status");

    for (;;) {
	if (fgets(line, sizeof(line), fp) == NULL) {

	    /* Should never happen... */

	    fprintf(stderr, "Failure scanning %s\n", path);
	    exit(EXIT_FAILURE);
	}

	if (strstr(line, "Uid:") == line) {
	    sscanf(line, "Uid: %*d %d %*d %*d", &uid);
	    return uid;
	}
    }
}

int
main(int argc, char *argv[])
{
    int ns_fd, userns_fd, pid_userns_fd;
    int nstype;
    int next_fd;
    struct stat pid_stat;
    struct stat target_stat;
    char *pid_str;
    pid_t pid;
    char path[PATH_MAX];

    if (argc < 2) {
	fprintf(stderr, "Usage: %s PID [ns-file]\n", argv[0]);
	fprintf(stderr, "\t'ns-file' is a /proc/PID/ns/xxxx file; "
		        "if omitted, use the namespace\n"
			"\treferred to by standard input "
			"(file descriptor 0)\n");
	exit(EXIT_FAILURE);
    }

    pid_str = argv[1];
    pid = atoi(pid_str);

    if (argc <= 2) {
	ns_fd = STDIN_FILENO;
    } else {
        ns_fd = open(argv[2], O_RDONLY);
        if (ns_fd == -1)
	    errExit("open-ns-file");
    }

    /* Get the relevant user namespace FD, which is 'ns_fd' if 'ns_fd' refers
       to a user namespace, otherwise the user namespace that owns 'ns_fd' */

    nstype = ioctl(ns_fd, NS_GET_NSTYPE);
    if (nstype == -1)
	errExit("ioctl-NS_GET_NSTYPE");

    if (nstype == CLONE_NEWUSER) {
	userns_fd = ns_fd;
    } else {
	userns_fd = ioctl(ns_fd, NS_GET_USERNS);
        if (userns_fd == -1)
	    errExit("ioctl-NS_GET_USERNS");
    }

    /* Obtain 'stat' info for the user namespace of the specified PID */

    snprintf(path, sizeof(path), "/proc/%s/ns/user", pid_str);

    pid_userns_fd = open(path, O_RDONLY);
    if (pid_userns_fd == -1)
	errExit("open-PID");

    if (fstat(pid_userns_fd, &pid_stat) == -1)
	errExit("fstat-PID");

    /* Get 'stat' info for the target user namesapce */

    if (fstat(userns_fd, &target_stat) == -1)
	errExit("fstat-PID");

    /* If the PID is in the target user namespace, then it has
       whatever capabilities are in its sets. */

    if (pid_stat.st_dev == target_stat.st_dev &&
		pid_stat.st_ino == target_stat.st_ino) {
        printf("PID is in target namespace\n");
	printf("Subject to LSM checks, it has the following capabilities\n");

	show_cap(pid);

	exit(EXIT_SUCCESS);
    }

    /* Otherwise, we need to walk through the ancestors of the target
       user namespace to see if PID is in an ancestor namespace */

    for (;;) {
	int f;

	next_fd = ioctl(userns_fd, NS_GET_PARENT);

	if (next_fd == -1) {

	    /* The error here should be EPERM... */

	    if (errno != EPERM)
	        errExit("ioctl-NS_GET_PARENT");

	    printf("PID is not in an ancestor namespace\n");
	    printf("It has no capabilities in the target namespace\n");

	    exit(EXIT_SUCCESS);
	}

        if (fstat(next_fd, &target_stat) == -1)
	    errExit("fstat-PID");

	/* If the 'stat' info for this user namespace matches the 'stat'
	 * info for 'next_fd', then the PID is in an ancestor namespace */

        if (pid_stat.st_dev == target_stat.st_dev &&
		    pid_stat.st_ino == target_stat.st_ino)
	    break;

	/* Next time round, get the next parent */

	f = userns_fd;
	userns_fd = next_fd;
	close(f);
    }

    /* At this point, we found that PID is in an ancestor of the target
       user namespace, and 'userns_fd' refers to the immediate descendant
       user namespace of PID in the chain of user namespaces from PID to
       the target user namespace. If the effective UID of PID matches the
       owner UID of descendant user namespace, then PID has all
       capabilities in the descendant namespace(s); otherwise, it just has
       the capabilities that are in its sets. */

    uid_t owner_uid, uid;
    if (ioctl(userns_fd, NS_GET_OWNER_UID, &owner_uid) == -1) {
	perror("ioctl-NS_GET_OWNER_UID");
	exit(EXIT_FAILURE);
    }

    uid = get_euid_of_process(pid);

    printf("PID is in an ancestor namespace\n");
    if (owner_uid == uid) {
	printf("And its effective UID matches the owner "
		"of the namespace\n");
	printf("Subject to LSM checks, PID has all capabilities in "
		"that namespace!\n");
    } else {
	printf("But its effective UID does not match the owner "
		"of the namespace\n");
	printf("Subject to LSM checks, it has the following capabilities\n");
	show_cap(pid);
    }

    exit(EXIT_SUCCESS);
}
8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---8x---

Michael Kerrisk (2):
  nsfs: Add an ioctl() to return the namespace type
  nsfs: Add an ioctl() to return owner UID of a userns

 fs/nsfs.c                 | 13 +++++++++++++
 include/uapi/linux/nsfs.h |  9 +++++++--
 2 files changed, 20 insertions(+), 2 deletions(-)
2017-02-03 14:53:27 +13:00
Michael Kerrisk (man-pages)
d95fa3c76a nsfs: Add an ioctl() to return owner UID of a userns
I'd like to write code that discovers the user namespace hierarchy on a
running system, and also shows who owns the various user namespaces.
Currently, there is no way of getting the owner UID of a user namespace.
Therefore, this patch adds a new NS_GET_CREATOR_UID ioctl() that fetches
the UID (as seen in the user namespace of the caller) of the creator of
the user namespace referred to by the specified file descriptor.

If the supplied file descriptor does not refer to a user namespace,
the operation fails with the error EINVAL. If the owner UID does
not have a mapping in the caller's user namespace return the
overflow UID as that appears easier to deal with in practice
in user-space applications.

-- EWB Changed the handling of unmapped UIDs from -EOVERFLOW
   back to the overflow uid.  Per conversation with
   Michael Kerrisk after examining his test code.

Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-02-03 14:35:43 +13:00
David S. Miller
e2160156bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All merge conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-02 16:54:00 -05:00
Eric W. Biederman
93faccbbfa fs: Better permission checking for submounts
To support unprivileged users mounting filesystems two permission
checks have to be performed: a test to see if the user allowed to
create a mount in the mount namespace, and a test to see if
the user is allowed to access the specified filesystem.

The automount case is special in that mounting the original filesystem
grants permission to mount the sub-filesystems, to any user who
happens to stumble across the their mountpoint and satisfies the
ordinary filesystem permission checks.

Attempting to handle the automount case by using override_creds
almost works.  It preserves the idea that permission to mount
the original filesystem is permission to mount the sub-filesystem.
Unfortunately using override_creds messes up the filesystems
ordinary permission checks.

Solve this by being explicit that a mount is a submount by introducing
vfs_submount, and using it where appropriate.

vfs_submount uses a new mount internal mount flags MS_SUBMOUNT, to let
sget and friends know that a mount is a submount so they can take appropriate
action.

sget and sget_userns are modified to not perform any permission checks
on submounts.

follow_automount is modified to stop using override_creds as that
has proven problemantic.

do_mount is modified to always remove the new MS_SUBMOUNT flag so
that we know userspace will never by able to specify it.

autofs4 is modified to stop using current_real_cred that was put in
there to handle the previous version of submount permission checking.

cifs is modified to pass the mountpoint all of the way down to vfs_submount.

debugfs is modified to pass the mountpoint all of the way down to
trace_automount by adding a new parameter.  To make this change easier
a new typedef debugfs_automount_t is introduced to capture the type of
the debugfs automount function.

Cc: stable@vger.kernel.org
Fixes: 069d5ac9ae ("autofs:  Fix automounts by using current_real_cred()->uid")
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2017-02-02 04:36:12 +13:00
Daniel Vetter
e2b06d71bd Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queued
Chris Wilson wants the new fence tracepoint added in

commit 8c96c67801
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Jan 24 11:57:58 2017 +0000

    dma/fence: Export enable-signaling tracepoint for emission by drivers

Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2017-02-01 10:58:11 +01:00
Dave Airlie
29a73d906b Merge branch 'drm-next-4.11' of git://people.freedesktop.org/~agd5f/linux into drm-next
This is the main feature pull for radeon and amdgpu for 4.11.  Highlights:
- Power and clockgating improvements
- Preliminary SR-IOV support
- ttm buffer priority support
- ttm eviction fixes
- Removal of the ttm lru callbacks
- Remove SI DPM quirks due to MC firmware issues
- Handle VFCT with multiple vbioses
- Powerplay improvements
- Lots of driver cleanups

* 'drm-next-4.11' of git://people.freedesktop.org/~agd5f/linux: (120 commits)
  drm/amdgpu: fix amdgpu_bo_va_mapping flags
  drm/amdgpu: access stolen VRAM directly on CZ (v2)
  drm/amdgpu: access stolen VRAM directly on KV/KB (v2)
  drm/amdgpu: fix kernel panic when dpm disabled on Kv.
  drm/amdgpu: fix dpm bug on Kv.
  drm/amd/powerplay: fix regresstion issue can't set manual dpm mode.
  drm/amdgpu: handle vfct with multiple vbios images
  drm/radeon: handle vfct with multiple vbios images
  drm/amdgpu: move misc si headers into amdgpu
  drm/amdgpu: remove unused header si_reg.h
  drm/radeon: drop pitcairn dpm quirks
  drm/amdgpu: drop pitcairn dpm quirks
  drm: radeon: radeon_ttm: Handle return NULL error from ioremap_nocache
  drm/amd/amdgpu/amdgpu_ttm: Handle return NULL error from ioremap_nocache
  drm/amdgpu: add new virtual display ID
  drm/amd/amdgpu: remove the uncessary parameter for ib scheduler
  drm/amdgpu: Bring bo creation in line with radeon driver (v2)
  drm/amd/powerplay: fix misspelling in header guard
  drm/ttm: revert "add optional LRU removal callback v2"
  drm/ttm: revert "implement LRU add callbacks v2"
  ...
2017-02-01 08:39:35 +10:00
Dave Airlie
012bbe28c0 Merge tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc into drm-next
Another round of -misc stuff:
- Noralf debugfs cleanup cleanup (not yet everything, some more driver
  patches awaiting acks).
- More doc work.
- edid/infoframe fixes from Ville.
- misc 1-patch fixes all over, as usual

Noralf needs this for his tinydrm pull request.

* tag 'drm-misc-next-2017-01-30' of git://anongit.freedesktop.org/git/drm-misc: (48 commits)
  drm/vc4: Remove vc4_debugfs_cleanup()
  dma/fence: Export enable-signaling tracepoint for emission by drivers
  drm/tilcdc: Remove tilcdc_debugfs_cleanup()
  drm/tegra: Remove tegra_debugfs_cleanup()
  drm/sti: Remove drm_debugfs_remove_files() calls
  drm/radeon: Remove drm_debugfs_remove_files() call
  drm/omap: Remove omap_debugfs_cleanup()
  drm/hdlcd: Remove hdlcd_debugfs_cleanup()
  drm/etnaviv: Remove etnaviv_debugfs_cleanup()
  drm/etnaviv: allow build with COMPILE_TEST
  drm/amd/amdgpu: Remove drm_debugfs_remove_files() call
  drm/prime: Clarify DMA-BUF/GEM Object lifetime
  drm/ttm: Make sure BOs being swapped out are cacheable
  drm/atomic: Remove drm_atomic_debugfs_cleanup()
  drm: drm_minor_register(): Clean up debugfs on failure
  drm: debugfs: Remove all files automatically on cleanup
  drm/fourcc: add vivante tiled layout format modifiers
  drm/edid: Set YQ bits in the AVI infoframe according to CEA-861-F
  drm/edid: Set AVI infoframe Q even when QS=0
  drm/edid: Introduce drm_hdmi_avi_infoframe_quant_range()
  ...
2017-02-01 08:31:09 +10:00
J. Bruce Fields
32ddd944a0 nfsd: opt in to labeled nfs per export
Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
individual file labels as they're set on the server.  This is not what
they've previously seen, and not appropriate in may cases.  (In
particular, if clients have heterogenous security policies then one
client's labels may not even make sense to another.)  Labeled NFS should
be opted in only in those cases when the administrator knows it makes
sense.

It's helpful to be able to turn 4.2 on by default, and otherwise the
protocol upgrade seems free of regressions.  So, default labeled NFS to
off and provide an export flag to reenable it.

Users wanting labeled NFS support on an export will henceforth need to:

	- make sure 4.2 support is enabled on client and server (as
	  before), and
	- upgrade the server nfs-utils to a version supporting the new
	  "security_label" export flag.
	- set that "security_label" flag on the export.

This is commit may be seen as a regression to anyone currently depending
on security labels.  We believe those cases are currently rare.

Reported-by: tibbs@math.uh.edu
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-31 12:31:54 -05:00
Matias Bjørling
84d4add793 lightnvm: add ioctls for vector I/Os
Enable user-space to issue vector I/O commands through ioctls. To issue
a vector I/O, the ppa list with addresses is also required and must be
mapped for the controller to access.

For each ioctl, the result and status bits are returned as well, such
that user-space can retrieve the open-channel SSD completion bits.

The implementation covers the traditional use-cases of bad block
management, and vectored read/write/erase.

Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Metadata implementation, test, and fixes.
Signed-off-by: Simon A.F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 08:32:13 -07:00
David Gibson
ef1ead0c3b KVM: PPC: Book3S HV: HPT resizing documentation and reserved numbers
This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime.  It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.

Note that, HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu).  The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV.  To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB.  Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR.  If userspace it want to support
resizing with KVM PR on such kernels, it will need a workaround.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:58:59 +11:00
Paul Mackerras
c927013227 KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest.  The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables.  (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).

The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.

The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU.  It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.

Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:47 +11:00
Pavel Belous
94842b4fc4 net: ethtool: add support for 2500BaseT and 5000BaseT link modes
This patch introduce support for 2500BaseT and 5000BaseT link modes.
These modes are included in the new IEEE 802.3bz standard.

Signed-off-by: Pavel Belous <pavel.s.belous@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-30 10:14:28 -05:00
Yuchung Cheng
7e98102f48 tcp: record pkts sent and retransmistted
Add two stats in SCM_TIMESTAMPING_OPT_STATS:

TCP_NLA_DATA_SEGS_OUT: total data packets sent including retransmission
TCP_NLA_TOTAL_RETRANS: total data packets retransmitted

The names are picked to be consistent with corresponding fields in
TCP_INFO. This allows applications that are using the timestamping
API to measure latency stats to also retrive retransmission rate
of application write.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-29 19:17:23 -05:00
David S. Miller
4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
Linus Torvalds
1b1bc42c16 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) GTP fixes from Andreas Schultz (missing genl module alias, clear IP
    DF on transmit).

 2) Netfilter needs to reflect the fwmark when sending resets, from Pau
    Espin Pedrol.

 3) nftable dump OOPS fix from Liping Zhang.

 4) Fix erroneous setting of VIRTIO_NET_HDR_F_DATA_VALID on transmit,
    from Rolf Neugebauer.

 5) Fix build error of ipt_CLUSTERIP when procfs is disabled, from Arnd
    Bergmann.

 6) Fix regression in handling of NETIF_F_SG in harmonize_features(),
    from Eric Dumazet.

 7) Fix RTNL deadlock wrt. lwtunnel module loading, from David Ahern.

 8) tcp_fastopen_create_child() needs to setup tp->max_window, from
    Alexey Kodanev.

 9) Missing kmemdup() failure check in ipv6 segment routing code, from
    Eric Dumazet.

10) Don't execute unix_bind() under the bindlock, otherwise we deadlock
    with splice. From WANG Cong.

11) ip6_tnl_parse_tlv_enc_lim() potentially reallocates the skb buffer,
    therefore callers must reload cached header pointers into that skb.
    Fix from Eric Dumazet.

12) Fix various bugs in legacy IRQ fallback handling in alx driver, from
    Tobias Regnery.

13) Do not allow lwtunnel drivers to be unloaded while they are
    referenced by active instances, from Robert Shearman.

14) Fix truncated PHY LED trigger names, from Geert Uytterhoeven.

15) Fix a few regressions from virtio_net XDP support, from John
    Fastabend and Jakub Kicinski.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (102 commits)
  ISDN: eicon: silence misleading array-bounds warning
  net: phy: micrel: add support for KSZ8795
  gtp: fix cross netns recv on gtp socket
  gtp: clear DF bit on GTP packet tx
  gtp: add genl family modules alias
  tcp: don't annotate mark on control socket from tcp_v6_send_response()
  ravb: unmap descriptors when freeing rings
  virtio_net: reject XDP programs using header adjustment
  virtio_net: use dev_kfree_skb for small buffer XDP receive
  r8152: check rx after napi is enabled
  r8152: re-schedule napi for tx
  r8152: avoid start_xmit to schedule napi when napi is disabled
  r8152: avoid start_xmit to call napi_schedule during autosuspend
  net: dsa: Bring back device detaching in dsa_slave_suspend()
  net: phy: leds: Fix truncated LED trigger names
  net: phy: leds: Break dependency of phy.h on phy_led_triggers.h
  net: phy: leds: Clear phy_num_led_triggers on failure to avoid crash
  net-next: ethernet: mediatek: change the compatible string
  Documentation: devicetree: change the mediatek ethernet compatible string
  bnxt_en: Fix RTNL lock usage on bnxt_get_port_module_status().
  ...
2017-01-27 12:54:16 -08:00
Linus Torvalds
dd3b9f25c8 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
 "Second round of -rc fixes for 4.10.

  This -rc cycle has been slow for the rdma subsystem. I had already
  sent you the first batch before the Holiday break. After that, we kept
  only getting a few here or there. Up until this week, when I got a
  drop of 13 to one driver (qedr). So, here's the -rc patches I have. I
  currently have none held in reserve, so unless something new comes in,
  this is it until the next merge window opens.

  Summary:

   - series of iw_cxgb4 fixes to make it work with the drain cq API

   - one or two patches each to: srp, iser, cxgb3, vmw_pvrdma, umem,
     rxe, and ipoib

   - one big series (13 patches) for the new qedr driver"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (27 commits)
  RDMA/cma: Fix unknown symbol when CONFIG_IPV6 is not enabled
  IB/rxe: Prevent from completer to operate on non valid QP
  IB/rxe: Fix rxe dev insertion to rxe_dev_list
  IB/umem: Release pid in error and ODP flow
  RDMA/qedr: Dispatch port active event from qedr_add
  RDMA/qedr: Fix and simplify memory leak in PD alloc
  RDMA/qedr: Fix RDMA CM loopback
  RDMA/qedr: Fix formatting
  RDMA/qedr: Mark three functions as static
  RDMA/qedr: Don't reset QP when queues aren't flushed
  RDMA/qedr: Don't spam dmesg if QP is in error state
  RDMA/qedr: Remove CQ spinlock from CM completion handlers
  RDMA/qedr: Return max inline data in QP query result
  RDMA/qedr: Return success when not changing QP state
  RDMA/qedr: Add uapi header qedr-abi.h
  RDMA/qedr: Fix MTU returned from QP query
  RDMA/core: Add the function ib_mtu_int_to_enum
  IB/vmw_pvrdma: Fix incorrect cleanup on pvrdma_pci_probe error path
  IB/vmw_pvrdma: Don't leak info from alloc_ucontext
  IB/cxgb3: fix misspelling in header guard
  ...
2017-01-27 12:29:30 -08:00
Chris Wilson
fec0445caa drm/i915: Support explicit fencing for execbuf
Now that the user can opt-out of implicit fencing, we need to give them
back control over the fencing. We employ sync_file to wrap our
drm_i915_gem_request and provide an fd that userspace can merge with
other sync_file fds and pass back to the kernel to wait upon before
future execution.

Testcase: igt/gem_exec_fence
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-2-chris@chris-wilson.co.uk
2017-01-27 19:55:48 +00:00
Chris Wilson
77ae995789 drm/i915: Enable userspace to opt-out of implicit fencing
Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we always must wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is either using explicit fencing
between parties and wants more fine-grained access to buffers (e.g. it
may partition the buffer between uses and track fences on ranges rather
than the implicit fences tracking the whole object). It follows that
userspace needs a mechanism to avoid the kernel's serialisation on its
implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag need to be only
set once for the lifetime of the object, this reduces the convenience of
having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).

Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Chad Versace <chadversary@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170127094008.27489-1-chris@chris-wilson.co.uk
2017-01-27 19:55:35 +00:00
Linus Torvalds
9d1d166f18 Merge tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:

 - fix a regression on tvp5150 causing failures at input selection and
   image glitches

 - CEC was moved out of staging for v4.10. Fix some bugs on it while not
   too late

 - fix a regression on pctv452e caused by VM stack changes

 - fix suspend issued with smiapp

 - fix a regression on cobalt driver

 - fix some warnings and Kconfig issues with some random configs.

* tag 'media/v4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] s5k4ecgx: select CRC32 helper
  [media] dvb: avoid warning in dvb_net
  [media] v4l: tvp5150: Don't override output pinmuxing at stream on/off time
  [media] v4l: tvp5150: Fix comment regarding output pin muxing
  [media] v4l: tvp5150: Reset device at probe time, not in get/set format handlers
  [media] pctv452e: move buffer to heap, no mutex
  [media] media/cobalt: use pci_irq_allocate_vectors
  [media] cec: fix race between configuring and unconfiguring
  [media] cec: move cec_report_phys_addr into cec_config_thread_func
  [media] cec: replace cec_report_features by cec_fill_msg_report_features
  [media] cec: update log_addr[] before finishing configuration
  [media] cec: CEC_MSG_GIVE_FEATURES should abort for CEC version < 2
  [media] cec: when canceling a message, don't overwrite old status info
  [media] cec: fix report_current_latency
  [media] smiapp: Make suspend and resume functions __maybe_unused
  [media] smiapp: Implement power-on and power-off sequences without runtime PM
2017-01-27 10:29:33 -08:00
David S. Miller
a00ebc464d Merge tag 'linux-can-next-for-4.11-20170124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:

====================
pull-request: can-next 2017-01-24

this is a pull request of 4 patches for net-next/master.

The first patch by Oliver Hartkopp adds a netlink API to configure the
interface termination of a CAN card. The next two patches are by me and
add a netlink API to query and configure CAN interfaces that only
support fixed bitrates. The last patch by Colin Ian King simplifies the
return path in the softing_cs driver's softingcs_probe() function.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 11:19:29 -05:00
Arindam Nath
44879b6261 drm/amd/amdgpu: get maximum and used UVD handles (v4)
Change History
--------------

v4: Changes suggested by Emil, Christian
- return -ENODATA for asics with unlimited sessions

v3: changes suggested by Christian
- Add a check for UVD IP block using AMDGPU_HW_IP_UVD
  query type.
- Add a check for asic_type to be less than
  CHIP_POLARIS10 since starting Polaris, we support
  unlimited UVD instances.
- Add kerneldoc style comment for
  amdgpu_uvd_used_handles().

v2: as suggested by Christian
- Add a new query AMDGPU_INFO_NUM_HANDLES
- Create a helper function to return the number
  of currently used UVD handles.
- Modify the logic to count the number of used
  UVD handles since handles can be freed in
  non-linear fashion.

v1:
- User might want to query the maximum number of UVD
  instances supported by firmware. In addition to that,
  if there are multiple applications using UVD handles
  at the same time, he might also want to query the
  currently used number of handles.

  For this we add two variables max_handles and
  used_handles inside drm_amdgpu_info_hw_ip. So now
  an application (or libdrm) can use AMDGPU_INFO IOCTL
  with AMDGPU_INFO_HW_IP_INFO query type to get these
  values.

Signed-off-by: Arindam Nath <arindam.nath@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-01-27 11:12:38 -05:00
Felix Jia
d35a00b8e3 net/ipv6: allow sysctl to change link-local address generation mode
The address generation mode for IPv6 link-local can only be configured
by netlink messages. This patch adds the ability to change the address
generation mode via sysctl.

v1 -> v2
Removed the rtnl lock and switch to use RCU lock to iterate through
the netdev list.

v2 -> v3
Removed the addrgenmode variable from the idev structure and use the
systcl storage for the flag.

Simplifed the logic for sysctl handling by removing the supported
for all operation.

Added support for more types of tunnel interfaces for link-local
address generation.

Based the patches from net-next.

v3 -> v4
Removed unnecessary whitespace changes.

Signed-off-by: Felix Jia <felix.jia@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-27 10:25:34 -05:00
Philipp Zabel
73f1a5858b drm/fourcc: add vivante tiled layout format modifiers
Vivante GC hardware uses simple 4x4 tiled and nested 64x64 supertiled
formats as well as so-called split-tiled variants for dual-pipe
hardware, where even and odd tiles start at different base addresses.

Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170126153217.26916-1-p.zabel@pengutronix.de
2017-01-27 08:54:16 +01:00
Dave Airlie
3875623c56 Merge tag 'drm-misc-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc into drm-next
- cleanups&fixes for dw-hdmi bride driver (Laurent)
- updates for adv bridge driver (John Stultz) for nexus
- drm_crtc_from_index helper rollout (Shawn Guo)
- removing drm_framebuffer_unregister_private from drivers&core
- target_vblank (Andrey Grodzovsky)
- misc tiny stuff

* tag 'drm-misc-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-misc: (49 commits)
  drm: qxl: Open code teardown function for qxl
  drm: qxl: Open code probing sequence for qxl
  drm/bridge: adv7511: Re-write the i2c address before EDID probing
  drm/bridge: adv7511: Reuse __adv7511_power_on/off() when probing EDID
  drm/bridge: adv7511: Rework adv7511_power_on/off() so they can be reused internally
  drm/bridge: adv7511: Enable HPD interrupts to support hotplug and improve monitor detection
  drm/bridge: adv7511: Switch to using drm_kms_helper_hotplug_event()
  drm/bridge: adv7511: Use work_struct to defer hotplug handing to out of irq context
  drm: vc4: use crtc helper drm_crtc_from_index()
  drm: tegra: use crtc helper drm_crtc_from_index()
  drm: nouveau: use crtc helper drm_crtc_from_index()
  drm: mediatek: use crtc helper drm_crtc_from_index()
  drm: kirin: use crtc helper drm_crtc_from_index()
  drm: exynos: use crtc helper drm_crtc_from_index()
  dt-bindings: display: dw-hdmi: Clean up DT bindings documentation
  drm: bridge: dw-hdmi: Assert SVSRET before resetting the PHY
  drm: bridge: dw-hdmi: Fix the name of the PHY reset macros
  drm: bridge: dw-hdmi: Define and use macros for PHY register addresses
  drm: bridge: dw-hdmi: Detect PHY type at runtime
  drm: bridge: dw-hdmi: Handle overflow workaround based on device version
  ...
2017-01-27 12:10:24 +10:00
Dave Airlie
a7e2641aaf Merge tag 'drm-intel-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-intel into drm-next
Final block of feature work for 4.11:

- gen8 pd cleanup from Matthew Auld
- more cleanups for view/vma (Chris)
- dmc support on glk (Anusha Srivatsa)
- use core crc api (Tomue)
- track wedged requests using fence.error (Chris)
- lots of psr fixes (Nagaraju, Vathsala)
- dp mst support, acked for merging through drm-intel by Takashi
  (Libin)
- huc loading support, including uapi for libva to use it (Anusha
  Srivatsa)

* tag 'drm-intel-next-2017-01-23' of git://anongit.freedesktop.org/git/drm-intel: (111 commits)
  drm/i915: Update DRIVER_DATE to 20170123
  drm/i915: reinstate call to trace_i915_vma_bind
  drm/i915: Assert that created vma has a whole number of pages
  drm/i915: Assert the drm_mm_node is allocated when on the VM lists
  drm/i915: Treat an error from i915_vma_instance() as unlikely
  drm/i915: Reject vma creation larger than address space
  drm/i915: Use common LRU inactive vma bumping for unpin_from_display
  drm/i915: Do an unlocked wait before set-cache-level ioctl
  drm/i915/huc: Assert that HuC vma is placed in GuC accessible range
  drm/i915/huc: Avoid attempting to authenticate non-existent fw
  drm/i915: Set adjustment to zero on Up/Down interrupts if freq is already max/min
  drm/i915: Remove the double handling of 'flags from intel_mode_from_pipe_config()
  drm/i915: Remove crtc->config usage from intel_modeset_readout_hw_state()
  drm/i915: Release temporary load-detect state upon switching
  drm/i915: Remove i915_gem_object_to_ggtt()
  drm/i915: Remove i915_vma_create from VMA API
  drm/i915: Add a check that the VMA instance we lookup matches the request
  drm/i915: Rename some warts in the VMA API
  drm/i915: Track pinned vma in intel_plane_state
  drm/i915/get_params: Add HuC status to getparams
  ...
2017-01-27 12:08:32 +10:00
Dave Airlie
b0df0b251b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next
Backmerge Linus master to get the connector locking revert.

* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux: (645 commits)
  sysctl: fix proc_doulongvec_ms_jiffies_minmax()
  Revert "drm/probe-helpers: Drop locking from poll_enable"
  MAINTAINERS: add Dan Streetman to zbud maintainers
  MAINTAINERS: add Dan Streetman to zswap maintainers
  mm: do not export ioremap_page_range symbol for external module
  mn10300: fix build error of missing fpu_save()
  romfs: use different way to generate fsid for BLOCK or MTD
  frv: add missing atomic64 operations
  mm, page_alloc: fix premature OOM when racing with cpuset mems update
  mm, page_alloc: move cpuset seqcount checking to slowpath
  mm, page_alloc: fix fast-path race with cpuset update or removal
  mm, page_alloc: fix check for NULL preferred_zone
  kernel/panic.c: add missing \n
  fbdev: color map copying bounds checking
  frv: add atomic64_add_unless()
  mm/mempolicy.c: do not put mempolicy before using its nodemask
  radix-tree: fix private list warnings
  Documentation/filesystems/proc.txt: add VmPin
  mm, memcg: do not retry precharge charges
  proc: add a schedule point in proc_pid_readdir()
  ...
2017-01-27 11:00:42 +10:00
David S. Miller
49b3eb7725 Merge tag 'batadv-next-for-davem-20170126' of git://git.open-mesh.org/linux-merge
Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich

 - ignore self-generated loop detect MAC addresses in translation table,
   by Simon Wunderlich

 - install uapi batman_adv.h header, by Sven Eckelmann

 - bump copyright years, by Sven Eckelmann

 - Remove an unused variable in translation table code, by Sven Eckelmann

 - Handle NET_XMIT_CN like NET_XMIT_SUCCESS (revised according to Davids
   suggestion), and a follow up code clean up, by Gao Feng (2 patches)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 14:31:08 -05:00
David S. Miller
086cb6a412 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:

1) Two patches to solve conntrack garbage collector cpu hogging, one to
   remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
   vs. evicted entries) to make a decision on whether to reduce or not
   the scanning interval. From Florian Westphal.

2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
   is not set. Moreover, don't decrenent set->nelems from abort patch
   if -ENFILE which leaks a spare slot in the set. This includes a
   patch to deconstify the set walk callback to update set->ndeact.

3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
   reply packets both from nf_reject and local stack, from Pau Espin Pedrol.

4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
   fib expression, from Liping Zhang.

5) Fix oops on stateful objects netlink dump, when no filter is specified.
   Also from Liping Zhang.

6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
   to fix that was applied in the previous batch for net. From Arnd Bergmann.

7) Fix lack of string validation in table, chain, set and stateful
   object names in nf_tables, from Liping Zhang. Moreover, restrict
   maximum log prefix length to 127 bytes, otherwise explicitly bail
   out.

8) Two patches to fix spelling and typos in nf_tables uapi header file
   and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-26 12:54:50 -05:00
Sven Eckelmann
ac79cbb96b batman-adv: update copyright years for 2017
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:19 +01:00
Sven Eckelmann
e60bf3ea67 uapi: install batman_adv.h header
09748a22f4 ("batman-adv: add generic netlink family for batman-adv")
introduced the new batman_adv.h which describes the netlink attributes and
commands of batman-adv. But the Kbuild entry to install the header was not
added.

All currently known tools ship their own copy of batman_adv.h but it should
be installed anyway to later be able to migrate to the system batman_adv.h.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-26 08:34:19 +01:00
Wei Wang
19f6d3f3c8 net/tcp-fastopen: Add new API support
This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
alternative way to perform Fast Open on the active side (client). Prior
to this patch, a client needs to replace the connect() call with
sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
to use Fast Open: these socket operations are often done in lower layer
libraries used by many other applications. Changing these libraries
and/or the socket call sequences are not trivial. A more convenient
approach is to perform Fast Open by simply enabling a socket option when
the socket is created w/o changing other socket calls sequence:
  s = socket()
    create a new socket
  setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
    newly introduced sockopt
    If set, new functionality described below will be used.
    Return ENOTSUPP if TFO is not supported or not enabled in the
    kernel.

  connect()
    With cookie present, return 0 immediately.
    With no cookie, initiate 3WHS with TFO cookie-request option and
    return -1 with errno = EINPROGRESS.

  write()/sendmsg()
    With cookie present, send out SYN with data and return the number of
    bytes buffered.
    With no cookie, and 3WHS not yet completed, return -1 with errno =
    EINPROGRESS.
    No MSG_FASTOPEN flag is needed.

  read()
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
    write() is not called yet.
    Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
    established but no msg is received yet.
    Return number of bytes read if socket is established and there is
    msg received.

The new API simplifies life for applications that always perform a write()
immediately after a successful connect(). Such applications can now take
advantage of Fast Open by merely making one new setsockopt() call at the time
of creating the socket. Nothing else about the application's socket call
sequence needs to change.

Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 14:04:38 -05:00
David S. Miller
716dcaebed Merge tag 'mlx5-updates-2017-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2017-24-01

The first seven patches from Or Gerlitz in this series further enhances
the mlx5 SRIOV switchdev mode to support offloading IPv6 tunnels using the
TC tunnel key set (encap) and unset (decap) actions.

Or Gerlitz says:
========================
As part of doing this change, few cleanups are done in the IPv4 code,
later we move to use the full tunnel key info provided to the driver as
the key for our internal hashing which is used to identify cases where
the same tunnel is used for encapsulating multiple flows. As done in the
IPv4 case, the control path for offloading IPv6 tunnels uses route/neigh
lookups and construction of the IPv6 tunnel headers on the encap path and
matching on the outer hears in the decap path.

The last patch of the series enlarges the HW FDB size for the switchdev mode,
so it has now room to contain offloaded flows as many as min(max number
of HW flow counters supported, max HW table size supported).
========================

Next to Or's series you can find several patches handling several topics.

From Mohamad, add support for SRIOV VF min rate guarantee by using the
TSAR BW share weights mechanism.

From Or, Two patches to enable Eth VFs to query their min-inline value for
user-space.
for that we move a mlx5 low level min inline helper function from mlx5
ethernet driver into the core driver and then use it in mlx5_ib to expose
the inline mode to rdma applications through libmlx5.

From Kamal Heib, Reduce memory consumption on kdump kernel.

From Shaker Daibes, code reuse in CQE compression control logic
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 12:49:58 -05:00
Jamal Hadi Salim
1045ba77a5 net sched actions: Add support for user cookies
Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to save
user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it.  The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-25 12:37:04 -05:00
Michael Kerrisk (man-pages)
e5ff5ce6e2 nsfs: Add an ioctl() to return the namespace type
Linux 4.9 added two ioctl() operations that can be used to discover:

* the parental relationships for hierarchical namespaces (user and PID)
  [NS_GET_PARENT]
* the user namespaces that owns a specified non-user-namespace
  [NS_GET_USERNS]

For no good reason that I can glean, NS_GET_USERNS was made synonymous
with NS_GET_PARENT for user namespaces. It might have been better if
NS_GET_USERNS had returned an error if the supplied file descriptor
referred to a user namespace, since it suggests that the caller may be
confused. More particularly, if it had generated an error, then I wouldn't
need the new ioctl() operation proposed here. (On the other hand, what
I propose here may be more generally useful.)

I would like to write code that discovers namespace relationships for
the purpose of understanding the namespace setup on a running system.
In particular, given a file descriptor (or pathname) for a namespace,
N, I'd like to obtain the corresponding user namespace.  Namespace N
might be a user namespace (in which case my code would just use N) or
a non-user namespace (in which case my code will use NS_GET_USERNS to
get the user namespace associated with N). The problem is that there
is no way to tell the difference by looking at the file descriptor
(and if I try to use NS_GET_USERNS on an N that is a user namespace, I
get the parent user namespace of N, which is not what I want).

This patch therefore adds a new ioctl(), NS_GET_NSTYPE, which, given
a file descriptor that refers to a user namespace, returns the
namespace type (one of the CLONE_NEW* constants).

Signed-off-by: Michael Kerrisk <mtk-manpages@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-01-25 14:43:09 +13:00
Liping Zhang
5ce6b04ce9 netfilter: nft_log: restrict the log prefix length to 127
First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
at nf_log_packet(), so the extra part is useless.

Second, after adding a log rule with a very very long prefix, we will
fail to dump the nft rules after this _special_ one, but acctually,
they do exist. For example:
  # name_65000=$(printf "%0.sQ" {1..65000})
  # nft add rule filter output log prefix "$name_65000"
  # nft add rule filter output counter
  # nft add rule filter output counter
  # nft list chain filter output
  table ip filter {
      chain output {
          type filter hook output priority 0; policy accept;
      }
  }

So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-24 21:46:29 +01:00