linux

Author	SHA1	Message	Date
Thomas Falcon	2770a7984d	ibmvnic: Introduce hard reset recovery Introduce a recovery hard reset to handle reset failure as a result of change of device context following a transport event, such as a backing device failover or partition migration. These operations reset the device context to its initial state. If this occurs during a reset, any initialization commands are likely to fail with an invalid state error as backing device firmware requests reinitialization. When this happens, make one more attempt by performing a hard reset, which frees any resources currently allocated and performs device initialization. If a transport event occurs during a device reset, a flag is set which will trigger a new hard reset following the completion of the current reset event. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	06e43d7f9f	ibmvnic: Set resetting state at earliest possible point Set device resetting state at the earliest possible point: as soon as a reset is successfully scheduled. The reset state is toggled off when all resets have been processed to completion. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	8a348450a0	ibmvnic: Create separate initialization routine for resets Instead of having one initialization routine for all cases, create a separate, simpler function for standard initialization, such as during device probe. Use the original initialization function to handle device reset scenarios. The goal of this patch is to avoid having a single, cluttered init function to handle all possible scenarios. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	ab5ec33b9a	ibmvnic: Handle error case when setting link state If setting the link state is not successful, print a warning with the resulting return code and return it to be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	17c8705838	ibmvnic: Return error code if init interrupted by transport event If device init is interrupted by a failover, set the init return code so that it can be checked and handled appropriately by the init routine. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	9c4eaabd1b	ibmvnic: Check CRQ command return codes Check whether CRQ command is successful before awaiting a response from the management partition. If the command was not successful, the driver may hang waiting for a response that will never come. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:26 -04:00
Thomas Falcon	5153698e55	ibmvnic: Introduce active CRQ state Introduce an "active" state for an IBM vNIC Command-Response Queue. A CRQ is considered active once it has initialized or linked with its partner by sending an initialization request and getting a successful response back from the management partition. Until this has happened, do not allow CRQ commands to be sent other than the initialization request. This change will avoid a protocol error in case of a device transport event occurring during initialization. When the driver receives a transport event notification indicating that the backing hardware has changed and needs reinitialization, any further commands other than the initialization handshake with the VIOS management partition will result in an invalid state error. Instead of sending a command that will be returned with an error, print a warning and return an error that will be handled by the caller. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:25 -04:00
Thomas Falcon	c3f2241547	ibmvnic: Mark NAPI flag as disabled when released Set adapter NAPI state as disabled if they are removed. This will allow them to be enabled again if reallocated in case of a hard reset. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:19:25 -04:00
Willem de Bruijn	730c54d594	ipv4: remove warning in ip_recv_error A precondition check in ip_recv_error triggered on an otherwise benign race. Remove the warning. The warning triggers when passing an ipv6 socket to this ipv4 error handling function. RaceFuzzer was able to trigger it due to a race in setsockopt IPV6_ADDRFORM. --- CPU0 do_ipv6_setsockopt sk->sk_socket->ops = &inet_dgram_ops; --- CPU1 sk->sk_prot->recvmsg udp_recvmsg ip_recv_error WARN_ON_ONCE(sk->sk_family == AF_INET6); --- CPU0 do_ipv6_setsockopt sk->sk_family = PF_INET; This socket option converts a v6 socket that is connected to a v4 peer to a v4 socket. It updates the socket on the fly, changing fields in sk as well as other structs. This is inherently non-atomic. It races with the lockless udp_recvmsg path. No other code makes an assumption that these fields are updated atomically. It is benign here, too, as ip_recv_error cares only about the protocol of the skbs enqueued on the error queue, for which sk_family is not a precise predictor (thanks to another issue with IPV6_ADDRFORM). Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr Fixes: `7ce875e5ec` ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:16:57 -04:00
David S. Miller	180f848b8f	Merge branch 'gretap-mirroring-selftests' Petr Machata says: ==================== selftests: forwarding: Additions to mirror-to-gretap tests This patchset is for a handful of edge cases in mirror-to-gretap scenarios: removal of mirrored-to netdevice (#1), removal of underlay route for tunnel remote endpoint (#2) and cessation of mirroring upon removal of flower mirroring rule (#3). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:37 -04:00
Petr Machata	a96d81a20b	selftests: forwarding: Test removal of mirroring Test that when flower-based mirror action is removed, mirroring stops. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
Petr Machata	77a8df3810	selftests: forwarding: Test removal of underlay route When underlay route is removed, the mirrored traffic should not be forwarded. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
Petr Machata	6b45432d78	selftests: forwarding: Test mirroring to deleted device Tests that the mirroring code catches up with deletion of a mirrored-to device. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:14:36 -04:00
Or Gerlitz	f8f4bef322	net : sched: cls_api: deal with egdev path only if needed When dealing with ingress rule on a netdev, if we did fine through the conventional path, there's no need to continue into the egdev route, and we can stop right there. Not doing so may cause a 2nd rule to be added by the cls api layer with the ingress being the egdev. For example, under sriov switchdev scheme, a user rule of VFR A --> VFR B will end up with two HW rules (1) VF A --> VF B and (2) uplink --> VF B Fixes: `208c0f4b52` ('net: sched: use tc_setup_cb_call to call per-block callbacks') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:12:22 -04:00
Jason Wang	1b15ad683a	vhost: synchronize IOTLB message with dev cleanup DaeRyong Jeong reports a race between vhost_dev_cleanup() and vhost_process_iotlb_msg(): Thread interleaving: CPU0 (vhost_process_iotlb_msg) CPU1 (vhost_dev_cleanup) (In the case of both VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE) ===== ===== vhost_umem_clean(dev->iotlb); if (!dev->iotlb) { ret = -EFAULT; break; } dev->iotlb = NULL; The reason is we don't synchronize between them, fixing by protecting vhost_process_iotlb_msg() with dev mutex. Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Fixes: `6b1e6cc785` ("vhost: new device IOTLB API") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:09:51 -04:00
David S. Miller	d681bc027a	mlx5-fixes-2018-05-24 Merge tag 'mlx5-fixes-2018-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2018-05-24 This series includes two mlx5 fixes. 1) add FCS data to checksum complete when required, from Eran Ben Elisha. 2) Fix A race in IPSec sandbox QP commands, from Yossi Kuperman. Please pull and let me know if there's any problem. for -stable v4.15 ("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 22:01:06 -04:00
Willem de Bruijn	9aad13b087	packet: fix reserve calculation Commit `b84bbaf7a6` ("packet: in packet_snd start writing at link layer allocation") ensures that packet_snd always starts writing the link layer header in reserved headroom allocated for this purpose. This is needed because packets may be shorter than hard_header_len, in which case the space up to hard_header_len may be zeroed. But that necessary padding is not accounted for in skb->len. The fix, however, is buggy. It calls skb_push, which grows skb->len when moving skb->data back. But in this case packet length should not change. Instead, call skb_reserve, which moves both skb->data and skb->tail back, without changing length. Fixes: `b84bbaf7a6` ("packet: in packet_snd start writing at link layer allocation") Reported-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 21:55:20 -04:00
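The difference between the two helpers named in the commit above can be illustrated with a toy model. This is only a sketch: `ToySkb` is a hypothetical stand-in that tracks two offsets, not the kernel's `struct sk_buff`, and it assumes a linear buffer where length is simply `tail - data`.

```python
from dataclasses import dataclass


@dataclass
class ToySkb:
    """Hypothetical two-offset model of a linear packet buffer."""
    data: int = 0  # offset of the first payload byte
    tail: int = 0  # offset just past the last payload byte

    @property
    def len(self) -> int:
        # In this toy model, length is the span between data and tail.
        return self.tail - self.data

    def push(self, n: int) -> None:
        # Models skb_push: move data back by n, so the length grows by n.
        self.data -= n

    def reserve(self, n: int) -> None:
        # Models skb_reserve: shift data and tail together, length unchanged.
        self.data += n
        self.tail += n


# Start with an empty buffer whose data pointer sits 16 bytes into the headroom.
pushed = ToySkb(data=16, tail=16)
pushed.push(4)         # length becomes 4 although no payload was added

reserved = ToySkb(data=16, tail=16)
reserved.reserve(-4)   # both offsets move back; length stays 0
```

Both calls move `data` back by the same amount, but only `push` changes the reported length, which is the behaviour the commit message describes as the bug.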
YueHaibing	d624613e42	cxgb4: Check for kvzalloc allocation failure t4_prep_fw doesn't check the card_fw pointer before storing the read data, which could lead to a NULL pointer dereference if kvzalloc failed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 21:52:44 -04:00
Alexei Starovoitov	10f678683e	Merge branch 'xdp_xmit-bulking' Jesper Dangaard Brouer says: ==================== This patchset changes the ndo_xdp_xmit API to take a bulk of xdp frames. When the kernel is compiled with CONFIG_RETPOLINE, every indirect function pointer (branch) call hurts performance. For XDP this has a huge negative performance impact. This patchset reduces the needed (indirect) calls to ndo_xdp_xmit, but also prepares for further optimizations. The DMA API's use of indirect function pointer calls is the primary source of the regression. It is left for a followup patchset to use bulking calls towards the DMA API (via the scatter-gather calls). The other advantage of this API change is that drivers can more easily amortize the cost of any sync/locking scheme over the bulk of packets. The assumption of the current API is that the driver implementing the NDO will also allocate a dedicated XDP TX queue for every CPU in the system, which is not always possible or practical to configure. E.g. ixgbe cannot load an XDP program on a machine with more than 96 CPUs, due to limited hardware TX queues. E.g. virtio_net is hard to configure as it requires manually increasing the queues. E.g. tun driver chooses to use a per XDP frame producer lock modulo smp_processor_id over avail queues. I considered adding 'flags' to ndo_xdp_xmit, but it's not part of this patchset. This will be a followup patchset, once we know if this will be needed (e.g. for non-map xdp_redirect flush-flag, and if AF_XDP chooses to use ndo_xdp_xmit for TX). --- V5: Fixed up issues spotted by Daniel and John V4: Split out the patches from 4 to 8 patches. I cannot split the driver changes from the NDO change, but I've tried to isolate the NDO change together with the driver change as much as possible. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:16 -07:00
Jesper Dangaard Brouer	a570e48fee	samples/bpf: xdp_monitor use err code from tracepoint xdp:xdp_devmap_xmit Update xdp_monitor to use the recently added err code introduced in tracepoint xdp:xdp_devmap_xmit, to show if the drop count is caused by some driver general delivery problem. Other kinds of drops will likely just be more normal TX space issues. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	e74de52e55	xdp/trace: extend tracepoint in devmap with an err Extending tracepoint xdp:xdp_devmap_xmit in devmap with an err code allows people to more easily identify the reason why the ndo_xdp_xmit call to a given driver is failing. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	735fc4054b	xdp: change ndo_xdp_xmit API to support bulking This patch changes the API for ndo_xdp_xmit to support bulking xdp_frames. When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown. Most of the slowdown is caused by DMA API indirect function calls, but also the net_device->ndo_xdp_xmit() call. Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed performance improved: for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps With frames avail as a bulk inside the driver ndo_xdp_xmit call, further optimizations are possible, like bulk DMA-mapping for TX. Testing without CONFIG_RETPOLINE shows the same performance for physical NIC drivers. The virtual NIC driver tun sees a huge performance boost, as it can avoid doing per frame producer locking, but instead amortize the locking cost over the bulk. V2: Fix compile errors reported by kbuild test robot <lkp@intel.com> V4: Isolated ndo, driver changes and callers. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	389ab7f01a	xdp: introduce xdp_return_frame_rx_napi When sending an xdp_frame through xdp_do_redirect call, then error cases can happen where the xdp_frame needs to be dropped, and returning an -errno code isn't sufficient/possible any longer (e.g. for cpumap case). This is already fully supported, by simply calling xdp_return_frame. This patch is an optimization, which provides xdp_return_frame_rx_napi, which is a faster variant for these error cases. It takes advantage of the protection provided by XDP RX running under NAPI protection. This change is mostly relevant for drivers using the page_pool allocator as it can take advantage of this. (Tested with mlx5). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	9940fbf633	samples/bpf: xdp_monitor use tracepoint xdp:xdp_devmap_xmit The xdp_monitor sample/tool is updated to use the new tracepoint xdp:xdp_devmap_xmit the previous patch just introduced. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	38edddb811	xdp: add tracepoint for devmap like cpumap have Notice how this allows us to get XDP statistics without affecting the XDP performance, as the tracepoint is no longer activated on a per packet basis. V5: Spotted by John Fastabend. Fix 'sent' also counted 'drops' in this patch, a later patch corrected this, but it was a mistake in this intermediate step. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:15 -07:00
Jesper Dangaard Brouer	5d053f9da4	bpf: devmap prepare xdp frames for bulking Like cpumap, create a queue for xdp frames that will be bulked. For now, this patch simply invokes ndo_xdp_xmit for each frame. This happens either when the map flush operation is invoked, or when the limit DEV_MAP_BULK_SIZE is reached. V5: Avoid memleak on error path in dev_map_update_elem() Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:14 -07:00
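The queueing behaviour described in the devmap commit above (frames accumulate and are transmitted either on an explicit flush or once the bulk limit is reached) can be sketched as follows. This is a simplified model, not the kernel code: `BulkQueue` and its `xmit` callback are hypothetical names, and the default `bulk_size` of 16 is an assumption, since the commit message does not state the value of DEV_MAP_BULK_SIZE.

```python
from typing import Callable, List


class BulkQueue:
    """Hypothetical model of a per-device queue of frames awaiting transmit."""

    def __init__(self, xmit: Callable[[object], None], bulk_size: int = 16):
        self.xmit = xmit            # stands in for the driver's transmit hook
        self.bulk_size = bulk_size  # assumed limit; real value not given above
        self.frames: List[object] = []

    def enqueue(self, frame: object) -> None:
        # A full queue is drained before the new frame is stored.
        if len(self.frames) == self.bulk_size:
            self.flush()
        self.frames.append(frame)

    def flush(self) -> None:
        # As in the intermediate patch, transmit is still called once per frame.
        for frame in self.frames:
            self.xmit(frame)
        self.frames.clear()


sent: List[object] = []
queue = BulkQueue(sent.append, bulk_size=4)
for i in range(5):
    queue.enqueue(i)
after_limit = list(sent)  # the first four frames went out when the limit was hit
queue.flush()             # the explicit map flush drains the remainder
```

Keeping the frames together until the flush point is what later lets a driver receive the whole batch in one call and amortize locking over it.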
Jesper Dangaard Brouer	67f29e07e1	bpf: devmap introduce dev_map_enqueue Functionality is the same, but the ndo_xdp_xmit call is now simply invoked from inside the devmap.c code. V2: Fix compile issue reported by kbuild test robot <lkp@intel.com> V5: Cleanups requested by Daniel - Newlines before func definition - Use BUILD_BUG_ON checks - Remove unnecessary use return value store in dev_map_enqueue Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:36:14 -07:00
Alexei Starovoitov	f80acbd233	Merge branch 'bpf-task-fd-query' Yonghong Song says: ==================== Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks to this approach. First, a bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, this command will return bpf related information to user space. Right now it only supports tracepoint/kprobe/uprobe perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Patch #1 adds function perf_get_event() in kernel/events/core.c. Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY. Patch #3 syncs tools bpf.h header and also adds bpf_task_fd_query() in the libbpf library for samples/selftests/bpftool to use. Patch #4 adds ksym_get_addr() utility function. Patch #5 adds a test in samples/bpf for querying k[ret]probes and u[ret]probes. Patch #6 adds a test in tools/testing/selftests/bpf for querying raw_tracepoint and tracepoint. Patch #7 adds a new subcommand "perf" to bpftool. Changelogs: v4 -> v5: . return strlen(buf) instead of strlen(buf) + 1 in the attr.buf_len. 
As long as user provides non-empty buffer, it will be filled with empty string, truncated string, or full string based on the buffer size and the length of to-be-copied string. v3 -> v4: . made attr buf_len input/output. The length of the actual buffer is written to buf_len so user space knows what is actually needed. If user provides a buffer with length >= 1 but less than required, do partial copy and return -ENOSPC. . code simplification with put_user. . changed query result attach_info to fd_type. . add tests at selftests/bpf to test zero len, null buf and insufficient buf. v2 -> v3: . made perf_get_event() return perf_event pointer const. this was to ensure that event fields are not meddled. . detect whether the new BPF_TASK_FD_QUERY is supported or not in "bpftool perf" and warn users if it is not. v1 -> v2: . changed bpf subcommand name from BPF_PERF_EVENT_QUERY to BPF_TASK_FD_QUERY. . fixed various "bpftool perf" issues and added documentation and auto-completion. ==================== Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:21 -07:00
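The buf_len rules in the v4 and v5 changelog entries above can be modelled in a few lines. This is a sketch of the described semantics only, not the kernel implementation: `query_copy` is a hypothetical helper, and the zero-length and null-buffer cases mentioned in the changelog are deliberately left out because their exact return values are not spelled out there.

```python
import errno
from typing import Tuple


def query_copy(name: str, buf_len: int) -> Tuple[int, bytes, int]:
    """Hypothetical model: returns (return code, buffer contents, reported buf_len)."""
    if buf_len < 1:
        raise ValueError("zero-length buffers are not modelled in this sketch")
    data = name.encode()
    needed = len(data)  # v5: report strlen(), without the trailing NUL
    if buf_len >= needed + 1:
        # Room for the full string plus its terminator.
        return 0, data + b"\0", needed
    # v4: partial copy, always NUL-terminated, and -ENOSPC tells the caller to retry.
    return -errno.ENOSPC, data[: buf_len - 1] + b"\0", needed


full = query_copy("sys_enter_nanosleep", 64)
truncated = query_copy("abcdef", 4)
empty = query_copy("", 1)
```

A caller that receives `-ENOSPC` can read the reported length, allocate that many bytes plus one for the terminator, and repeat the query.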
Yonghong Song	b04df400c3	tools/bpftool: add perf subcommand The new command "bpftool perf [show \| list]" will traverse all processes under /proc, and if any fd is associated with a perf event, it will print out related perf event information. Documentation is also added. Below is an example to show the results using bcc commands. Running the following 4 bcc commands: kprobe: trace.py '__x64_sys_nanosleep' kretprobe: trace.py 'r::__x64_sys_nanosleep' tracepoint: trace.py 't:syscalls:sys_enter_nanosleep' uprobe: trace.py 'p:/home/yhs/a.out:main' The bpftool command line and result: $ bpftool perf pid 21711 fd 5: prog_id 5 kprobe func __x64_sys_write offset 0 pid 21765 fd 5: prog_id 7 kretprobe func __x64_sys_nanosleep offset 0 pid 21767 fd 5: prog_id 8 tracepoint sys_enter_nanosleep pid 21800 fd 5: prog_id 9 uprobe filename /home/yhs/a.out offset 1159 $ bpftool -j perf [{"pid":21711,"fd":5,"prog_id":5,"fd_type":"kprobe","func":"__x64_sys_write","offset":0}, \ {"pid":21765,"fd":5,"prog_id":7,"fd_type":"kretprobe","func":"__x64_sys_nanosleep","offset":0}, \ {"pid":21767,"fd":5,"prog_id":8,"fd_type":"tracepoint","tracepoint":"sys_enter_nanosleep"}, \ {"pid":21800,"fd":5,"prog_id":9,"fd_type":"uprobe","filename":"/home/yhs/a.out","offset":1159}] $ bpftool prog 5: kprobe name probe___x64_sys tag e495a0c82f2c7a8d gpl loaded_at 2018-05-15T04:46:37-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 4 7: kprobe name probe___x64_sys tag f2fdee479a503abf gpl loaded_at 2018-05-15T04:48:32-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 7 8: tracepoint name tracepoint__sys tag 5390badef2395fcf gpl loaded_at 2018-05-15T04:48:48-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 8 9: kprobe name probe_main_1 tag 0a87bdc2e2953b6d gpl loaded_at 2018-05-15T04:49:52-0700 uid 0 xlated 200B not jited memlock 4096B map_ids 9 $ ps ax \| grep "python ./trace.py" 21711 pts/0 T 0:03 python ./trace.py __x64_sys_write 21765 pts/0 S+ 0:00 python ./trace.py r::__x64_sys_nanosleep 
21767 pts/2 S+ 0:00 python ./trace.py t:syscalls:sys_enter_nanosleep 21800 pts/3 S+ 0:00 python ./trace.py p:/home/yhs/a.out:main 22374 pts/1 S+ 0:00 grep --color=auto python ./trace.py Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:20 -07:00
Yonghong Song	f699cf7aa4	tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progs The new tests are added to query perf_event information for raw_tracepoint and tracepoint attachment. For tracepoint, both syscalls and non-syscalls tracepoints are queried as they are treated slightly differently inside the kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:20 -07:00
Yonghong Song	ecb96f7fe1	samples/bpf: add a samples/bpf test for BPF_TASK_FD_QUERY This is mostly to test kprobe/uprobe which needs kernel headers. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:20 -07:00
Yonghong Song	73bc4d9fc0	tools/bpf: add ksym_get_addr() in trace_helpers Given a kernel function name, ksym_get_addr() will return the kernel address for this function, or 0 if it cannot find this function name in /proc/kallsyms. This function will be used later when a kernel address is used to initiate a kprobe perf event. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:20 -07:00
Yonghong Song	30687ad94e	tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in libbpf Sync kernel header bpf.h to tools/include/uapi/linux/bpf.h and implement bpf_task_fd_query() in libbpf. The test programs in samples/bpf and tools/testing/selftests/bpf, and later bpftool will use this libbpf function to query kernel. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:19 -07:00
Yonghong Song	41bdc4b40e	bpf: introduce bpf subcommand BPF_TASK_FD_QUERY Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks to this approach. First, a bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, if the <pid, fd> is associated with a tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:19 -07:00
Yonghong Song	f8d959a5b1	perf/core: add perf_get_event() to return perf_event given a struct file A new extern function, perf_get_event(), is added to return a perf event given a struct file. This function will be used in later patches. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-05-24 18:18:19 -07:00
Dave Airlie	4bc6f77795	Merge branch 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux into drm-fixes Three fixes for vmwgfx. Two are cc'd stable and fix host logging and its error paths on 32-bit VMs. One is a fix for a hibernate flaw introduced with the 4.17 merge window. * 'vmwgfx-fixes-4.17' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Schedule an fb dirty update after resume drm/vmwgfx: Fix host logging / guestinfo reading error paths drm/vmwgfx: Fix 32-bit VMW_PORT_HB_[IN\|OUT] macros	2018-05-25 09:47:56 +10:00
Linus Torvalds	b50694381c	Merge branch 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb Pull swiotlb fix from Konrad Rzeszutek Wilk: "One single fix in here: under Xen the DMA32 heap (in the hypervisor) would end up looking like swiss cheese. The reason being that for every coherent DMA allocation we didn't do the proper hypercall to tell Xen to return the page back to the DMA32 heap. End result was (eventually) no DMA32 space if you (for example) continuously unloaded and loaded modules" * 'stable/for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb: xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent	2018-05-24 14:42:43 -07:00
Yossi Kuperman	1dcbc01f73	net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands Sandbox QP Commands are retired in the order they are sent. Outstanding commands are stored in a linked-list in the order they appear. Once a response is received and the callback gets called, we pull the first element off the pending list, assuming they correspond. Sending a message and adding it to the pending list is not done atomically, hence there is an opportunity for a race between concurrent requests. Bind both send and add under a critical section. Fixes: `bebb23e6cb` ("net/mlx5: Accel, Add IPSec acceleration interface") Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Adi Nissim <adin@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:40:40 -07:00
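The fix described in the commit above (performing the send and the pending-list insertion inside one critical section) can be illustrated with a small threaded model. This is only a sketch: `CommandChannel`, `wire`, and `pending` are hypothetical names, and a Python `threading.Lock` stands in for whatever locking primitive the driver actually uses.

```python
import threading
from collections import deque


class CommandChannel:
    """Hypothetical model of a channel whose responses arrive in send order."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.wire = []          # order in which commands reached the simulated device
        self.pending = deque()  # order in which responses are expected

    def send(self, cmd: object) -> None:
        # Send and enqueue happen under one lock, so no other sender can
        # slip in between the two steps and reorder the pending list.
        with self._lock:
            self.wire.append(cmd)
            self.pending.append(cmd)

    def complete(self) -> object:
        # The response handler assumes the oldest pending entry matches.
        with self._lock:
            return self.pending.popleft()


channel = CommandChannel()


def worker(worker_id: int) -> None:
    for seq in range(200):
        channel.send((worker_id, seq))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the shared lock, one thread could send first but enqueue second, and the completion handler would then pair a response with the wrong pending entry.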
Eran Ben Elisha	902a545904	net/mlx5e: When RXFCS is set, add FCS data into checksum calculation When the RXFCS feature is enabled, the HW does not strip the FCS data, however it is not present in the checksum calculated by the HW. Fix that by manually calculating the FCS checksum and adding it to the SKB checksum field. Add helper function to find the FCS data for all SKB forms (linear, one fragment or more). Fixes: `102722fc68` ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:40:39 -07:00
Huy Nguyen	ecdf2dadee	net/mlx5e: Receive buffer support for DCBX Add dcbnl's set/get buffer configuration callback that allows the user to set/get buffer size configuration and priority to buffer mapping. By default, firmware controls receive buffer configuration and priority to buffer mapping based on the changes in pfc settings. When the set buffer callback is triggered, the buffer configuration changes to manual mode. Manual mode means the mlx5 driver will adjust the buffer configuration accordingly based on the changes in pfc settings. ConnectX buffer stride is 128 Bytes. If the buffer size is not a multiple of 128, the buffer size will be rounded down to the nearest multiple of 128. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:23:33 -07:00
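The rounding rule stated in the commit above (buffer sizes are rounded down to a multiple of the 128-byte stride) amounts to the arithmetic below. `round_down_to_stride` is a hypothetical helper written for illustration, not a function from the driver.

```python
BUFFER_STRIDE = 128  # bytes, as stated in the commit message


def round_down_to_stride(size: int, stride: int = BUFFER_STRIDE) -> int:
    """Round a requested buffer size down to the nearest multiple of stride."""
    return size - (size % stride)


aligned = round_down_to_stride(87296)    # already a multiple of 128
unaligned = round_down_to_stride(87300)  # drops to the multiple below
tiny = round_down_to_stride(127)         # smaller than one stride
```

A request smaller than one stride therefore yields a zero-sized buffer in this model.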
Huy Nguyen	0696d60853	net/mlx5e: Receive buffer configuration Add APIs for buffer configuration based on the changes in pfc configuration, cable len, buffer size configuration, and priority to buffer mapping. Note that the xoff formula is as below xoff = (301 + 2.16 * len [m]) * speed [Gbps] + 2.72 * MTU [B] xoff_threshold = buffer_size - xoff xon_threshold = xoff_threshold - MTU Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:23:33 -07:00
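A worked instance of the threshold formulas in the commit above may help. The sketch reads the xoff expression as (301 + 2.16 * len) * speed + 2.72 * MTU, which is an assumed parenthesization. The function names are hypothetical, and the sample inputs (7 m cable, 100 Gbps, MTU 1500, buffer 87296) are illustrative values rather than figures from the commit.

```python
from typing import Tuple


def xoff(cable_len_m: float, speed_gbps: float, mtu_bytes: float) -> float:
    """xoff per the commit's formula, under the assumed parenthesization."""
    return (301 + 2.16 * cable_len_m) * speed_gbps + 2.72 * mtu_bytes


def thresholds(buffer_size: float, cable_len_m: float,
               speed_gbps: float, mtu_bytes: float) -> Tuple[float, float]:
    """Return (xoff_threshold, xon_threshold) for one receive buffer."""
    x = xoff(cable_len_m, speed_gbps, mtu_bytes)
    xoff_threshold = buffer_size - x           # xoff_threshold = buffer_size - xoff
    xon_threshold = xoff_threshold - mtu_bytes  # xon_threshold = xoff_threshold - MTU
    return xoff_threshold, xon_threshold


# Illustrative inputs only.
sample_xoff = xoff(cable_len_m=7, speed_gbps=100, mtu_bytes=1500)
sample_xoff_thr, sample_xon_thr = thresholds(87296, 7, 100, 1500)
```

Since xoff grows with both link speed and cable length, faster or longer links leave a lower xoff threshold within the same buffer.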
Huy Nguyen	50b4a3c236	net/mlx5: PPTB and PBMC register firmware command support Add firmware command interface to read and write PPTB and PBMC registers. The PPTB register enables mapping a priority to a specific receive buffer. The PBMC register enables changing the receive buffer's configuration such as buffer size, xon/xoff thresholds, buffer's lossy property and buffer's shared property. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:23:33 -07:00
Huy Nguyen	df5f1361cc	net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask Add pbmc and pptb in the port_access_reg_cap_mask. These two bits determine if device supports receive buffer configuration. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:23:33 -07:00
Huy Nguyen	2c81bfd5ae	net/mlx5e: Move port speed code from en_ethtool.c to en/port.c Move the four functions below from en_ethtool.c to en/port.c. These functions are used by both en_ethtool.c and en_main.c. Future code can use these functions without ethtool link mode dependency. u32 mlx5e_port_ptys2speed(u32 eth_proto_oper); int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed); u32 mlx5e_port_speed2linkmodes(u32 speed); Delete the speed field from table mlx5e_build_ptys2ethtool_map. This table only keeps the mapping between the mlx5e link mode and ethtool link mode. Add new table mlx5e_link_speed for translation from mlx5e link mode to actual speed. Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:23:33 -07:00
Huy Nguyen	e549f6f9c0	net/dcb: Add dcbnl buffer attribute In this patch, we add a dcbnl buffer attribute to allow the user to change the NIC's buffer configuration, such as the priority to buffer mapping and the size of each individual buffer. This attribute, combined with the pfc attribute, allows an advanced user to fine-tune the qos setting for a specific priority queue. For example, the user can give a dedicated buffer to one or more priorities, or give a large buffer to certain priorities. The dcb buffer configuration will be controlled by lldptool. lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6 maps priorities 0,1,2,3,4,5,6,7 to receive buffers 0,2,5,7,1,2,3,6; lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0 sets the receive buffer sizes for buffers 0,1,2,3,4,5,6,7 respectively. After discussion on the mailing list with Jakub, Jiri, Ido and John, we agreed to choose dcbnl over the devlink interface, since this feature is intended to set port attributes which are governed by the netdev instance of that port, whereas the devlink API is more suitable for global ASIC configurations. We present a use case scenario where the dcbnl buffer attribute, configured by an advanced user, helps reduce the latency of messages of different sizes. Scenario description: On ConnectX-5, we run latency-sensitive traffic with small/medium message sizes ranging from 64B to 256KB and bandwidth-sensitive traffic with large message sizes of 512KB and 1MB. We group small, medium, and large message sizes into their own pfc-enabled priorities as follows. Priorities 1 & 2 (64B, 256B and 1KB) Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB) Priorities 5 & 6 (512KB and 1MB) By default, ConnectX-5 maps all pfc-enabled priorities to a single lossless buffer with a fixed size of 50% of the total available buffer space. The other 50% is assigned to the lossy buffer. Using the dcbnl buffer attribute, we create three equal-size lossless buffers. Each buffer has 25% of the total available buffer space. Thus, the lossy buffer size is reduced to 25%. 
Priority to lossless buffer mappings are set as follows. Priorities 1 & 2 on lossless buffer #1 Priorities 3 & 4 on lossless buffer #2 Priorities 5 & 6 on lossless buffer #3 We observe improvements in latency for small and medium message sizes as follows. Please note that the bandwidth for large message sizes is reduced but the total bandwidth remains the same. 256B message size (42% latency reduction) 4K message size (21% latency reduction) 64K message size (16% latency reduction) CC: Ido Schimmel <idosch@idosch.org> CC: Jakub Kicinski <jakub.kicinski@netronome.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Or Gerlitz <gerlitz.or@gmail.com> CC: Parav Pandit <parav@mellanox.com> CC: Aron Silverton <aron.silverton@oracle.com> Signed-off-by: Huy Nguyen <huyn@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2018-05-24 14:22:59 -07:00
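The buffer layout from the use case above (three lossless buffers of 25% each, with the remaining 25% lossy) can be sketched as a pair of arrays: one indexed by priority, one by buffer. The struct and function below are hypothetical illustrations, not the dcbnl interface. They also assume that buffer 0 is the lossy buffer and that priorities 0 and 7 stay on it, which the commit message does not state.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PRIO    8
#define NUM_BUFFERS 8

/* Hypothetical container for the two lists passed to lldptool. */
struct buffer_cfg {
    uint8_t  prio2buffer[NUM_PRIO];    /* index: priority, value: buffer */
    uint32_t buffer_size[NUM_BUFFERS]; /* index: buffer, value: size */
};

/*
 * Build the layout from the commit's use case:
 *   priorities 1 & 2 -> lossless buffer 1
 *   priorities 3 & 4 -> lossless buffer 2
 *   priorities 5 & 6 -> lossless buffer 3
 * Each lossless buffer gets a quarter of total_size; buffer 0 (assumed
 * lossy) gets whatever is left, so the sizes always sum to total_size.
 */
void build_three_lossless(struct buffer_cfg *cfg, uint32_t total_size)
{
    uint32_t quarter = total_size / 4;
    int prio;

    memset(cfg, 0, sizeof(*cfg));

    cfg->buffer_size[0] = total_size - 3 * quarter;
    cfg->buffer_size[1] = quarter;
    cfg->buffer_size[2] = quarter;
    cfg->buffer_size[3] = quarter;

    /* Priorities 1..6 pair up onto buffers 1..3; 0 and 7 stay on buffer 0. */
    for (prio = 1; prio <= 6; prio++)
        cfg->prio2buffer[prio] = (uint8_t)((prio + 1) / 2);
}
```

Printing `prio2buffer` and `buffer_size` as comma-separated lists would give values in the same positional form that the `prio` and `size` arguments of the lldptool commands use.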
Linus Torvalds	34b48b8789	Merge candidates for 4.17-rc - Remove bouncing addresses from the MAINTAINERS file - Kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers - Various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers - Two fixes for patches already sent during the merge window - A long standing bug related to not decreasing the pinned pages count in the right MM was found and fixed Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fixes from Jason Gunthorpe: "This is pretty much just the usual array of smallish driver bugs. 
- remove bouncing addresses from the MAINTAINERS file - kernel oops and bad error handling fixes for hfi, i40iw, cxgb4, and hns drivers - various small LOC behavioral/operational bugs in mlx5, hns, qedr and i40iw drivers - two fixes for patches already sent during the merge window - a long-standing bug related to not decreasing the pinned pages count in the right MM was found and fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (28 commits) RDMA/hns: Move the location for initializing tmp_len RDMA/hns: Bugfix for cq record db for kernel IB/uverbs: Fix uverbs_attr_get_obj RDMA/qedr: Fix doorbell bar mapping for dpi > 1 IB/umem: Use the correct mm during ib_umem_release iw_cxgb4: Fix an error handling path in 'c4iw_get_dma_mr()' RDMA/i40iw: Avoid panic when reading back the IRQ affinity hint RDMA/i40iw: Avoid reference leaks when processing the AEQ RDMA/i40iw: Avoid panic when objects are being created and destroyed RDMA/hns: Fix the bug with NULL pointer RDMA/hns: Set NULL for __internal_mr RDMA/hns: Enable inner_pa_vld filed of mpt RDMA/hns: Set desc_dma_addr for zero when free cmq desc RDMA/hns: Fix the bug with rq sge RDMA/hns: Not support qp transition from reset to reset for hip06 RDMA/hns: Add return operation when configured global param fail RDMA/hns: Update convert function of endian format RDMA/hns: Load the RoCE dirver automatically RDMA/hns: Bugfix for rq record db for kernel RDMA/hns: Add rq inline flags judgement ...	2018-05-24 14:12:05 -07:00
Heiner Kallweit	87e5808d52	net: phy: replace bool members in struct phy_device with bit-fields In struct phy_device we have a number of flags being defined as type bool. Similar to e.g. struct pci_dev we can save some space by using bit-fields. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-05-24 15:35:58 -04:00
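The space saving described in the commit above can be illustrated with two small structs. This is a sketch only: the struct names are hypothetical and the member names are illustrative, not a claim about the exact fields of struct phy_device. It also shows one behavioural difference worth keeping in mind when converting: a `bool` member stores any non-zero value as 1, while a 1-bit unsigned bit-field keeps only the lowest bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Eight flags stored as individual bool members (one byte each on
 * common ABIs). Member names are illustrative. */
struct flags_as_bool {
    bool is_c45;
    bool is_internal;
    bool is_pseudo_fixed_link;
    bool has_fixups;
    bool suspended;
    bool sysfs_links;
    bool loopback_enabled;
    bool autoneg;
};

/* The same flags packed as 1-bit bit-fields in one storage unit. */
struct flags_as_bits {
    unsigned int is_c45:1;
    unsigned int is_internal:1;
    unsigned int is_pseudo_fixed_link:1;
    unsigned int has_fixups:1;
    unsigned int suspended:1;
    unsigned int sysfs_links:1;
    unsigned int loopback_enabled:1;
    unsigned int autoneg:1;
};

size_t bool_struct_size(void) { return sizeof(struct flags_as_bool); }
size_t bits_struct_size(void) { return sizeof(struct flags_as_bits); }

/* Conversion to bool maps every non-zero value to 1. */
unsigned int store_in_bool(unsigned int v)
{
    struct flags_as_bool f = { 0 };

    f.autoneg = v;
    return f.autoneg;
}

/* Conversion to a 1-bit unsigned bit-field keeps only bit 0,
 * so an even non-zero value such as 2 is stored as 0. */
unsigned int store_in_bitfield(unsigned int v)
{
    struct flags_as_bits f = { 0 };

    f.autoneg = v;
    return f.autoneg;
}
```

Exact sizes are implementation-defined, but on common ABIs the bool struct takes 8 bytes and the bit-field struct 4. Writing `f.autoneg = !!v;` avoids the truncation when a value that is not already 0 or 1 is assigned to a bit-field.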
Linus Torvalds	d7b66b4ab0	for-4.17-rc6-tag Merge tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A one-liner that prevents leaking an internal error value 1 out of the ftruncate syscall. This has been observed in practice. The steps to reproduce make a common pattern (open/write/fsync/ftruncate) but also need the application to not check only for negative values and happens only for compressed inlined files. The conditions are narrow but as this could break userspace I think it's better to merge it now and not wait for the merge window" * tag 'for-4.17-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix error handling in btrfs_truncate()	2018-05-24 11:47:43 -07:00
Lukas Wunner	009f8c90f5	ALSA: hda - Fix runtime PM Before commit `3b5b899ca6` ("ALSA: hda: Make use of core codec functions to sync power state"), hda_set_power_state() returned the response to the Get Power State verb, a 32-bit unsigned integer whose expected value is 0x233 after transitioning a codec to D3, and 0x0 after transitioning it to D0. The response value is significant because hda_codec_runtime_suspend() does not clear the codec's bit in the codec_powered bitmask unless the AC_PWRST_CLK_STOP_OK bit (0x200) is set in the response value. That in turn prevents the HDA controller from runtime suspending because azx_runtime_idle() checks that the codec_powered bitmask is zero. Since commit `3b5b899ca6`, hda_set_power_state() only returns 0x0 or 0x1, thereby breaking runtime PM for any HDA controller. That's because an inline function introduced by the commit returns a bool instead of a 32-bit unsigned int. The change was likely erroneous and resulted from copying and pasting snd_hda_check_power_state(), which is immediately preceding the newly introduced inline function. Fix it. Link: https://bugs.freedesktop.org/show_bug.cgi?id=106597 Fixes: `3b5b899ca6` ("ALSA: hda: Make use of core codec functions to sync power state") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Abhijeet Kumar <abhijeet.kumar@intel.com> Reported-and-tested-by: Gunnar Krüger <taijian@posteo.de> Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>	2018-05-24 20:16:47 +02:00
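The failure mode described in the commit above, a response value collapsing to 0 or 1 once it passes through a function returning bool, can be reproduced in a few lines. This is a minimal sketch with hypothetical stand-in functions, not the HDA driver code. The constants 0x233 and 0x200 are the values quoted in the commit message.

```c
#include <stdbool.h>

#define PWRST_CLK_STOP_OK 0x200u /* AC_PWRST_CLK_STOP_OK bit per the commit message */
#define D3_RESPONSE       0x233u /* expected response after a transition to D3 */

/* Buggy shape: the return type is bool, so any non-zero response
 * is converted to 1 and the individual bits are lost. */
bool sync_power_state_bool(unsigned int response)
{
    return response;
}

/* Fixed shape: the full 32-bit response reaches the caller. */
unsigned int sync_power_state_uint(unsigned int response)
{
    return response;
}

/* The caller's check: is the clock-stop-ok bit set in the state? */
int clk_stop_ok(unsigned int state)
{
    return (state & PWRST_CLK_STOP_OK) != 0;
}
```

With the bool variant, `clk_stop_ok(sync_power_state_bool(D3_RESPONSE))` evaluates `1 & 0x200`, which is 0, so a caller relying on that bit would never see it set. With the unsigned int variant the same check evaluates `0x233 & 0x200` and succeeds.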
Joonsoo Kim	d883c6cf3b	Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" This reverts the following commits that change CMA design in MM. `3d2054ad8c` ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") `1d47a3ec09` ("mm/cma: remove ALLOC_CMA") `bad8c6c0b1` ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported the following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to the whole node span for future CMA initialization, and normal memory is wrongly freed here. I submitted a fix and it seems to work, but another problem appeared. It is too late to fix the latter problem, so I decided to revert the series. 
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Acked-by: Laura Abbott <labbott@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-05-24 10:07:50 -07:00

755631 Commits