linux

Author	SHA1	Message	Date
Jesper Dangaard Brouer	9ce6146ec7	virtio_net: Add XDP frame size in two code paths The virtio_net driver is running inside the guest-OS. There are two XDP receive code-paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP. In receive_small() the frame size is available in buflen. The buffer backing these frames are allocated in add_recvbuf_small() with same size, except for the headroom, but tailroom have reserved room for skb_shared_info. The headroom is encoded in ctx pointer as a value. In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) buffer size is based on a exponentially weighted moving average (see DECLARE_EWMA) of packet length. Or (2) in case virtnet_get_headroom() have any headroom then buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values; the buffer len "truesize" and headroom. In case (1) if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of buffer area). If that happens the XDP path does a xdp_linearize_page operation. V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang. The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(), and 'buf' ptr is advanced this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple: static unsigned int virtnet_get_headroom(struct virtnet_info *vi) { return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0; } As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable(): int offset = buf - page_address(page); [...] data = page_address(xdp_page) + offset; xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len; The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be page-start address plus vi->hdr_len. Given this xdp.frame_sz need to be reduced with vi->hdr_len size. IMHO a followup patch should cleanup this code to make it easier to maintain and understand, but it is outside the scope of this patchset. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	05afee298a	vhost_net: Also populate XDP frame size In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start) which contains the buffer length 'buflen' (with tailroom for skb_shared_info). Also storing this buflen in xdp->frame_sz, does not obsolete struct tun_xdp_hdr, as it also contains a struct virtio_net_hdr with other information. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343928.97035.4620233649151726289.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	fb3e6e9307	tun: Add XDP frame size The tun driver have two code paths for running XDP (bpf_prog_run_xdp). In both cases 'buflen' contains enough tailroom for skb_shared_info. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/bpf/158945343419.97035.9594485183958037621.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	fa6540b8ef	nfp: Add XDP frame size to netronome driver The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM. Thus, adjust for this when setting xdp.frame_sz, as it counts from data_hard_start. When doing XDP_TX this driver is smart and instead of a full DMA-map does a DMA-sync on with packet length. As xdp_adjust_tail can now grow packet length, add checks to make sure that grow size is within the DMA-mapped size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/bpf/158945342911.97035.11214251236208648808.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	c8145b263d	net: thunderx: Add XDP frame size To help reviewers these are the defines related to RCV_FRAG_LEN #define DMA_BUFFER_LEN 1536 /* In multiples of 128bytes */ #define RCV_FRAG_LEN (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Sunil Goutham <sgoutham@marvell.com> Cc: Robert Richter <rrichter@marvell.com> Link: https://lore.kernel.org/bpf/158945342402.97035.12649844447148990032.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	d201ea9ebc	mlx4: Add XDP frame size and adjust max XDP MTU The mlx4 drivers size of memory backing the RX packet is stored in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096). For normal mode frag_stride is 2048. Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Link: https://lore.kernel.org/bpf/158945341893.97035.2688142527052329942.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	08fc1cfd2d	ena: Add XDP frame size to amazon NIC driver Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account the reserved tailroom. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Sameeh Jubran <sameehj@amazon.com> Cc: Arthur Kiyanovski <akiyano@amazon.com> Link: https://lore.kernel.org/bpf/158945341384.97035.907403694833419456.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	c88c35181d	net: ethernet: ti: Add XDP frame size to driver cpsw The driver code cpsw.c and cpsw_new.c both use page_pool with default order-0 pages or their RX-pages. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/158945340875.97035.752144756428532878.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	bc1c5745d7	qlogic/qede: Add XDP frame size to driver The driver qede uses a full page, when XDP is enabled. The drivers value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an XDP bpf_prog is attached. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ariel Elior <aelior@marvell.com> Cc: GR-everest-linux-l2@marvell.com Link: https://lore.kernel.org/bpf/158945340366.97035.7764939691580349618.stgit@firesoul	2020-05-14 21:21:55 -07:00
Jesper Dangaard Brouer	7358877ac1	hv_netvsc: Add XDP frame size to driver The hyperv NIC driver does memory allocation and copy even without XDP. In XDP mode it will allocate a new page for each packet and copy over the payload, before invoking the XDP BPF-prog. The positive thing it that its easy to determine the xdp.frame_sz. The XDP implementation for hv_netvsc transparently passes xdp_prog to the associated VF NIC. Many of the Azure VMs are using SRIOV, so majority of the data are actually processed directly on the VF driver's XDP path. So the overhead of the synthetic data path (hv_netvsc) is minimal. Then XDP is enabled on this driver, XDP_PASS and XDP_TX will create the SKB via build_skb (based on the newly allocated page). Now using XDP frame_sz this will provide more skb_tailroom, which netstack can use for SKB coalescing (e.g tcp_try_coalesce -> skb_try_coalesce). V3: Adjust patch desc to be more positive. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Wei Liu <wei.liu@kernel.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Link: https://lore.kernel.org/bpf/158945339857.97035.10212138582505736163.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	4a9b052a59	dpaa2-eth: Add XDP frame size The dpaa2-eth driver reserve some headroom used for hardware and software annotation area in RX/TX buffers. Thus, xdp.data_hard_start doesn't start at page boundary. When XDP is configured the area reserved via dpaa2_fd_get_offset(fd) is 448 bytes of which XDP have reserved 256 bytes. As frame_sz is calculated as an offset from xdp_buff.data_hard_start, an adjust from the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE. When doing XDP_REDIRECT, the driver doesn't need this reserved headroom any-longer and allows xdp_do_redirect() to use it. This is an advantage for the drivers own ndo-xdp_xmit, as it uses part of this headroom for itself. Patch also adjust frame_sz in this case. The driver cannot support XDP data_meta, because it uses the headroom just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64), when transmitting the packet. When transmitting a xdp_frame in dpaa2_eth_xdp_xmit_frame (call via ndo_xdp_xmit) is uses this area to store a pointer to xdp_frame and dma_size, which is used in TX completion (free_tx_fd) to return frame via xdp_return_frame(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Link: https://lore.kernel.org/bpf/158945339348.97035.8562488847066908856.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	45a9e6d8a6	veth: Xdp using frame_sz in veth driver The veth driver can run XDP in "native" mode in it's own NAPI handler, and since commit `9fc8d518d9` ("veth: Handle xdp_frames in xdp napi ring") packets can come in two forms either xdp_frame or skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb(). For packets to arrive in xdp_frame format, they will have been redirected from an XDP native driver. In case of XDP_PASS or no XDP-prog attached, the veth driver will allocate and create an SKB. The current code in veth_xdp_rcv_one() xdp_frame case, had to guess the frame truesize of the incoming xdp_frame, when using veth_build_skb(). With xdp_frame->frame_sz this is not longer necessary. Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done similar to the XDP-generic handling code in net/core/dev.c. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Link: https://lore.kernel.org/bpf/158945338840.97035.935897116345700902.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	5c8572251f	veth: Adjust hard_start offset on redirect XDP frames When native XDP redirect into a veth device, the frame arrives in the xdp_frame structure. It is then processed in veth_xdp_rcv_one(), which can run a new XDP bpf_prog on the packet. Doing so requires converting xdp_frame to xdp_buff, but the tricky part is that xdp_frame memory area is located in the top (data_hard_start) memory area that xdp_buff will point into. The current code tried to protect the xdp_frame area, by assigning xdp_buff.data_hard_start past this memory. This results in 32 bytes less headroom to expand into via BPF-helper bpf_xdp_adjust_head(). This protect step is actually not needed, because BPF-helper bpf_xdp_adjust_head() already reserve this area, and don't allow BPF-prog to expand into it. Thus, it is safe to point data_hard_start directly at xdp_frame memory area. Fixes: `9fc8d518d9` ("veth: Handle xdp_frames in xdp napi ring") Reported-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945338331.97035.5923525383710752178.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	db612f749e	xdp: Cpumap redirect use frame_sz and increase skb_tailroom Knowing the memory size backing the packet/xdp_frame data area, and knowing it already have reserved room for skb_shared_info, simplifies using build_skb significantly. With this change we no-longer lie about the SKB truesize, but more importantly a significant larger skb_tailroom is now provided, e.g. when drivers uses a full PAGE_SIZE. This extra tailroom (in linear area) can be used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce, see TCP cases where tcp_queue_rcv() can 'eat' skb). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337822.97035.13557959180460986059.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	34cc0b338a	xdp: Xdp_frame add member frame_sz and handle in convert_to_xdp_frame Use hole in struct xdp_frame, when adding member frame_sz, which keeps same sizeof struct (32 bytes) Drivers ixgbe and sfc had bug cases where the necessary/expected tailroom was not reserved. This can lead to some hard to catch memory corruption issues. Having the drivers frame_sz this can be detected when packet length/end via xdp->data_end exceed the xdp_data_hard_end pointer, which accounts for the reserved the tailroom. When detecting this driver issue, simply fail the conversion with NULL, which results in feedback to driver (failing xdp_do_redirect()) causing driver to drop packet. Given the lack of consistent XDP stats, this can be hard to troubleshoot. And given this is a driver bug, we want to generate some more noise in form of a WARN stack dump (to ID the driver code that inlined convert_to_xdp_frame). Inlining the WARN macro is problematic, because it adds an asm instruction (on Intel CPUs ud2) what influence instruction cache prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this and at the same time make identifying the function and line of this inlined function easier. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945337313.97035.10015729316710496600.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	a075767bbd	net: XDP-generic determining XDP frame size The SKB "head" pointer points to the data area that contains skb_shared_info, that can be found via skb_end_pointer(). Given xdp->data_hard_start have been established (basically pointing to skb->head), frame size is between skb_end_pointer() and data_hard_start, plus the size reserved to skb_shared_info. Change the bpf_xdp_adjust_tail offset adjust of skb->len, to be a positive offset number on grow, and negative number on shrink. As this seems more natural when reading the code. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945336804.97035.7164852191163722056.stgit@firesoul	2020-05-14 21:21:54 -07:00
Ilias Apalodimas	495de55f70	net: netsec: Add support for XDP frame size This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length after bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/bpf/158945336295.97035.15034759661036971024.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	494f44d54e	mvneta: Add XDP frame size to driver This marvell driver mvneta uses PAGE_SIZE frames, which makes it really easy to convert. Driver updates rxq and now frame_sz once per NAPI call. This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that can help reduce the number of cache-lines that need to be flushed when doing DMA sync for_device. Due to xdp_adjust_tail can grow the area accessible to the by the CPU (can possibly write into), then max sync length after bpf_prog_run_xdp() needs to be taken into account. For XDP_TX action the driver is smart and does DMA-sync. When growing tail this is still safe, because page_pool have DMA-mapped the entire page size. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Cc: thomas.petazzoni@bootlin.com Link: https://lore.kernel.org/bpf/158945335786.97035.12714388304493736747.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	983e434518	sfc: Add XDP frame size This driver uses RX page-split when possible. It was recently fixed in commit `86e85bf698` ("sfc: fix XDP-redirect in this driver") to add needed tailroom for XDP-redirect. After the fix efx->rx_page_buf_step is the frame size, with enough head and tail-room for XDP-redirect. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/158945335278.97035.14611425333184621652.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	63fe91ab3d	bnxt: Add XDP frame size to driver This driver uses full PAGE_SIZE pages when XDP is enabled. In case of XDP uses driver uses __bnxt_alloc_rx_page which does full page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for XDP_TX action that does DMA-sync. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Cc: Michael Chan <michael.chan@broadcom.com> Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com> Link: https://lore.kernel.org/bpf/158945334769.97035.13437970179897613984.stgit@firesoul	2020-05-14 21:21:54 -07:00
Jesper Dangaard Brouer	f95f0f95cf	xdp: Add frame size to xdp_buff XDP have evolved to support several frame sizes, but xdp_buff was not updated with this information. The frame size (frame_sz) member of xdp_buff is introduced to know the real size of the memory the frame is delivered in. When introducing this also make it clear that some tailroom is reserved/required when creating SKBs using build_skb(). It would also have been an option to introduce a pointer to data_hard_end (with reserved offset). The advantage with frame_sz is that (like rxq) drivers only need to setup/assign this value once per NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to store frame_sz inside xdp_rxq_info, because it's varies per packet as it can be based/depend on packet length. V2: nitpick: deduct -> deduce Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/158945334261.97035.555255657490688547.stgit@firesoul	2020-05-14 21:21:54 -07:00
Herbert Xu	9a611a1dce	Revert "ASoC: cros_ec_codec: use crypto_shash_tfm_digest()" This reverts commit `85fc78b80f` as a different fix has already been applied in the sound-asoc tree and this patch is no longer required. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Mark Brown <broonie@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2020-05-15 14:13:46 +10:00
David S. Miller	d00f26b623	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-05-14 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Merged tag 'perf-for-bpf-2020-05-06' from tip tree that includes CAP_PERFMON. 2) support for narrow loads in bpf_sock_addr progs and additional helpers in cg-skb progs, from Andrey. 3) bpf benchmark runner, from Andrii. 4) arm and riscv JIT optimizations, from Luke. 5) bpf iterator infrastructure, from Yonghong. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-14 20:31:21 -07:00
Andre Przywara	59ffe4ed07	dt-bindings: ehci/ohci: Allow iommus property A OHCI/EHCI controller could be behind an IOMMU, in which case an iommus property assigns the stream ID for this device. Allow that property in the DT bindings to fix a complaint about the Arm Juno board's DTS file. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>	2020-05-14 22:17:06 -05:00
Andre Przywara	17b53ce330	dt-bindings: mali-midgard: Allow dma-coherent Add the boolean dma-coherent property to the list of allowed properties, since some boards (Arm Juno) integrate the GPU this way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>	2020-05-14 22:16:29 -05:00
Andre Przywara	61efb56e30	dt-bindings: arm: gic: Allow combining arm,gic-400 compatible strings The arm,gic-400 compatible is probably the best matching string for the GIC in most modern SoCs, but was only introduced later into the kernel. For historic reasons and to keep compatibility, some SoC DTs were thus using a combination of this name and one of the older strings, which currently the binding denies. Add a stanza to the DT binding to allow "arm,gic-400", followed by either "arm,cortex-a15-gic" or "arm,cortex-a7-gic". This fixes binding compliance for quite some SoC .dtsi files in the kernel tree. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Rob Herring <robh@kernel.org>	2020-05-14 22:16:08 -05:00
Yoshihiro Kaneko	0be4ae7488	dt-bindings: irqchip: renesas-intc-irqpin: Convert to json-schema Convert the Renesas Interrupt Controller (INTC) for external pins Device Tree binding documentation to json-schema. Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Co-developed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> [robh: drop allOf] Signed-off-by: Rob Herring <robh@kernel.org>	2020-05-14 21:48:36 -05:00
Dave Airlie	27db6f7b0a	Merge tag 'drm-intel-fixes-2020-05-13-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - Handle idling during i915_gem_evict_something busy loops (Chris) - Mark current submissions with a weak-dependency (Chris) - Propagate errror from completed fences (Chris) - Fixes on execlist to avoid GPU hang situation (Chris) - Fixes couple deadlocks (Chris) - Timeslice preemption fixes (Chris) - Fix Display Port interrupt handling on Tiger Lake (Imre) - Reduce debug noise around Frame Buffer Compression +(Peter) - Fix logic around IPC W/a for Coffee Lake and Kaby Lake +(Sultan) - Avoid dereferencing a dead context (Chris) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200514040235.GA2164266@intel.com	2020-05-15 12:29:01 +10:00
Dave Airlie	1493bddcca	Merge tag 'drm-misc-next-2020-05-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 5.8: UAPI Changes: Cross-subsystem Changes: * dma-buf: use atomic64_fetch_add() for context id * Documentation: document bindings for ASUS ZOOT TM5P5, BOE NV133FHM-N62, hpd-gpios Core Changes: Driver Changes: * drm/ast: fix supend; cleanups * drm/i2c: cleanups * drm/panel: add MODULE_LICENSE to panel-visinox-rm69299; add support for ASUS TM5P5i, BOE NV133FHM-N62i; fix size and bpp of BOE NV133FHM-N61 add hpd-gpio to panel-simple * drm/mcde: fix return value check in mcde_dsi_bind() * drm/mgag200: use managed drmm_mode_config_init(); cleanups * fbdev/pxa168fb: cleanups Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20200514070819.GA6930@linux-uq9g	2020-05-15 12:23:25 +10:00
Michael Ellerman	93900337b9	drivers/macintosh: Fix memleak in windfarm_pm112 driver create_cpu_loop() calls smu_sat_get_sdb_partition() which does kmalloc() and returns the allocated buffer. In fact it's called twice, and neither buffer is freed. This results in a memory leak as reported by Erhard: unreferenced object 0xc00000047081f840 (size 32): comm "kwindfarm", pid 203, jiffies 4294880630 (age 5552.877s) hex dump (first 32 bytes): c8 06 02 7f ff 02 ff 01 fb bf 00 41 00 20 00 00 ...........A. .. 00 07 89 37 00 a0 00 00 00 00 00 00 00 00 00 00 ...7............ backtrace: [<0000000083f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 [windfarm_smu_sat] [<000000003010fcb7>] .pm112_wf_notify+0x104c/0x13bc [windfarm_pm112] [<00000000b958b2dd>] .notifier_call_chain+0xa8/0x180 [<0000000070490868>] .blocking_notifier_call_chain+0x64/0x90 [<00000000131d8149>] .wf_thread_func+0x114/0x1a0 [<000000000d54838d>] .kthread+0x13c/0x190 [<00000000669b72bc>] .ret_from_kernel_thread+0x58/0x64 unreferenced object 0xc0000004737089f0 (size 16): comm "kwindfarm", pid 203, jiffies 4294880879 (age 5552.050s) hex dump (first 16 bytes): c4 04 01 7f 22 11 e0 e6 ff 55 7b 12 ec 11 00 00 ...."....U{..... backtrace: [<0000000083f0a65c>] .smu_sat_get_sdb_partition+0xc4/0x2d0 [windfarm_smu_sat] [<00000000b94ef7e1>] .pm112_wf_notify+0x1294/0x13bc [windfarm_pm112] [<00000000b958b2dd>] .notifier_call_chain+0xa8/0x180 [<0000000070490868>] .blocking_notifier_call_chain+0x64/0x90 [<00000000131d8149>] .wf_thread_func+0x114/0x1a0 [<000000000d54838d>] .kthread+0x13c/0x190 [<00000000669b72bc>] .ret_from_kernel_thread+0x58/0x64 Fix it by rearranging the logic so we deal with each buffer separately, which then makes it easy to free the buffer once we're done with it. Fixes: `ac171c4666` ("[PATCH] powerpc: Thermal control for dual core G5s") Cc: stable@vger.kernel.org # v2.6.16+ Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Erhard F. <erhard_f@mailbox.org> Link: https://lore.kernel.org/r/20200423060038.3308530-1-mpe@ellerman.id.au	2020-05-15 11:58:56 +10:00
Michael Ellerman	7481cad474	selftests/powerpc: Add a test of counting larx/stcx This is based on the count_instructions test. However this one also counts the number of failed stcx's, and in conjunction with knowing the size of the stcx loop, can calculate the total number of instructions executed even in the face of non-deterministic stcx failures. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200426114410.3917383-1-mpe@ellerman.id.au	2020-05-15 11:58:55 +10:00
Michael Ellerman	24ac99e97f	powerpc: Drop unneeded cast in task_pt_regs() There's no need to cast in task_pt_regs() as tsk->thread.regs should already be a struct pt_regs. If someone's using task_pt_regs() on something that's not a task but happens to have a thread.regs then we'll deal with them later. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200428123152.73566-1-mpe@ellerman.id.au	2020-05-15 11:58:55 +10:00
Michael Ellerman	7ffa8b7dc1	powerpc/64: Don't initialise init_task->thread.regs Aneesh increased the size of struct pt_regs by 16 bytes and started seeing this WARN_ON: smp: Bringing up secondary CPUs ... ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at arch/powerpc/kernel/process.c:455 giveup_all+0xb4/0x110 Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+ #318 NIP: c00000000001a2b4 LR: c00000000001a29c CTR: c0000000031d0000 REGS: c0000000026d3980 TRAP: 0700 Not tainted (5.7.0-rc2-gcc-8.2.0-1.g8f6a41f-default+) MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 48048224 XER: 00000000 CFAR: c000000000019cc8 IRQMASK: 1 GPR00: c00000000001a264 c0000000026d3c20 c0000000026d7200 800000000280b033 GPR04: 0000000000000001 0000000000000000 0000000000000077 30206d7372203164 GPR08: 0000000000002000 0000000002002000 800000000280b033 3230303030303030 GPR12: 0000000000008800 c0000000031d0000 0000000000800050 0000000002000066 GPR16: 000000000309a1a0 000000000309a4b0 000000000309a2d8 000000000309a890 GPR20: 00000000030d0098 c00000000264da40 00000000fd620000 c0000000ff798080 GPR24: c00000000264edf0 c0000001007469f0 00000000fd620000 c0000000020e5e90 GPR28: c00000000264edf0 c00000000264d200 000000001db60000 c00000000264d200 NIP [c00000000001a2b4] giveup_all+0xb4/0x110 LR [c00000000001a29c] giveup_all+0x9c/0x110 Call Trace: [c0000000026d3c20] [c00000000001a264] giveup_all+0x64/0x110 (unreliable) [c0000000026d3c90] [c00000000001ae34] __switch_to+0x104/0x480 [c0000000026d3cf0] [c000000000e0b8a0] __schedule+0x320/0x970 [c0000000026d3dd0] [c000000000e0c518] schedule_idle+0x38/0x70 [c0000000026d3df0] [c00000000019c7c8] do_idle+0x248/0x3f0 [c0000000026d3e70] [c00000000019cbb8] cpu_startup_entry+0x38/0x40 [c0000000026d3ea0] [c000000000011bb0] rest_init+0xe0/0xf8 [c0000000026d3ed0] [c000000002004820] start_kernel+0x990/0x9e0 [c0000000026d3f90] [c00000000000c49c] start_here_common+0x1c/0x400 Which was unexpected. The warning is checking the thread.regs->msr value of the task we are switching from: usermsr = tsk->thread.regs->msr; ... WARN_ON((usermsr & MSR_VSX) && !((usermsr & MSR_FP) && (usermsr & MSR_VEC))); ie. if MSR_VSX is set then both of MSR_FP and MSR_VEC are also set. Dumping tsk->thread.regs->msr we see that it's: 0x1db60000 Which is not a normal looking MSR, in fact the only valid bit is MSR_VSX, all the other bits are reserved in the current definition of the MSR. We can see from the oops that it was swapper/0 that we were switching from when we hit the warning, ie. init_task. So its thread.regs points to the base (high addresses) in init_stack. Dumping the content of init_task->thread.regs, with the members of pt_regs annotated (the 16 bytes larger version), we see: 0000000000000000 c000000002780080 gpr[0] gpr[1] 0000000000000000 c000000002666008 gpr[2] gpr[3] c0000000026d3ed0 0000000000000078 gpr[4] gpr[5] c000000000011b68 c000000002780080 gpr[6] gpr[7] 0000000000000000 0000000000000000 gpr[8] gpr[9] c0000000026d3f90 0000800000002200 gpr[10] gpr[11] c000000002004820 c0000000026d7200 gpr[12] gpr[13] 000000001db60000 c0000000010aabe8 gpr[14] gpr[15] c0000000010aabe8 c0000000010aabe8 gpr[16] gpr[17] c00000000294d598 0000000000000000 gpr[18] gpr[19] 0000000000000000 0000000000001ff8 gpr[20] gpr[21] 0000000000000000 c00000000206d608 gpr[22] gpr[23] c00000000278e0cc 0000000000000000 gpr[24] gpr[25] 000000002fff0000 c000000000000000 gpr[26] gpr[27] 0000000002000000 0000000000000028 gpr[28] gpr[29] 000000001db60000 0000000004750000 gpr[30] gpr[31] 0000000002000000 000000001db60000 nip msr 0000000000000000 0000000000000000 orig_r3 ctr c00000000000c49c 0000000000000000 link xer 0000000000000000 0000000000000000 ccr softe 0000000000000000 0000000000000000 trap dar 0000000000000000 0000000000000000 dsisr result 0000000000000000 0000000000000000 ppr kuap 0000000000000000 0000000000000000 pad[2] pad[3] This looks suspiciously like stack frames, not a pt_regs. If we look closely we can see return addresses from the stack trace above, c000000002004820 (start_kernel) and c00000000000c49c (start_here_common). init_task->thread.regs is setup at build time in processor.h: #define INIT_THREAD { \ .ksp = INIT_SP, \ .regs = (struct pt_regs )INIT_SP - 1, / XXX bogus, I think / \ The early boot code where we setup the initial stack is: LOAD_REG_ADDR(r3,init_thread_union) / set up a stack pointer */ LOAD_REG_IMMEDIATE(r1,THREAD_SIZE) add r1,r3,r1 li r0,0 stdu r0,-STACK_FRAME_OVERHEAD(r1) Which creates a stack frame of size 112 bytes (STACK_FRAME_OVERHEAD). Which is far too small to contain a pt_regs. So the result is init_task->thread.regs is pointing at some stack frames on the init stack, not at a pt_regs. We have gotten away with this for so long because with pt_regs at its current size the MSR happens to point into the first frame, at a location that is not written to by the early asm. With the 16 byte expansion the MSR falls into the second frame, which is used by the compiler, and collides with a saved register that tends to be non-zero. As far as I can see this has been wrong since the original merge of 64-bit ppc support, back in 2002. Conceptually swapper should have no regs, it never entered from userspace, and in fact that's what we do on 32-bit. It's also presumably what the "bogus" comment is referring to. So I think the right fix is to just not-initialise regs at all. I'm slightly worried this will break some code that isn't prepared for a NULL regs, but we'll have to see. Remove the comment in head_64.S which refers to us setting up the regs (even though we never did), and is otherwise not really accurate any more. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200428123130.73078-1-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Gustavo A. R. Silva	02bddf21c3	powerpc/mm: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507185755.GA15014@embeddedor	2020-05-15 11:58:54 +10:00
Gustavo A. R. Silva	0f6be41c60	powerpc: Replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507185749.GA14994@embeddedor	2020-05-15 11:58:54 +10:00
Nicholas Piggin	4e0e45b07d	powerpc: Use trap metadata to prevent double restart rather than zeroing trap It's not very nice to zero trap for this, because then system calls no longer have trap_is_syscall(regs) invariant, and we can't distinguish between sc and scv system calls (in a later patch). Take one last unused bit from the low bits of the pt_regs.trap word for this instead. There is not a really good reason why it should be in trap as opposed to another field, but trap has some concept of flags and it exists. Ideally I think we would move trap to 2-byte field and have 2 more bytes available independently. Add a selftests case for this, which can be seen to fail if trap_norestart() is changed to return false. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make them static inlines] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-4-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	912237ea16	powerpc: trap_is_syscall() helper to hide syscall trap number A new system call interrupt will be added with a new trap number. Hide the explicit 0xc00 test behind an accessor to reduce churn in callers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-3-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	db30144b5c	powerpc: Use set_trap() and avoid open-coding trap masking The pt_regs.trap field keeps 4 low bits for some metadata about the trap or how it was handled, which is masked off in order to test the architectural trap number. Add a set_trap() accessor to set this, equivalent to TRAP() for returning it. This is actually not quite the equivalent of TRAP() because it always clears the low bits, which may be harmless if it can only be updated via ptrace syscall, but it seems dangerous. In fact settting TRAP from ptrace doesn't seem like a great idea so maybe it's better deleted. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Make it a static inline rather than a shouty macro] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-2-mpe@ellerman.id.au	2020-05-15 11:58:54 +10:00
Nicholas Piggin	feb9df3462	powerpc/64s: Always has full regs, so remove remnant checks Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200507121332.2233629-1-mpe@ellerman.id.au	2020-05-15 11:58:53 +10:00
Alexei Starovoitov	b92d44b5c2	Merge branch 'expand-cg_skb-helpers' Andrey Ignatov says: ==================== v2->v3: - better documentation for bpf_sk_cgroup_id in uapi (Yonghong Song) - save/restore errno in network helpers (Yonghong Song) - cleanup leftover after switching selftest to skeleton (Yonghong Song) - switch from map to skel->bss in selftest (Yonghong Song) v1->v2: - switch selftests to skeleton. This patch set allows a bunch of existing sk lookup and skb cgroup id helpers, and adds two new bpf_sk_{,ancestor_}cgroup_id helpers to be used in cgroup skb programs. It fills the gap to cover a use-case to apply intra-host cgroup-bpf network policy based on a source cgroup a packet comes from. For example, there can be multiple containers A, B, C running on a host. Every such container runs in its own cgroup that can have multiple sub-cgroups. But all these containers can share some IP addresses. At the same time container A wants to have a policy for a server S running in it so that only clients from this same container can connect to S, but not from other containers (such as B, C). Source IP address can't be used to decide whether to allow or deny a packet, but it looks reasonable to filter by cgroup id. The patch set allows to implement the following policy: * when an ingress packet comes to container's cgroup, lookup peer (client) socket this packet comes from; * having peer socket, get its cgroup id; * compare peer cgroup id with self cgroup id and allow packet only if they match, i.e. it comes from same cgroup; * the "sub-cgroup" part of the story can be addressed by getting not direct cgroup id of the peer socket, but ancestor cgroup id on specified level, similar to existing "ancestor" flavors of cgroup id helpers. A newly introduced selftest implements such a policy in its basic form to provide a better idea on the use-case. Patch 1 allows existing sk lookup helpers in cgroup skb. Patch 2 allows skb_ancestor_cgroup_id in cgrou skb. Patch 3 introduces two new helpers to get cgroup id of socket. Patch 4 extends network helpers to use them in the next patch. Patch 5 adds selftest / example of use-case. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2020-05-14 18:42:02 -07:00
Andrey Ignatov	68e916bc8d	selftests/bpf: Test for sk helpers in cgroup skb Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and bpf_sk_ancestor_cgroup_id helpers from cgroup skb program. The test creates a testing cgroup, starts a TCPv6 server inside the cgroup and creates two client sockets: one inside testing cgroup and one outside. Then it attaches cgroup skb program to the cgroup that checks all TCP segments coming to the server and allows only those coming from the cgroup of the server. If a segment comes from a peer outside of the cgroup, it'll be dropped. Finally the test checks that client from inside testing cgroup can successfully connect to the server, but client outside the cgroup fails to connect by timeout. The main goal of the test is to check newly introduced bpf_sk_{,ancestor_}cgroup_id helpers. It also checks a couple of socket lookup helpers (tcp & release), but lookup helpers were introduced much earlier and covered by other tests. Here it's mostly checked that they can be called from cgroup skb. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com	2020-05-14 18:41:08 -07:00
Andrey Ignatov	383724e17a	selftests/bpf: Add connect_fd_to_fd, connect_wait net helpers Add two new network helpers. connect_fd_to_fd connects an already created client socket fd to address of server fd. Sometimes it's useful to separate client socket creation and connecting this socket to a server, e.g. if client socket has to be created in a cgroup different from that of server cgroup. Additionally connect_to_fd is now implemented using connect_fd_to_fd, both helpers don't treat EINPROGRESS as an error and let caller decide how to proceed with it. connect_wait is a helper to work with non-blocking client sockets so that if connect_to_fd or connect_fd_to_fd returned -1 with errno == EINPROGRESS, caller can wait for connect to finish or for connection timeout. The helper returns -1 on error, 0 on timeout (1sec, hard-coded), and positive number on success. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com	2020-05-14 18:41:08 -07:00
Andrey Ignatov	f307fa2cb4	bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers With having ability to lookup sockets in cgroup skb programs it becomes useful to access cgroup id of retrieved sockets so that policies can be implemented based on origin cgroup of such socket. For example, a container running in a cgroup can have cgroup skb ingress program that can lookup peer socket that is sending packets to a process inside the container and decide whether those packets should be allowed or denied based on cgroup id of the peer. More specifically such ingress program can implement intra-host policy "allow incoming packets only from this same container and not from any other container on same host" w/o relying on source IP addresses since quite often it can be the case that containers share same IP address on the host. Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and bpf_sk_ancestor_cgroup_id(). These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id helpers with the only difference that sk is used to get cgroup id instead of skb, and share code with them. See documentation in UAPI for more details. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Andrey Ignatov	06d3e4c9f1	bpf: Allow skb_ancestor_cgroup_id helper in cgroup skb cgroup skb programs already can use bpf_skb_cgroup_id. Allow bpf_skb_ancestor_cgroup_id as well so that container policies can be implemented for a container that can have sub-cgroups dynamically created, but policies should still be implemented based on cgroup id of container itself not on an id of a sub-cgroup. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/8874194d6041eba190356453ea9f6071edf5f658.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Andrey Ignatov	d56c2f95ad	bpf: Allow sk lookup helpers in cgroup skb Currently sk lookup helpers are allowed in tc, xdp, sk skb, and cgroup sock_addr programs. But they would be useful in cgroup skb as well so that for example cgroup skb ingress program can lookup a peer socket a packet comes from on same host and make a decision whether to allow or deny this packet based on the properties of that socket, e.g. cgroup that peer socket belongs to. Allow the following sk lookup helpers in cgroup skb: * bpf_sk_lookup_tcp; * bpf_sk_lookup_udp; * bpf_sk_release; * bpf_skc_lookup_tcp. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/f8c7ee280f1582b586629436d777b6db00597d63.1589486450.git.rdna@fb.com	2020-05-14 18:41:07 -07:00
Colin Ian King	5b0004d92b	selftest/bpf: Fix spelling mistake "SIGALARM" -> "SIGALRM" There is a spelling mistake in an error message, fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com	2020-05-14 18:39:06 -07:00
Andrii Nakryiko	c70f34a8ac	bpf: Fix bpf_iter's task iterator logic task_seq_get_next might stop prematurely if get_pid_task() fails to get task_struct. Failure to do so doesn't mean that there are no more tasks with higher pids. Procfs's iteration algorithm (see next_tgid in fs/proc/base.c) does a retry in such case. After this fix, instead of stopping prematurely after about 300 tasks on my server, bpf_iter program now returns >4000, which sounds much closer to reality. Fixes: `eaaacd2391` ("bpf: Add task and task/file iterator targets") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200514055137.1564581-1-andriin@fb.com	2020-05-14 18:37:32 -07:00
Andrey Ignatov	0645f7eb6f	selftests/bpf: Test narrow loads for bpf_sock_addr.user_port Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr programs. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com	2020-05-14 18:30:57 -07:00
Andrey Ignatov	7aebfa1b38	bpf: Support narrow loads from bpf_sock_addr.user_port bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly code in BPF programs, like: volatile __u32 user_port = ctx->user_port; __u16 port = bpf_ntohs(user_port); Since otherwise clang may optimize the load to be 2-byte and it's rejected by verifier. Add support for 1- and 2-byte loads same way as it's supported for other fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com	2020-05-14 18:30:57 -07:00
Zhou Wang	528443e32a	arm64: defconfig: Enable UACCE/PCI PASID/SEC2/HPRE configs Enable configs for UACCE, PCI PASID, HiSilicon SEC2 and HPRE drivers. Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Wei Xu <xuwei5@hisilicon.com>	2020-05-15 09:29:47 +08:00

... 473 474 475 476 477 ...

948892 Commits