linux

Author	SHA1	Message	Date
Mark yao	0b12e9c0e4	drm/rockchip: vop: fix NV12 video display error Fix up the scale calculation formula for the case src_height == (dst_height/2). Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494586-6984-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 16:09:39 +08:00
Mark yao	64d7756469	drm/rockchip: vop: fix iommu page fault when resume The iommu can take a page fault along the following path: vop_disable: 1, disable all windows and set vop config done 2, vop enters standby; the windows no longer work, but their registers are not cleared, so reading a window's enable bit may still show the window as enabled. vop_enable: 1, memcpy(vop->regsbak, vop->regs, len) saves the current vop registers to vop->regsbak, so the window appears enabled in regsbak. 2, VOP_WIN_SET(vop, win, gate, 1); force-enables the window gate, but gate and enable are in the same hardware register, so the window enable bit is written back to the vop hardware. 3, vop powers on and may try to scan a destroyed buffer, and the iommu then takes a page fault. Move the window disable after the vop regsbak restore, so that the regsbak mechanism keeps tracking the modification and everything stays safe. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494582-6934-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 16:09:37 +08:00
Mark yao	b5015e92a0	drm/rockchip: vop: no need wait vblank on crtc enable With the atomic framework, crtc enable and disable come in pairs, so there is no need to wait for vblank. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494577-6884-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 16:09:34 +08:00
Mark yao	80c471ea04	drm/rockchip: vop: report error when check resource error Users would be confused when a commit fails without any error report. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494596-7090-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 15:39:32 +08:00
Mark yao	79a0b149d4	drm/rockchip: vop: round_up pitches to word align The VOP pitch register is word aligned, so pitches need to be rounded up to a word boundary. VOP_WIN0_VIR: bit[31:16] win0_vir_stride_uv Number of words of Win0 uv Virtual width bit[15:0] win0_vir_width Number of words of Win0 yrgb Virtual width ARGB888 : win0_vir_width RGB888 : (win0_vir_width*3/4) + (win0_vir_width%3) RGB565 : ceil(win0_vir_width/2) YUV : ceil(win0_vir_width/4) Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494591-7034-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 15:39:20 +08:00
Mark yao	6f04f5925c	drm/rockchip: vop: fix NV12 video display error Fix up the scale calculation formula for the case src_height == (dst_height/2). Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494586-6984-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 15:39:10 +08:00
Mark yao	da6c9bbf41	drm/rockchip: vop: fix iommu page fault when resume The iommu can take a page fault along the following path: vop_disable: 1, disable all windows and set vop config done 2, vop enters standby; the windows no longer work, but their registers are not cleared, so reading a window's enable bit may still show the window as enabled. vop_enable: 1, memcpy(vop->regsbak, vop->regs, len) saves the current vop registers to vop->regsbak, so the window appears enabled in regsbak. 2, VOP_WIN_SET(vop, win, gate, 1); force-enables the window gate, but gate and enable are in the same hardware register, so the window enable bit is written back to the vop hardware. 3, vop powers on and may try to scan a destroyed buffer, and the iommu then takes a page fault. Move the window disable after the vop regsbak restore, so that the regsbak mechanism keeps tracking the modification and everything stays safe. Signed-off-by: Mark Yao <mark.yao@rock-chips.com> Reviewed-by: Sandy huang <sandy.huang@rock-chips.com> Link: https://patchwork.freedesktop.org/patch/msgid/1501494582-6934-1-git-send-email-mark.yao@rock-chips.com	2017-08-04 15:38:46 +08:00
Arvind Yadav	d720661291	agp: nvidia: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:50 +10:00
Arvind Yadav	75383dd348	agp: amd64: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:49 +10:00
Arvind Yadav	f2149f0af3	agp: sis: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:48 +10:00
Arvind Yadav	0fa02c658a	agp: efficeon: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:48 +10:00
Arvind Yadav	11cdae9a5f	agp: ati: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:47 +10:00
Arvind Yadav	e4e22911b3	agp: ali: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:47 +10:00
Arvind Yadav	84a6bf7fd7	agp: intel: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:46 +10:00
Arvind Yadav	b8ca53f4d0	agp: amd-k7: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:45 +10:00
Arvind Yadav	ba67a31aac	agp: uninorth: constify pci_device_id. pci_device_id are not supposed to change at runtime. All functions working with pci_device_id provided by <linux/pci.h> work with const pci_device_id. So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-08-04 16:59:43 +10:00
David S. Miller	35615994c1	Merge branch 'socket-sendmsg-zerocopy' Willem de Bruijn says: ==================== socket sendmsg MSG_ZEROCOPY Introduce zerocopy socket send flag MSG_ZEROCOPY. This extends the shared page support (SKBTX_SHARED_FRAG) from sendpage to sendmsg. Implement the feature for TCP initially, as large writes benefit most. On a send call with MSG_ZEROCOPY, the kernel pins user pages and links these directly into the skbuff frags[] array. Each send call with MSG_ZEROCOPY that transmits data will eventually queue a completion notification on the error queue: a per-socket u32 incremented on each such call. A request may have to revert to copy to succeed, for instance when a device cannot support scatter-gather IO. In that case a flag is passed along to notify that the operation succeeded without zerocopy optimization. The implementation extends the existing zerocopy infra for tuntap, vhost and xen with features needed for TCP, notably reference counting to handle cloning on retransmit and GSO. For more details, see also the netdev 2.1 paper and presentation at https://netdevconf.org/2.1/session.html?debruijn Changelog: v3 -> v4: - dropped UDP, RAW and PF_PACKET for now Without loopback support, datagrams are usually smaller than the ~8KB size threshold needed to benefit from zerocopy. - style: a few reverse christmas tree - minor: SO_ZEROCOPY returns ENOTSUPP on unsupported protocols - minor: squashed SO_EE_CODE_ZEROCOPY_COPIED patch - minor: rebased on top of net-next with kmap_atomic fix v2 -> v3: - fix rebase conflict: SO_ZEROCOPY 59 -> 60 v1 -> v2: - fix (kbuild-bot): do not remove uarg until patch 5 - fix (kbuild-bot): move zerocopy_sg_from_iter doc with function - fix: remove unused extern in header file RFCv2 -> v1: - patch 2 - review comment: in skb_copy_ubufs, always allocate order-0 page, also when replacing compound source pages. - patch 3 - fix: always queue completion notification on MSG_ZEROCOPY, also if revert to copy.
- fix: on syscall abort, correctly revert notification state - minor: skip queue notification on SOCK_DEAD - minor: replace BUG_ON with WARN_ON in recoverable error - patch 4 - new: add socket option SOCK_ZEROCOPY. only honor MSG_ZEROCOPY if set, ignore for legacy apps. - patch 5 - fix: clear zerocopy state on skb_linearize - patch 6 - fix: only coalesce if prev errqueue elem is zerocopy - minor: try coalescing with list tail instead of head - minor: merge bytelen limit patch - patch 7 - new: signal when data had to be copied - patch 8 (tcp) - optimize: avoid setting PSH bit when exceeding max frags. that limits GRO on the client. do not goto new_segment. - fix: fail on MSG_ZEROCOPY | MSG_FASTOPEN - minor: do not wait for memory: does not work for optmem - minor: simplify alloc - patch 9 (udp) - new: add PF_INET6 - fix: attach zerocopy notification even if revert to copy - minor: simplify alloc size arithmetic - patch 10 (raw hdrinc) - new: add PF_INET6 - patch 11 (pf_packet) - minor: simplify slightly - patch 12 - new msg_zerocopy regression test: use veth pair to test all protocols: ipv4/ipv6/packet, tcp/udp/raw, cork all relevant ethtool settings: rx off, sg off all relevant packet lengths: 0, <MAX_HEADER, max size RFC -> RFCv2: - review comment: do not loop skb with zerocopy frags onto rx: add skb_orphan_frags_rx to orphan even refcounted frags call this in __netif_receive_skb_core, deliver_skb and tun: same as commit `1080e512d4` ("net: orphan frags on receive") - fix: hold an explicit sk reference on each notification skb. previously relied on the reference (or wmem) held by the data skb that would trigger notification, but this breaks on skb_orphan.
- fix: when aborting a send, do not inc the zerocopy counter this caused gaps in the notification chain - fix: in packet with SOCK_DGRAM, pull ll headers before calling zerocopy_sg_from_iter - fix: if sock_zerocopy_realloc does not allow coalescing, do not fail, just allocate a new ubuf - fix: in tcp, check return value of second allocation attempt - chg: allocate notification skbs from optmem to avoid affecting tcp write queue accounting (TSQ) - chg: limit #locked pages (ulimit) per user instead of per process - chg: grow notification ids from 16 to 32 bit - pass range [lo, hi] through 32 bit fields ee_info and ee_data - chg: rebased to davem-net-next on top of v4.10-rc7 - add: limit notification coalescing sharing ubufs limits overhead, but delays notification until the last packet is released, possibly unbounded. Add a cap. - tests: add snd_zerocopy_lo pf_packet test - tests: two bugfixes (add do_flush_tcp, ++sent not only in debug) Limitations / Known Issues: - TCP may build slightly smaller than max TSO packets due to exceeding MAX_SKB_FRAGS frags when zerocopy pages are unaligned. - All SKBTX_SHARED_FRAG may require additional __skb_linearize or skb_copy_ubufs calls in u32, skb_find_text, similar to skb_checksum_help. Notification skbuffs are allocated from optmem. For sockets that cannot effectively coalesce notifications, the optmem max may need to be increased to avoid hitting -ENOBUFS: sysctl -w net.core.optmem_max=1048576 In application load, copy avoidance shows a roughly 5% systemwide reduction in cycles when streaming large flows and a 4-8% reduction in wall clock time on early tensorflow test workloads. For the single-machine veth tests to succeed, loopback support has to be temporarily enabled by making skb_orphan_frags_rx map to skb_orphan_frags. * Performance The below table shows cycles reported by perf for a netperf process sending a single 10 Gbps TCP_STREAM. The first three columns show Mcycles spent in the netperf process context. 
The second three columns show time spent systemwide (-a -C A,B) on the two cpus that run the process and interrupt handler. Reported is the median of at least 3 runs. std is a standard netperf, zc uses zerocopy and % is the ratio. Netperf is pinned to cpu 2, network interrupts to cpu3, rps and rfs are disabled and the kernel is booted with idle=halt. NETPERF=./netperf -t TCP_STREAM -H $host -T 2 -l 30 -- -m $size perf stat -e cycles $NETPERF perf stat -C 2,3 -a -e cycles $NETPERF --process cycles-- ----cpu cycles---- std zc % std zc % 4K 27,609 11,217 41 49,217 39,175 79 16K 21,370 3,823 18 43,540 29,213 67 64K 20,557 2,312 11 42,189 26,910 64 256K 21,110 2,134 10 43,006 27,104 63 1M 20,987 1,610 8 42,759 25,931 61 Perf record indicates the main source of these differences. Process cycles only at 1M writes (perf record; perf report -n): std: Samples: 42K of event 'cycles', Event count (approx.): 21258597313 79.41% 33884 netperf [kernel.kallsyms] [k] copy_user_generic_string 3.27% 1396 netperf [kernel.kallsyms] [k] tcp_sendmsg 1.66% 694 netperf [kernel.kallsyms] [k] get_page_from_freelist 0.79% 325 netperf [kernel.kallsyms] [k] tcp_ack 0.43% 188 netperf [kernel.kallsyms] [k] __alloc_skb zc: Samples: 1K of event 'cycles', Event count (approx.): 1439509124 30.36% 584 netperf.zerocop [kernel.kallsyms] [k] gup_pte_range 14.63% 284 netperf.zerocop [kernel.kallsyms] [k] __zerocopy_sg_from_iter 8.03% 159 netperf.zerocop [kernel.kallsyms] [k] skb_zerocopy_add_frags_iter 4.84% 96 netperf.zerocop [kernel.kallsyms] [k] __alloc_skb 3.10% 60 netperf.zerocop [kernel.kallsyms] [k] kmem_cache_alloc_node * Safety The number of pages that can be pinned on behalf of a user with MSG_ZEROCOPY is bound by the locked memory ulimit. While the kernel holds process memory pinned, a process cannot safely reuse those pages for other purposes. Packets looped onto the receive stack and queued to a socket can be held indefinitely. 
Avoid unbounded notification latency by restricting user pages to egress paths only. skb_orphan_frags_rx() will create a private copy of pages even for refcounted packets when these are looped, as did skb_orphan_frags for the original tun zerocopy implementation. Pages are not remapped read-only. Processes can modify packet contents while packets are in flight in the kernel path. Bytes on which kernel control flow depends (headers) are copied to avoid TOCTTOU attacks. Datapath integrity does not otherwise depend on payload, with three exceptions: checksums, optional sk_filter/tc u32/.. and device + driver logic. The effect of wrong checksums is limited to the misbehaving process. TC filters that access contents may have to be excluded by adding an skb_orphan_frags_rx. Processes can also safely avoid OOM conditions by bounding the number of bytes passed with MSG_ZEROCOPY and by removing shared pages after transmission from their own memory map. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	07b65c5b31	test: add msg_zerocopy test Introduce regression test for msg_zerocopy feature. Send traffic from one process to another with and without zerocopy. Evaluate tcp, udp, raw and packet sockets, including variants - udp: corking and corking with mixed copy/zerocopy calls - raw: with and without hdrincl - packet: at both raw and dgram level Test on both ipv4 and ipv6, optionally with ethtool changes to disable scatter-gather, tx checksum or tso offload. All of these can affect zerocopy behavior. The regression test can be run on a single machine if run over a veth pair. In that case skb_orphan_frags_rx must be modified to be identical to skb_orphan_frags to allow forwarding zerocopy locally. The msg_zerocopy.sh script will set up the veth pair in network namespaces and run all tests. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	f214f915e7	tcp: enable MSG_ZEROCOPY Enable support for MSG_ZEROCOPY in the TCP stack. TSO and GSO are both supported. Only data sent to remote destinations is sent without copying. Packets looped onto a local destination have their payload copied to avoid unbounded latency. Tested: A 10x TCP_STREAM between two hosts showed a reduction in netserver process cycles by up to 70%, depending on packet size. Systemwide, savings are of course much less pronounced, at up to 20% best case. msg_zerocopy.sh 4 tcp: without zerocopy tx=121792 (7600 MB) txc=0 zc=n rx=60458 (7600 MB) with zerocopy tx=286257 (17863 MB) txc=286257 zc=y rx=140022 (17863 MB) This test opens a pair of sockets over veth; one calls send with 64KB and optionally MSG_ZEROCOPY, and the other reads the initial bytes. The receiver truncates, so this is strictly an upper bound on what is achievable. It is more representative of sending data out of a physical NIC (when payload is not touched, either). Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	a91dbff551	sock: ulimit on MSG_ZEROCOPY pages Bound the number of pages that a user may pin. Follow the lead of perf tools to maintain a per-user bound on memory locked pages commit `789f90fcf6` ("perf_counter: per user mlock gift") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	4ab6c99d99	sock: MSG_ZEROCOPY notification coalescing In the simple case, each sendmsg() call generates data and eventually a zerocopy ready notification N, where N indicates the Nth successful invocation of sendmsg() with the MSG_ZEROCOPY flag on this socket. TCP and corked sockets can cause send() calls to append new data to an existing sk_buff and, thus, ubuf_info. In that case the notification must hold a range. Modify ubuf_info to store an inclusive range [N..N+m] and add skb_zerocopy_realloc() to optionally extend an existing range. Also coalesce notifications in this common case: if a notification [1, 1] is about to be queued while [0, 0] is the queue tail, just modify the head of the queue to read [0, 1]. Coalescing is limited to a few TSO frames worth of data to bound notification latency. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	1f8b977ab3	sock: enable MSG_ZEROCOPY Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with skb_zerocopy_clone() wherever needed due to skb split, merge, resize or clone. Split skb_orphan_frags into two variants. The split, merge, .. paths support reference counted zerocopy buffers, so do not do a deep copy. Add skb_orphan_frags_rx for paths that may loop packets to receive sockets. That is not allowed, as it may cause unbounded latency. Deep copy all zerocopy copy buffers, ref-counted or not, in this path. The exact locations to modify were chosen by exhaustively searching through all code that might modify skb_frag references and/or the SKBTX_DEV_ZEROCOPY tx_flags bit. The changes err on the safe side, in two ways. (1) legacy ubuf_info paths virtio and tap are not modified. They keep a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags still call skb_copy_ubufs and thus copy frags in this case. (2) not all copies deep in the stack are addressed yet. skb_shift, skb_split and skb_try_coalesce can be refined to avoid copying. These are not in the hot path and this patch is hairy enough as is, so that is left for future refinement. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:30 -07:00
Willem de Bruijn	76851d1212	sock: add SOCK_ZEROCOPY sockopt The send call ignores unknown flags. Legacy applications may already unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a socket opts in to zerocopy. Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing. Processes can also query this socket option to detect kernel support for the feature. Older kernels will return ENOPROTOOPT. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:29 -07:00
Willem de Bruijn	52267790ef	sock: add MSG_ZEROCOPY The kernel supports zerocopy sendmsg in virtio and tap. Expand the infrastructure to support other socket types. Introduce a completion notification channel over the socket error queue. Notifications are returned with ee_origin SO_EE_ORIGIN_ZEROCOPY. ee_errno is 0 to avoid blocking the send/recv path on receiving notifications. Add reference counting, to support the skb split, merge, resize and clone operations possible with SOCK_STREAM and other socket types. The patch does not yet modify any datapaths. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:29 -07:00
Willem de Bruijn	3ece782693	sock: skb_copy_ubufs support for compound pages Refine skb_copy_ubufs to support compound pages. With upcoming TCP zerocopy sendmsg, such fragments may appear. The existing code replaces each page one for one. Splitting each compound page into an independent number of regular pages can result in exceeding limit MAX_SKB_FRAGS if data is not exactly page aligned. Instead, fill all destination pages but the last to PAGE_SIZE. Split the existing alloc + copy loop into separate stages: 1. compute bytelength and minimum number of pages to store this. 2. allocate 3. copy, filling each page except the last to PAGE_SIZE bytes 4. update skb frag array Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:29 -07:00
Willem de Bruijn	98ba0bd550	sock: allocate skbs from optmem Add sock_omalloc and sock_ofree to be able to allocate control skbs, for instance for looping errors onto sk_error_queue. The transmit budget (sk_wmem_alloc) is involved in transmit skb shaping, most notably in TCP Small Queues. Using this budget for control packets would impact transmission. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-08-03 21:37:29 -07:00
Tony Luck	3e5d2bd191	EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake I've been waiting a long time for the generic sideband driver to appear. Patience has run out, so include the minimum here to just read registers. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Patrick Geary <patrickg@supermicro.com> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170803210536.5662-1-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-08-04 05:58:23 +02:00
Nicholas Piggin	3db40c312c	powerpc/64: Fix __check_irq_replay missing decrementer interrupt If the decrementer wraps again and de-asserts the decrementer exception while hard-disabled, __check_irq_replay() has a test to notice the wrap when interrupts are re-enabled. The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked because the decrementer interrupt was always the first one checked after clearing the hard disable flag, but HMI check was moved ahead of that, which introduced this bug. This can cause a missed decrementer interrupt if we soft-disable interrupts then take an HMI which is recorded in irq_happened, then hard-disable interrupts for > 4s to wrap the decrementer. Fixes: `e0e0d6b739` ("powerpc/64: Replay hypervisor maintenance interrupt first") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-08-04 12:55:49 +10:00
Nicholas Piggin	09539f9b12	powerpc/perf: POWER9 PMU stops after idle workaround POWER9 DD2 PMU can stop after a state-loss idle in some conditions. A solution is to set then clear MMCRA[60] after wake from state-loss idle. MMCRA[60] is a non-architected bit, see the user manual for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-08-04 12:52:26 +10:00
Dave Airlie	5669b9989e	Merge branch 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux into drm-fixes Just a few small fixes for 4.13. * 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux: drm/amdgpu: Use list_del_init in amdgpu_mn_unregister drm/amdgpu: Fix undue fallthroughs in golden registers initialization drm/amdgpu: fix header on gfx9 clear state	2017-08-04 11:43:14 +10:00
Dave Airlie	c27668ba9a	Merge branch 'topic-arcpgu-updates' of https://github.com/foss-for-synopsys-dwc-arc-processors/linux into drm-next arcpgu minor updates. * 'topic-arcpgu-updates' of https://github.com/foss-for-synopsys-dwc-arc-processors/linux: drm: arcpgu: Allow some clock deviation in crtc->mode_valid() callback drm: arcpgu: Fix module unload drm: arcpgu: Fix mmap() callback arcpgu: Simplify driver name drm/arcpgu: Opt in debugfs	2017-08-04 11:42:34 +10:00
Dave Airlie	9f589b20b4	Merge tag 'drm-next-du-20170803' of git://linuxtv.org/pinchartl/media into drm-next rcar-du updates, contains vsp1 updates as well. * tag 'drm-next-du-20170803' of git://linuxtv.org/pinchartl/media: (24 commits) drm: rcar-du: Use new iterator macros drm: rcar-du: Repair vblank for DRM page flips using the VSP drm: rcar-du: Fix race condition when disabling planes at CRTC stop drm: rcar-du: Wait for flip completion instead of vblank in commit tail drm: rcar-du: Use the VBK interrupt for vblank events drm: rcar-du: Add HDMI outputs to R8A7796 device description drm: rcar-du: Remove an unneeded NULL check drm: rcar-du: Setup planes before enabling CRTC to avoid flicker drm: rcar-du: Configure DPAD0 routing through last group on Gen3 drm: rcar-du: Restrict DPLL duty cycle workaround to H3 ES1.x drm: rcar-du: Support multiple sources from the same VSP drm: rcar-du: Fix comments to comply with the kernel coding style drm: rcar-du: Use of_graph_get_remote_endpoint() v4l: vsp1: Add support for header display lists in continuous mode v4l: vsp1: Add support for multiple DRM pipelines v4l: vsp1: Add support for multiple LIF instances v4l: vsp1: Add support for new VSP2-BS, VSP2-DL and VSP2-D instances v4l: vsp1: Add support for the BRS entity v4l: vsp1: Add pipe index argument to the VSP-DU API v4l: vsp1: Don't create links for DRM pipeline ...	2017-08-04 11:41:24 +10:00
Gary R Hook	5060ffc97b	crypto: ccp - Add XTS-AES-256 support for CCP version 5 Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:44 +08:00
Gary R Hook	7f7216cfaf	crypto: ccp - Rework the unit-size check for XTS-AES The CCP supports a limited set of unit-size values. Change the check for this parameter such that acceptable values match the enumeration. Then clarify the conditions under which we must use the fallback implementation. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:43 +08:00
Gary R Hook	47f27f160b	crypto: ccp - Add a call to xts_check_key() Vet the key using the available standard function. Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:42 +08:00
Gary R Hook	e652399edb	crypto: ccp - Fix XTS-AES-128 support on v5 CCPs Version 5 CCPs have some new requirements for XTS-AES: the type field must be specified, and the key requires 512 bits, with each part occupying 256 bits and padded with zeroes. cc: <stable@vger.kernel.org> # 4.9.x+ Signed-off-by: Gary R Hook <ghook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:41 +08:00
Ard Biesheuvel	7c83d689c7	crypto: arm64/aes - avoid expanded lookup tables in the final round For the final round, avoid the expanded and padded lookup tables exported by the generic AES driver. Instead, for encryption, we can perform byte loads from the same table we used for the inner rounds, which will still be hot in the caches. For decryption, use the inverse AES Sbox directly, which is 4x smaller than the inverse lookup table exported by the generic driver. This should significantly reduce the Dcache footprint of our code, which makes the code more robust against timing attacks. It does not introduce any additional module dependencies, given that we already rely on the core AES module for the shared key expansion routines. It also frees up register x18, which is not available as a scratch register on all platforms, so avoiding it improves shareability of this code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:26 +08:00
Ard Biesheuvel	0d149ce67d	crypto: arm/aes - avoid expanded lookup tables in the final round For the final round, avoid the expanded and padded lookup tables exported by the generic AES driver. Instead, for encryption, we can perform byte loads from the same table we used for the inner rounds, which will still be hot in the caches. For decryption, use the inverse AES Sbox directly, which is 4x smaller than the inverse lookup table exported by the generic driver. This should significantly reduce the Dcache footprint of our code, which makes the code more robust against timing attacks. It does not introduce any additional module dependencies, given that we already rely on the core AES module for the shared key expansion routines. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:25 +08:00
Ard Biesheuvel	03c9a333fe	crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL Implement a NEON fallback for systems that do support NEON but have no support for the optional 64x64->128 polynomial multiplication instruction that is part of the ARMv8 Crypto Extensions. It is based on the paper "Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and Ricardo Dahab (https://hal.inria.fr/hal-01506572), but has been reworked extensively for the AArch64 ISA. On a low-end core such as the Cortex-A53 found in the Raspberry Pi3, the NEON based implementation is 4x faster than the table based one, and is time invariant as well, making it less vulnerable to timing attacks. When combined with the bit-sliced NEON implementation of AES-CTR, the AES-GCM performance increases by 2x (from 58 to 29 cycles per byte). Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:25 +08:00
Ard Biesheuvel	3759ee0572	crypto: arm/ghash - add NEON accelerated fallback for vmull.p64 Implement a NEON fallback for systems that do support NEON but have no support for the optional 64x64->128 polynomial multiplication instruction that is part of the ARMv8 Crypto Extensions. It is based on the paper "Fast Software Polynomial Multiplication on ARM Processors Using the NEON Engine" by Danilo Camara, Conrado Gouvea, Julio Lopez and Ricardo Dahab (https://hal.inria.fr/hal-01506572). On a 32-bit guest executing under KVM on a Cortex-A57, the new code is not only 4x faster than the generic table based GHASH driver, it is also time invariant. (Note that the existing vmull.p64 code is 16x faster on this core.) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:24 +08:00
Ard Biesheuvel	537c1445ab	crypto: arm64/gcm - implement native driver using v8 Crypto Extensions Currently, the AES-GCM implementation for arm64 systems that support the ARMv8 Crypto Extensions is based on the generic GCM module, which combines the AES-CTR implementation using AES instructions with the PMULL based GHASH driver. This is suboptimal, given the fact that the input data needs to be loaded twice, once for the encryption and again for the MAC calculation. On Cortex-A57 (r1p2) and other recent cores that implement micro-op fusing for the AES instructions, AES executes at less than 1 cycle per byte, which means that any cycles wasted on loading the data twice hurt even more. So implement a new GCM driver that combines the AES and PMULL instructions at the block level. This improves performance on Cortex-A57 by ~37% (from 3.5 cpb to 2.6 cpb) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:23 +08:00
Ard Biesheuvel	ec808bbef0	crypto: arm64/aes-bs - implement non-SIMD fallback for AES-CTR Of the various chaining modes implemented by the bit sliced AES driver, only CTR is exposed as a synchronous cipher, and requires a fallback in order to remain usable once we update the kernel mode NEON handling logic to disallow nested use. So wire up the existing CTR fallback C code. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:22 +08:00
Ard Biesheuvel	611d5324f4	crypto: arm64/chacha20 - take may_use_simd() into account To accommodate systems that disallow the use of kernel mode NEON in some circumstances, take the return value of may_use_simd into account when deciding whether to invoke the C fallback routine. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:22 +08:00
Ard Biesheuvel	e211506979	crypto: arm64/aes-blk - add a non-SIMD fallback for synchronous CTR To accommodate systems that may disallow use of the NEON in kernel mode in some circumstances, introduce a C fallback for synchronous AES in CTR mode, and use it if may_use_simd() returns false. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:21 +08:00
Ard Biesheuvel	5092fcf349	crypto: arm64/aes-ce-ccm: add non-SIMD generic fallback The arm64 kernel will shortly disallow nested kernel mode NEON. So honour this in the ARMv8 Crypto Extensions implementation of CCM-AES, and fall back to a scalar implementation using the generic crypto helpers for AES, XOR and incrementing the CTR counter. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:21 +08:00
Ard Biesheuvel	b8fb993a83	crypto: arm64/aes-ce-cipher: add non-SIMD generic fallback The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:20 +08:00
Ard Biesheuvel	f402e3115e	crypto: arm64/aes-ce-cipher - match round key endianness with generic code In order to be able to reuse the generic AES code as a fallback for situations where the NEON may not be used, update the key handling to match the byte order of the generic code: it stores round keys as sequences of 32-bit quantities rather than streams of bytes, and so our code needs to be updated to reflect that. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:19 +08:00
Ard Biesheuvel	da1793312f	crypto: arm64/sha2-ce - add non-SIMD scalar fallback The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:19 +08:00
Ard Biesheuvel	0771f3234d	crypto: arm64/sha1-ce - add non-SIMD generic fallback The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:18 +08:00
Ard Biesheuvel	15c7d8f8a2	crypto: arm64/crc32 - add non-SIMD scalar fallback The arm64 kernel will shortly disallow nested kernel mode NEON, so add a fallback to scalar C code that can be invoked in that case. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-08-04 09:27:17 +08:00

704772 Commits