linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-27 22:51:35 +00:00

Author	SHA1	Message	Date
Dmitry Bezrukov	59b04eeaf2	net: usb: aqc111: Implement set_rx_mode callback Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:07 -08:00
Dmitry Bezrukov	de074e7a7e	net: usb: aqc111: Add support for TSO Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:07 -08:00
Dmitry Bezrukov	6649d2a6c4	net: usb: aqc111: Add support for enable/disable checksum offload Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:07 -08:00
Dmitry Bezrukov	a4017cc264	net: usb: aqc111: Add support for changing MTU Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:07 -08:00
Dmitry Bezrukov	0203146646	net: usb: aqc111: Add checksum offload support Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:07 -08:00
Dmitry Bezrukov	361459cd96	net: usb: aqc111: Implement RX data path Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	4a3576d2bc	net: usb: aqc111: Implement TX data path Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	df2d59a2ab	net: usb: aqc111: Add support for getting and setting of MAC address Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	7b8b06544a	net: usb: aqc111: Introduce link management Add full hardware initialization sequence and link configuration logic Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	33cd597fbf	net: usb: aqc111: Introduce PHY access Add helpers to write 32bit values. Implement PHY power up/down sequences. AQC111, PHY is being controlled via vendor command interface. Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	f3aa095ac7	net: usb: aqc111: Various callbacks implementation Reset, stop callbacks, driver unbind callback. More register defines required for these callbacks. Add helpers to read/write 16bit values Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	619fcb4487	net: usb: aqc111: Add implementation of read and write commands Read/write command register defines and functions Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	7cea2d40af	net: usb: aqc111: Add bind and empty unbind callbacks Initialize net_device_ops structure Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Dmitry Bezrukov	17364b805f	net: usb: aqc111: Driver skeleton for Aquantia AQtion USB to 5GbE Initialize usb_driver structure skeleton Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:46:06 -08:00
Xin Long	0d32f17717	sctp: increase sk_wmem_alloc when head->truesize is increased I changed to count sk_wmem_alloc by skb truesize instead of 1 to fix the sk_wmem_alloc leak caused by later truesize's change in xfrm in Commit `02968ccf01` ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit"). But I should have also increased sk_wmem_alloc when head->truesize is increased in sctp_packet_gso_append() as xfrm does. Otherwise, sctp gso packet will cause sk_wmem_alloc underflow. Fixes: `02968ccf01` ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:42:31 -08:00
yupeng	712ee16c23	add documents for snmp counters Add explaination of below counters: TcpExtTCPRcvCoalesce TcpExtTCPAutoCorking TcpExtTCPOrigDataSent TCPSynRetrans TCPFastOpenActiveFail TcpExtListenOverflows TcpExtListenDrops TcpExtTCPHystartTrainDetect TcpExtTCPHystartTrainCwnd TcpExtTCPHystartDelayDetect TcpExtTCPHystartDelayCwnd Signed-off-by: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:39:37 -08:00
Colin Ian King	a8842e9755	firestream: fix spelling mistake: "Inititing" -> "Initializing" There are spelling mistakes in debug messages, fix them. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:32:06 -08:00
David S. Miller	50853808ff	Merge branch 'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN' Ido Schimmel says: ==================== mlxsw: Prepare for VLAN-aware bridge w/VxLAN The driver is using 802.1Q filtering identifiers (FIDs) to represent the different VLANs in the VLAN-aware bridge (only one is supported). However, the device cannot assign a VNI to such FIDs, which prevents the driver from supporting the enslavement of VxLAN devices to the VLAN-aware bridge. This patchset works around this limitation by emulating 802.1Q FIDs using 802.1D FIDs, which can be assigned a VNI and so far have only been used in conjunction with VLAN-unaware bridges. The downside of this approach is that multiple {Port,VID}->FID entries are required, whereas a single VID->FID entry is required with "true" 802.1Q FIDs. First four patches introduce the new FID family of emulated 802.1Q FIDs and the associated type of router interfaces (RIFs). Last patch flips the driver to use this new FID family. The diff is relatively small because the internal implementation of each FID family is contained and hidden in spectrum_fid.c. Different internal users (e.g., bridge, router) are aware of the different FID types, but do not care about their internal implementation. This makes it trivial to swap the current implementation of 802.1Q FIDs with the new one, using 802.1D FIDs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	c2e7490c31	mlxsw: spectrum: Flip driver to use emulated 802.1Q FIDs Replace 802.1Q FIDs and VLAN RIFs with their emulated counterparts. The emulated 802.1Q FIDs are actually 802.1D FIDs and thus use the same flood tables, of per-FID type. Therefore, add 4K-1 entries to the per-FID flood tables for the new FIDs and get rid of the FID-offset flood tables that were used by the old 802.1Q FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	ba6da02a9c	mlxsw: spectrum_router: Introduce emulated VLAN RIFs Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges of "FID" type. In other words, the RIF type is derived from the underlying FID type. VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on top of 802.1D FIDs. Since the previous patch emulated 802.1Q FIDs using 802.1D FIDs, this patch emulates VLAN RIFs using FID RIFs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	d62dd8a0c8	mlxsw: spectrum_fid: Introduce emulated 802.1Q FIDs The driver uses 802.1Q FIDs when offloading a VLAN-aware bridge. Unfortunately, it is not possible to assign a VNI to such FIDs, which prompts the driver to forbid the enslavement of VxLAN devices to a VLAN-aware bridge. Workaround this hardware limitation by creating a new family of FIDs, emulated 802.1Q FIDs. These FIDs are emulated using 802.1D FIDs, which can be assigned a VNI. The downside of this approach is that multiple {Port, VID}->FID entries are required, whereas only a single VID->FID is required with "true" 802.1Q FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:08 -08:00
Ido Schimmel	7c4a729221	mlxsw: spectrum_fid: Make flood index calculation more robust 802.1D FIDs use a per-FID flood table, where the flood index into the table is calculated by subtracting 4K from the FID's index. Currently, 802.1D FIDs start at 4K, so the calculation is correct, but if it was ever to change, the calculation will no longer be correct. In addition, this change will allow us to reuse the flood index calculation function in the next patch, where we are going to emulate 802.1Q FIDs using 802.1D FIDs. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:07 -08:00
Ido Schimmel	6502be9f04	mlxsw: spectrum_switchdev: Do not set field when it is reserved When configuring an FDB entry pointing to a LAG netdev (or its upper), the driver should only set the 'lag_vid' field when the FID (filtering identifier) is of 802.1D type. Extend the 802.1D FID family with an attribute indicating whether this field should be set and based on its value set the field or leave it blank. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:27:07 -08:00
YueHaibing	4e3c7c00bb	net: aquantia: return 'err' if set MPI_DEINIT state fails Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:260:7: warning: variable 'err' set but not used [-Wunused-but-set-variable] 'err' should be returned while set MPI_DEINIT state fails in hw_atl_utils_soft_reset. Fixes: `cce96d1883` ("net: aquantia: Regression on reset with 1.x firmware") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:23:37 -08:00
David S. Miller	ff2237890c	Merge branch 'bridge-bools' Nikolay Aleksandrov says: ==================== net: bridge: add an option to disabe linklocal learning This set adds a new bridge option which can control learning from link-local packets, by default learning is on to be consistent and avoid breaking users expectations. If the new no_linklocal_learn option is enabled then the bridge will stop learning from link-local packets. In order to save space for future boolean options, patch 01 adds a new bool option API that uses a bitmask to control boolean options. The bridge is by far the largest netlink attr user and we keep adding simple boolean options which waste nl attr ids and space. We're not directly mapping these to the in-kernel bridge flags because some might require more complex configuration changes (e.g. if we were to add the per port vlan stats now, it'd require multiple checks before changing value). Any new bool option needs to be handled by both br_boolopt_toggle and get in order to be able to retrieve its state later. All such options are automatically exported via netlink. The behaviour of setting such options is consistent with netlink option handling when a missing option is being set (silently ignored), e.g. when a newer iproute2 is used on older kernel. All supported options are exported via bm's optmask when dumping the new attribute. v2: address Andrew Lunn's comments, squash a minor change into patch 01, export all supported options via optmask when dumping, add patch 03, pass down extack so options can return meaningful errors, add WARN_ON on unsupported options (should not happen) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:04:16 -08:00
Nikolay Aleksandrov	1ed1ccb99e	net: bridge: export supported boolopts Now that we have at least one bool option, we can export all of the supported bool options via optmask when dumping them. v2: new patch Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:04:15 -08:00
Nikolay Aleksandrov	70e4272b4c	net: bridge: add no_linklocal_learn bool option Use the new boolopt API to add an option which disables learning from link-local packets. The default is kept as before and learning is enabled. This is a simple map from a boolopt bit to a bridge private flag that is tested before learning. v2: pass NULL for extack via sysfs Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:04:15 -08:00
Nikolay Aleksandrov	a428afe82f	net: bridge: add support for user-controlled bool options We have been adding many new bridge options, a big number of which are boolean but still take up netlink attribute ids and waste space in the skb. Recently we discussed learning from link-local packets[1] and decided yet another new boolean option will be needed, thus introducing this API to save some bridge nl space. The API supports changing the value of multiple boolean options at once via the br_boolopt_multi struct which has an optmask (which options to set, bit per opt) and optval (options' new values). Future boolean options will only be added to the br_boolopt_id enum and then will have to be handled in br_boolopt_toggle/get. The API will automatically add the ability to change and export them via netlink, sysfs can use the single boolopt function versions to do the same. The behaviour with failing/succeeding is the same as with normal netlink option changing. If an option requires mapping to internal kernel flag or needs special configuration to be enabled then it should be handled in br_boolopt_toggle. It should also be able to retrieve an option's current state via br_boolopt_get. v2: WARN_ON() on unsupported option as that shouldn't be possible and also will help catch people who add new options without handling them for both set and get. Pass down extack so if an option desires it could set it on error and be more user-friendly. [1] https://www.spinics.net/lists/netdev/msg532698.html Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:04:15 -08:00
Heiner Kallweit	c85ddecae6	net: phy: add workaround for issue where PHY driver doesn't bind to the device After switching the r8169 driver to use phylib some user reported that their network is broken. This was caused by the genphy PHY driver being used instead of the dedicated PHY driver for the RTL8211B. Users reported that loading the Realtek PHY driver module upfront fixes the issue. See also this mail thread: https://marc.info/?t=154279781800003&r=1&w=2 The issue is quite weird and the root cause seems to be somewhere in the base driver core. The patch works around the issue and may be removed once the actual issue is fixed. The Fixes tag refers to the first reported occurrence of the issue. The issue itself may have been existing much longer and it may affect users of other network chips as well. Users typically will recognize this issue only if their PHY stops working when being used with the genphy driver. Fixes: `f1e911d5d0` ("r8169: add basic phylib support") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 15:00:24 -08:00
Bernd Eckstein	45611c61dd	usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 The bug is not easily reproducable, as it may occur very infrequently (we had machines with 20minutes heavy downloading before it occurred) However, on a virual machine (VMWare on Windows 10 host) it occurred pretty frequently (1-2 seconds after a speedtest was started) dev->tx_skb mab be freed via dev_kfree_skb_irq on a callback before it is set. This causes the following problems: - double free of the skb or potential memory leak - in dmesg: 'recvmsg bug' and 'recvmsg bug 2' and eventually general protection fault Example dmesg output: [ 134.841986] ------------[ cut here ]------------ [ 134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0 [ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0 [ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi [ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu [ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0 [ 134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286 [ 134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006 [ 134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490 [ 134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4 [ 134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000 [ 134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000 [ 134.842051] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000 [ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0 [ 134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 134.842057] Call Trace: [ 134.842060] ? aa_sk_perm+0x53/0x1a0 [ 134.842064] inet_recvmsg+0x51/0xc0 [ 134.842066] sock_recvmsg+0x43/0x50 [ 134.842070] SYSC_recvfrom+0xe4/0x160 [ 134.842072] ? __schedule+0x3de/0x8b0 [ 134.842075] ? ktime_get_ts64+0x4c/0xf0 [ 134.842079] SyS_recvfrom+0xe/0x10 [ 134.842082] do_syscall_64+0x73/0x130 [ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 134.842086] RIP: 0033:0x7fe331f5a81d [ 134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d [ 134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d [ 134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003 [ 134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000 [ 134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698 [ 134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000 [ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00 [ 134.842126] ---[ end trace b7138fc08c83147f ]--- [ 134.842144] general protection fault: 0000 [#1] SMP PTI [ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi [ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu [ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440 [ 134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202 [ 134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034 [ 134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200 [ 134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034 [ 134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c [ 134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00 [ 134.842171] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000 [ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0 [ 134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 134.842179] Call Trace: [ 134.842181] inet_release+0x42/0x70 [ 134.842183] __sock_release+0x42/0xb0 [ 134.842184] sock_close+0x15/0x20 [ 134.842187] __fput+0xea/0x220 [ 134.842189] ____fput+0xe/0x10 [ 134.842191] task_work_run+0x8a/0xb0 [ 134.842193] exit_to_usermode_loop+0xc4/0xd0 [ 134.842195] do_syscall_64+0xf4/0x130 [ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 134.842197] RIP: 0033:0x7fe331f5a560 [ 134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 [ 134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560 [ 134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003 [ 134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0 [ 134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0 [ 134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770 [ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80 [ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8 [ 134.842227] ---[ end trace b7138fc08c831480 ]--- The proposed patch eliminates a potential racing condition. Before, usb_submit_urb was called and _after_ that, the skb was attached (dev->tx_skb). So, on a callback it was possible, however unlikely that the skb was freed before it was set. That way (because dev->tx_skb was not set to NULL after it was freed), it could happen that a skb from a earlier transmission was freed a second time (and the skb we should have freed did not get freed at all) Now we free the skb directly in ipheth_tx(). It is not passed to the callback anymore, eliminating the posibility of a double free of the same skb. Depending on the retval of usb_submit_urb() we use dev_kfree_skb_any() respectively dev_consume_skb_any() to free the skb. Signed-off-by: Oliver Zweigle <Oliver.Zweigle@faro.com> Signed-off-by: Bernd Eckstein <3ernd.Eckstein@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 14:57:16 -08:00
David S. Miller	93143f846b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2018-11-27 The following pull-request contains BPF updates for your net tree. The main changes are: 1) Fix several bugs in BPF sparc JIT, that is, convergence for fused branches, initialization of frame pointer register, and moving all arguments into output registers from input registers in prologue to fix BPF to BPF calls, from David. 2) Fix a bug in arm64 JIT for fetching BPF to BPF call addresses where they are not guaranteed to fit into imm field and therefore must be retrieved through prog aux data, from Daniel. 3) Explicitly add all JITs to MAINTAINERS file with developers able to help out in feature development, fixes, review, etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-27 09:52:05 -08:00
Jim Mattson	fd65d3142f	kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: `15d4507152` ("KVM/x86: Add IBPB support") Reported-by: Neel Natu <neelnatu@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:42 +01:00
Junaid Shahid	0e0fee5c53	kvm: mmu: Fix race in emulated page table writes When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point). This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write(). Signed-off-by: Junaid Shahid <junaids@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:31 +01:00
Liran Alon	52ad7eb3d6	KVM: nVMX: vmcs12 revision_id is always VMCS12_REVISION even when copied from eVMCS vmcs12 represents the per-CPU cache of L1 active vmcs12. This cache can be loaded by one of the following: 1) Guest making a vmcs12 active by exeucting VMPTRLD 2) Guest specifying eVMCS in VP assist page and executing VMLAUNCH/VMRESUME. Either way, vmcs12 should have revision_id of VMCS12_REVISION. Which is not equal to eVMCS revision_id which specifies used VersionNumber of eVMCS struct (e.g. KVM_EVMCS_VERSION). Specifically, this causes an issue in restoring a nested VM state because vmx_set_nested_state() verifies that vmcs12->revision_id is equal to VMCS12_REVISION which was not true in case vmcs12 was populated from an eVMCS by vmx_get_nested_state() which calls copy_enlightened_to_vmcs12(). Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:30 +01:00
Liran Alon	72aeb60c52	KVM: nVMX: Verify eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD According to TLFS section 16.11.2 Enlightened VMCS, the first u32 field of eVMCS should specify eVMCS VersionNumber. This version should be in the range of supported eVMCS versions exposed to guest via CPUID.0x4000000A.EAX[0:15]. The range which KVM expose to guest in this CPUID field should be the same as the value returned in vmcs_version by nested_enable_evmcs(). According to the above, eVMCS VMPTRLD should verify that version specified in given eVMCS is in the supported range. However, current code mistakenly verfies this field against VMCS12_REVISION. One can also see that when KVM use eVMCS, it makes sure that alloc_vmcs_cpu() sets allocated eVMCS revision_id to KVM_EVMCS_VERSION. Obvious fix should just change eVMCS VMPTRLD to verify first u32 field of eVMCS is equal to KVM_EVMCS_VERSION. However, it turns out that Microsoft Hyper-V fails to comply to their own invented interface: When Hyper-V use eVMCS, it just sets first u32 field of eVMCS to revision_id specified in MSR_IA32_VMX_BASIC (In our case: VMCS12_REVISION). Instead of used eVMCS version number which is one of the supported versions specified in CPUID.0x4000000A.EAX[0:15]. To overcome Hyper-V bug, we accept either a supported eVMCS version or VMCS12_REVISION as valid values for first u32 field of eVMCS. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:29 +01:00
Leonid Shatz	326e742533	KVM: nVMX/nSVM: Fix bug which sets vcpu->arch.tsc_offset to L1 tsc_offset Since commit `e79f245dde` ("X86/KVM: Properly update 'tsc_offset' to represent the running guest"), vcpu->arch.tsc_offset meaning was changed to always reflect the tsc_offset value set on active VMCS. Regardless if vCPU is currently running L1 or L2. However, above mentioned commit failed to also change kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly. This is because vmx_write_tsc_offset() could set the tsc_offset value in active VMCS to given offset parameter plus vmcs12->tsc_offset. However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to given offset parameter. Without taking into account the possible addition of vmcs12->tsc_offset. (Same is true for SVM case). Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return actually set tsc_offset in active VMCS and modify kvm_vcpu_write_tsc_offset() to set returned value in vcpu->arch.tsc_offset. In addition, rename write_tsc_offset() callback to write_l1_tsc_offset() to make it clear that it is meant to set L1 TSC offset. Fixes: `e79f245dde` ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Leonid Shatz <leonid.shatz@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:10 +01:00
Yi Wang	1e4329ee2c	x86/kvm/vmx: fix old-style function declaration The inline keyword which is not at the beginning of the function declaration may trigger the following build warnings, so let's fix it: arch/x86/kvm/vmx.c:1309:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5947:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:5985:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] arch/x86/kvm/vmx.c:6023:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration] Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:09 +01:00
Yi Wang	354cb410d8	KVM: x86: fix empty-body warnings We get the following warnings about empty statements when building with 'W=1': arch/x86/kvm/lapic.c:632:53: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1907:42: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1936:65: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] arch/x86/kvm/lapic.c:1975:44: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body] Rework the debug helper macro to get rid of these warnings. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:09 +01:00
Liran Alon	f48b4711dd	KVM: VMX: Update shared MSRs to be saved/restored on MSR_EFER.LMA changes When guest transitions from/to long-mode by modifying MSR_EFER.LMA, the list of shared MSRs to be saved/restored on guest<->host transitions is updated (See vmx_set_efer() call to setup_msrs()). On every entry to guest, vcpu_enter_guest() calls vmx_prepare_switch_to_guest(). This function should also take care of setting the shared MSRs to be saved/restored. However, the function does nothing in case we are already running with loaded guest state (vmx->loaded_cpu_state != NULL). This means that even when guest modifies MSR_EFER.LMA which results in updating the list of shared MSRs, it isn't being taken into account by vmx_prepare_switch_to_guest() because it happens while we are running with loaded guest state. To fix above mentioned issue, add a flag to mark that the list of shared MSRs has been updated and modify vmx_prepare_switch_to_guest() to set shared MSRs when running with host state OR list of shared MSRs has been updated. Note that this issue was mistakenly introduced by commit `678e315e78` ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") because previously vmx_set_efer() always called vmx_load_host_state() which resulted in vmx_prepare_switch_to_guest() to set shared MSRs. Fixes: `678e315e78` ("KVM: vmx: add dedicated utility to access guest's kernel_gs_base") Reported-by: Eyal Moscovici <eyal.moscovici@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:50:08 +01:00
Liran Alon	bcbfbd8ec2	KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall kvm_pv_clock_pairing() allocates local var "struct kvm_clock_pairing clock_pairing" on stack and initializes all it's fields besides padding (clock_pairing.pad[]). Because clock_pairing var is written completely (including padding) to guest memory, failure to init struct padding results in kernel info-leak. Fix the issue by making sure to also init the padding with zeroes. Fixes: `55dd00a73a` ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall") Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:49:57 +01:00
Liran Alon	7f9ad1dfa3	KVM: nVMX: Fix kernel info-leak when enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS more than once Consider the case that userspace enables KVM_CAP_HYPERV_ENLIGHTENED_VMCS twice: 1) kvm_vcpu_ioctl_enable_cap() is called to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs(). 2) nested_enable_evmcs() sets enlightened_vmcs_enabled to true and fills vmcs_version which is then copied to userspace. 3) kvm_vcpu_ioctl_enable_cap() is called again to enable KVM_CAP_HYPERV_ENLIGHTENED_VMCS which calls nested_enable_evmcs(). 4) This time nested_enable_evmcs() just returns 0 as enlightened_vmcs_enabled is already true. Without filling vmcs_version. 5) kvm_vcpu_ioctl_enable_cap() continues as usual and copies uninitialized vmcs_version to userspace which leads to kernel info-leak. Fix this issue by simply changing nested_enable_evmcs() to always fill vmcs_version output argument. Even when enlightened_vmcs_enabled is already set to true. Note that SVM's nested_enable_evmcs() should not be modified because it always returns a non-zero value (-ENODEV) which results in kvm_vcpu_ioctl_enable_cap() skipping the copy of vmcs_version to userspace (as it should). Fixes: `57b119da35` ("KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability") Reported-by: syzbot+cfbc368e283d381f8cef@syzkaller.appspotmail.com Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:49:46 +01:00
Wei Wang	30510387a5	svm: Add mutex_lock to protect apic_access_page_done on AMD systems There is a race condition when accessing kvm->arch.apic_access_page_done. Due to it, x86_set_memory_region will fail when creating the second vcpu for a svm guest. Add a mutex_lock to serialize the accesses to apic_access_page_done. This lock is also used by vmx for the same purpose. Signed-off-by: Wei Wang <wawei@amazon.de> Signed-off-by: Amadeusz Juskowiak <ajusk@amazon.de> Signed-off-by: Julian Stecklina <jsteckli@amazon.de> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Joerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:49:39 +01:00
Wanpeng Li	e97f852fd4	KVM: X86: Fix scan ioapic use-before-initialization Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:__lock_acquire+0x1a6/0x1990 Call Trace: lock_acquire+0xdb/0x210 _raw_spin_lock+0x38/0x70 kvm_ioapic_scan_entry+0x3e/0x110 [kvm] vcpu_enter_guest+0x167e/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT6 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This can be triggered by the following program: #define _GNU_SOURCE #include <endian.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h> uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff}; int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); long res = 0; memcpy((void)0x20000040, "/dev/kvm", 9); res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0); if (res != -1) r[0] = res; res = syscall(__NR_ioctl, r[0], 0xae01, 0); if (res != -1) r[1] = res; res = syscall(__NR_ioctl, r[1], 0xae41, 0); if (res != -1) r[2] = res; memcpy( (void)0x20000080, "\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00" "\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43" "\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33" "\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe" "\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22" "\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb", 106); syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080); syscall(__NR_ioctl, r[2], 0xae80, 0); return 0; } This patch fixes it by bailing out scan ioapic if ioapic is not initialized in kernel. Reported-by: Wei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:49:20 +01:00
Wanpeng Li	38ab012f10	KVM: LAPIC: Fix pv ipis use-before-initialization Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 0000000000000014 PGD 800000040410c067 P4D 800000040410c067 PUD 40410d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 3 PID: 2567 Comm: poc Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:kvm_pv_send_ipi+0x94/0x350 [kvm] Call Trace: kvm_emulate_hypercall+0x3cc/0x700 [kvm] handle_vmcall+0xe/0x10 [kvm_intel] vmx_handle_exit+0xc1/0x11b0 [kvm_intel] vcpu_enter_guest+0x9fb/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the apic map has not yet been initialized, the testcase triggers pv_send_ipi interface by vmcall which results in kvm->arch.apic_map is dereferenced. This patch fixes it by checking whether or not apic map is NULL and bailing out immediately if that is the case. Fixes: `4180bf1b65` (KVM: X86: Implement "send IPI" hypercall) Reported-by: Wei Wu <ww9210@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wei Wu <ww9210@gmail.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:48:56 +01:00
Luiz Capitulino	a87c99e612	KVM: VMX: re-add ple_gap module parameter Apparently, the ple_gap parameter was accidentally removed by commit `c8e88717cf`. Add it back. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Cc: stable@vger.kernel.org Fixes: `c8e88717cf` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 12:48:43 +01:00
David Miller	2b9034b5ea	sparc: Adjust bpf JIT prologue for PSEUDO calls. Move all arguments into output registers from input registers. This path is exercised by test_verifier.c's "calls: two calls with args" test. Adjust BPF_TAILCALL_PROLOGUE_SKIP as needed. Let's also make the prologue length a constant size regardless of the combination of ->saw_frame_pointer and ->saw_tail_call settings. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-11-27 09:46:52 +01:00
David S. Miller	02c72d5eda	Merge branch 'virtio-support-packed-ring' Tiwei Bie says: ==================== virtio: support packed ring This patch set implements packed ring support in virtio driver. A performance test between pktgen (pktgen_sample03_burst_single_flow.sh) and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done, I saw ~30% performance gain in packed ring in this case. To make this patch set work with below patch set for vhost, some hacks are needed to set the _F_NEXT flag in indirect descriptors (this should be fixed in vhost): https://lkml.org/lkml/2018/7/3/33 v2 -> v3: - Use leXX instead of virtioXX (MST); - Refactor split ring first (MST); - Add debug helpers (MST); - Put split/packed ring specific fields in sub structures (MST); - Handle normal descriptors and indirect descriptors differently (MST); - Track the DMA addr/len related info in a separate structure (MST); - Calculate AVAIL/USED flags only when wrap counter wraps (MST); - Define a struct/union to read event structure (MST); - Define a macro for wrap counter bit in uapi (MST); - Define the AVAIL/USED bits as shifts instead of values (MST); - s/_F_/_FLAG_/ in VRING_PACKED_EVENT_* as they are values (MST); - Drop the notify workaround for QEMU's tx-timer in packed ring (MST); v1 -> v2: - Use READ_ONCE() to read event off_wrap and flags together (Jason); - Add comments related to ccw (Jason); RFC v6 -> v1: - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed() when event idx is off (Jason); - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason); - Test the state of the desc at used_idx instead of last_used_idx in virtqueue_enable_cb_delayed_packed() (Jason); - Save wrap counter (as part of queue state) in the return value of virtqueue_enable_cb_prepare_packed(); - Refine the packed ring definitions in uapi; - Rebase on the net-next tree; RFC v5 -> RFC v6: - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason); - Define wrap counter as bool (Jason); - Use ALIGN() in vring_init_packed() (Jason); - Avoid using pointer to track `next` in detach_buf_packed() (Jason); - Add comments for barriers (Jason); - Don't enable RING_PACKED on ccw for now (noticed by Jason); - Refine the memory barrier in virtqueue_poll(); - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed(); - Remove the hacks in virtqueue_enable_cb_prepare_packed(); RFC v4 -> RFC v5: - Save DMA addr, etc in desc state (Jason); - Track used wrap counter; RFC v3 -> RFC v4: - Make ID allocation support out-of-order (Jason); - Various fixes for EVENT_IDX support; RFC v2 -> RFC v3: - Split into small patches (Jason); - Add helper virtqueue_use_indirect() (Jason); - Just set id for the last descriptor of a list (Jason); - Calculate the prev in virtqueue_add_packed() (Jason); - Fix/improve desc suppression code (Jason/MST); - Refine the code layout for XXX_split/packed and wrappers (MST); - Fix the comments and API in uapi (MST); - Remove the BUG_ON() for indirect (Jason); - Some other refinements and bug fixes; RFC v1 -> RFC v2: - Add indirect descriptor support - compile test only; - Add event suppression supprt - compile test only; - Move vring_packed_init() out of uapi (Jason, MST); - Merge two loops into one in virtqueue_add_packed() (Jason); - Split vring_unmap_one() for packed ring and split ring (Jason); - Avoid using '%' operator (Jason); - Rename free_head -> next_avail_idx (Jason); - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason); - Some other refinements and bug fixes; ==================== Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-26 22:17:40 -08:00
Tiwei Bie	f959a128fe	virtio_ring: advertize packed ring layout Advertize the packed ring layout support. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-26 22:17:40 -08:00
Tiwei Bie	3a814fdf27	virtio_ring: disable packed ring on unsupported transports Currently, ccw, vop and remoteproc need some legacy virtio APIs to create or access virtio rings, which are not supported by packed ring. So disable packed ring on these transports for now. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-26 22:17:40 -08:00
Tiwei Bie	f51f982682	virtio_ring: leverage event idx in packed ring Leverage the EVENT_IDX feature in packed ring to suppress events when it's available. Signed-off-by: Tiwei Bie <tiwei.bie@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-11-26 22:17:39 -08:00

1 2 3 4 5 ...

798407 Commits