linux

Author	SHA1	Message	Date
Fuyun Liang	0979aa0bfd	net: hns3: fix for getting wrong link mode problem Fixed link mode is returned by hns3_get_link_ksettings. It is unreasonable. This patch fixes it by adding some related functions to get link mode from hardware. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:02 -04:00
Fuyun Liang	a95e1f8666	net: hns3: change the time interval of int_gl calculating Since we change the update rate of int_gl from every interrupt to every one hundred interrupts, the old way to get time interval by int_gl value is not accurate. This patch calculates the time interval using the jiffies value. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:02 -04:00
Fuyun Liang	cd9d187b07	net: hns3: change GL update rate The interrupt coalescing self-adaptive function updates the int_gl every interrupt. The GL update rate is too faster to get a better new GL value. This patch changes the GL update rate to every one hundred interrupts. The GL update rate is defined by HNS3_INT_ADAPT_DOWN_START. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:02 -04:00
Peng Li	d458c815e7	net: hns3: increase the max time for IMP handle command It may need more time for IMP handle some command, such as reset. This patch enlarges the max time for cmd timeout. Driver will check the IMP result every us, it may break through the loop when get the right result. So not all command need the max time. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:02 -04:00
Yunsheng Lin	2f550a4678	net: hns3: export pci table of hclge and hclgevf to userspace There is no module that is dependent on hclge or hclgevf's symbol, but hns_enet need them to provide ops for it to run. When there is a need to auto load the hns3 driver, the auto load will fail because hclge or hclgevf is not loaded. Hns_enet has already exported the pci table, so this patch exports the pci table for hclge and hclgevf module too. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:02 -04:00
Yunsheng Lin	681ec3999b	net: hns3: fix for vlan table lost problem when resetting The vlan table in hardware is clear after PF/Core/IMP/Global reset, which will cause vlan tagged packets not being received problem. This patch fixes it by restoring the vlan table after reset. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:01 -04:00
Peng Li	1a426f8b40	net: hns3: fix the VF queue reset flow error VF queue reset flow is different from PF queue reset flow. VF driver should stop VF queue first, then send message to PF and PF do the reset. PF should send a response to VF after PF complete the queue reset, VF can initialize the queue hw after get the response. This patch fixes the VF queue reset flow as the correct step. Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:01 -04:00
Fuyun Liang	dd72140ca9	net: hns3: reallocate tx/rx buffer after changing mtu When changing the mtu, the max frame size also will be changed. The tx buffer size and the rx buffer size to be allocated are determined by max frame size. So when max frame size is changed, the tx buffer and rx buffer need to be reallocated. When the tc_num is changed, the tx buffer and rx buffer need to be reallocated too. So calling set_mtu and buffer_alloc separately is better. Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:12:01 -04:00
David Ahern	145307460b	devlink: Remove top_hierarchy arg to devlink_resource_register top_hierarchy arg can be determined by comparing parent_resource_id to DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate argument. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 13:08:41 -04:00
David Ahern	8ae797aaa8	selftests: Add multipath tests for onlink flag Add multipath tests for onlink flag: one test with onlink added to both nexthops, then tests with onlink added to only 1 nexthop. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 12:37:05 -04:00
David S. Miller	bc48740bcd	Merge branch 'cxgb4-rdma' Raju Rangoju says: ==================== Add support for RDMA enhancements in cxgb4 Allocates the HW-resources and provide the necessary routines for the upper layer driver (rdma/iw_cxgb4) to enable the RDMA SRQ support for Chelsio adapters. Advertise support for write with immediate work request Advertise support for write with completion v3: modified memory allocation as per Stefano's suggestion v2: fixed the patching issues and also fixed the following based on review comments of Stefano Brivio - using kvzalloc instead of vzalloc - using #define instead of enum ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:12 -04:00
Raju Rangoju	f3910c6278	cxgb4: Support firmware rdma write completion work request. If FW supports RDMA WRITE_COMPLETION functionality, then advertise that to the ULDs. This will be used by iw_cxgb4 to allow WRITE_COMPLETION work requests. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:11 -04:00
Raju Rangoju	43db929640	cxgb4: Support firmware rdma write with immediate work request. If FW supports RDMA WRITE_WITH_IMMEDATE functionality, then advertise that to the ULDs. This will be used by iw_cxgb4 to allow WRITE_WITH_IMMEDIATE work requests. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:11 -04:00
Raju Rangoju	c68644ef16	cxgb4: Add support to query HW SRQ parameters This patch adds support to query FW for the HW SRQ table start/end, and advertise that for ULDs. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:11 -04:00
Raju Rangoju	e47094751d	cxgb4: Add support to initialise/read SRQ entries - This patch adds support to initialise srq table and read srq entries Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:11 -04:00
Raju Rangoju	a3cdaa69e4	cxgb4: Adds CPL support for Shared Receive Queues - Add srq table query cpl support for srq - Add cpl_abort_req_rss6 and cpl_abort_rpl_rss6 structs. - Add accessors, macros to get the SRQ IDX value. Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:59:10 -04:00
David S. Miller	0dfebaf1cd	Merge branch 'r8169-small-improvements' Heiner Kallweit says: ==================== r8169: series with smaller improvements w/o functional changes This series includes smaller improvements w/o intended functional changes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:47:53 -04:00
Heiner Kallweit	1e1205b7d3	r8169: add helper tp_to_dev In several places struct device is referenced by using &tp->pci_dev->dev. Add helper tp_to_dev() to improve code readability. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:47:52 -04:00
Heiner Kallweit	73c86ee38b	r8169: change type of argument in rtl_disable/enable_clock_request Changing the argument type to struct rtl8169_private * is more in line with the other functions in the driver and it allows to reduce the code size. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:47:52 -04:00
Heiner Kallweit	cb73200c59	r8169: change type of first argument in rtl_tx_performance_tweak Changing the type of the first argument to struct rtl8169_private * is more in line with the other functions in the driver and it allows to reduce the code size. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:47:51 -04:00
Heiner Kallweit	1f7aa2bc26	r8169: simplify rtl_set_mac_address Replace open-coded functionality with eth_mac_addr(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:47:51 -04:00
David S. Miller	755f6633d6	This feature/cleanup patchset includes the following patches: - avoid redundant multicast TT entries, by Linus Luessing - add netlink support for distributed arp table cache and multicast flags, by Linus Luessing (2 patches) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlqv59kWHHN3QHNpbW9u d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoSBaD/9r/oJC+Q/3Eu6DTAAiS7Jx2IpQ kOwU7l4hkGK8mBZ098CkmHTBY+zurqYwcokCHhKJO5mqJEpvlM27PuQxzqWSBMMO FWFax2YlKPpJp+/f/rSD9HS73RTY7npv6l5/eFg6+0WSQET04PjLB1KxPrO5u1+Z JujrAxp0GEyMoVQgMy9uloedkpizhADyYSZzDDXnHhq1NiAPU87cjrTLv/xdtdp7 TvbNfobhZUmKZ951yaRlDmE+mH8IoTQoY7HD/JANnduYeFJAlIPnHQEQa8+5tLwO qWUeLa4Acv5MhO2KjbKQpu5r2dFbs0x+jmsja8xBmgNWO5meKh/aE8TKGJeDVQEW TTynEivf82suiquCIZ573fBnliJkipffg32ZHgtNGrh54hh+YU7Sts0t9Lsou4ar aOU6lup3MHFysf3s9hK6y9TzSqwFJ8Mak0UFsa03r0Ub8am6bKHTqMFaCgN0aK9P vBL4atSvJVguwPlzxLMi44K4NxqEVfa41dZ7nQ99P3BFzWwSvph3i4lu+cxGxwI7 4kgWU5Cz8T51dH7g8j3vUish36DzwQtUsKLWZVpV+DV4BaHJ/rLyqeug3ffUrWRk p0bFU7wBAv8rKeFPd30m2tdJ/nMo+rDbN6Tm9n43tK4NWKOGBndhCoNhjfrzhM8U un6Iy7taISgeElZ5fQ== =HVxO -----END PGP SIGNATURE----- Merge tag 'batadv-next-for-davem-20180319' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - avoid redundant multicast TT entries, by Linus Luessing - add netlink support for distributed arp table cache and multicast flags, by Linus Luessing (2 patches) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:28:54 -04:00
Sowmini Varadhan	bdf5bd7f21	rds: tcp: remove register_netdevice_notifier infrastructure. The netns deletion path does not need to wait for all net_devices to be unregistered before dismantling rds_tcp state for the netns (we are able to dismantle this state on module unload even when all net_devices are active so there is no dependency here). This patch removes code related to netdevice notifiers and refactors all the code needed to dismantle rds_tcp state into a ->exit callback for the pernet_operations used with register_pernet_device(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:21:45 -04:00
Christian Brauner	692ec06d7c	netns: send uevent messages This patch adds a receive method to NETLINK_KOBJECT_UEVENT netlink sockets to allow sending uevent messages into the network namespace the socket belongs to. Currently non-initial network namespaces are already isolated and don't receive uevents. There are a number of cases where it is beneficial for a sufficiently privileged userspace process to send a uevent into a network namespace. One such use case would be debugging and fuzzing of a piece of software which listens and reacts to uevents. By running a copy of that software inside a network namespace, specific uevents could then be presented to it. More concretely, this would allow for easy testing of udevd/ueventd. This will also allow some piece of software to run components inside a separate network namespace and then effectively filter what that software can receive. Some examples of software that do directly listen to uevents and that we have in the past attempted to run inside a network namespace are rbd (CEPH client) or the X server. Implementation: The implementation has been kept as simple as possible from the kernel's perspective. Specifically, a simple input method uevent_net_rcv() is added to NETLINK_KOBJECT_UEVENT sockets which completely reuses existing af_netlink infrastructure and does neither add an additional netlink family nor requires any user-visible changes. For example, by using netlink_rcv_skb() we can make use of existing netlink infrastructure to report back informative error messages to userspace. Furthermore, this implementation does not introduce any overhead for existing uevent generating codepaths. The struct netns got a new uevent socket member that records the uevent socket associated with that network namespace including its position in the uevent socket list. Since we record the uevent socket for each network namespace in struct net we don't have to walk the whole uevent socket list. Instead we can directly retrieve the relevant uevent socket and send the message. At exit time we can now also trivially remove the uevent socket from the uevent socket list. This keeps the codepath very performant without introducing needless overhead and even makes older codepaths faster. Uevent sequence numbers are kept global. When a uevent message is sent to another network namespace the implementation will simply increment the global uevent sequence number and append it to the received uevent. This has the advantage that the kernel will never need to parse the received uevent message to replace any existing uevent sequence numbers. Instead it is up to the userspace process to remove any existing uevent sequence numbers in case the uevent message to be sent contains any. Security: In order for a caller to send uevent messages to a target network namespace the caller must have CAP_SYS_ADMIN in the owning user namespace of the target network namespace. Additionally, any received uevent message is verified to not exceed size UEVENT_BUFFER_SIZE. This includes the space needed to append the uevent sequence number. Testing: This patch has been tested and verified to work with the following udev implementations: 1. CentOS 6 with udevd version 147 2. Debian Sid with systemd-udevd version 237 3. Android 7.1.1 with ueventd Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:16:43 -04:00
Christian Brauner	94e5e3087a	net: add uevent socket member This commit adds struct uevent_sock to struct net. Since struct uevent_sock records the position of the uevent socket in the uevent socket list we can trivially remove it from the uevent socket list during cleanup. This speeds up the old removal codepath. Note, list_del() will hit __list_del_entry_valid() in its call chain which will validate that the element is a member of the list. If it isn't it will take care that the list is not modified. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:16:42 -04:00
Kirill Tkhai	aa65f63654	net: Convert nf_ct_net_ops These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after `a560002437` "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:11:30 -04:00
Kirill Tkhai	08012631d6	net: Convert lowpan_frags_ops These pernet_operations register and unregister sysctl. Also, there is inet_frags_exit_net() called in exit method, which has to be safe after `a560002437` "net: Fix hlist corruptions in inet_evict_bucket()". Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:11:29 -04:00
Kirill Tkhai	1ae7762760	net: Convert can_pernet_ops These pernet_operations create and destroy /proc entries and cancel per-net timer. Also, there are unneed iterations over empty list of net devices, since all net devices must be already moved to init_net or unregistered by default_device_ops. This already was mentioned here: https://marc.info/?l=linux-can&m=150169589119335&w=2 So, it looks safe to make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-22 11:11:29 -04:00
David S. Miller	454bfe9783	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-03-21 The following pull-request contains BPF updates for your net-next tree. The main changes are: 1) Add a BPF hook for sendmsg and sendfile by reusing the ULP infrastructure and sockmap. Three helpers are added along with this, bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). The first is used to tell for how many bytes the verdict should be applied to, the second to tell that x bytes need to be queued first to retrigger the BPF program for a verdict, and the third helper is mainly for the sendfile case to pull in data for making it private for reading and/or writing, from John. 2) Improve address to symbol resolution of user stack traces in BPF stackmap. Currently, the latter stores the address for each entry in the call trace, however to map these addresses to user space files, it is necessary to maintain the mapping from these virtual addresses to symbols in the binary which is not practical for system-wide profiling. Instead, this option for the stackmap rather stores the ELF build id and offset for the call trace entries, from Song. 3) Add support that allows BPF programs attached to perf events to read the address values recorded with the perf events. They are requested through PERF_SAMPLE_ADDR via perf_event_open(). Main motivation behind it is to support building memory or lock access profiling and tracing tools with the help of BPF, from Teng. 4) Several improvements to the tools/bpf/ Makefiles. The 'make bpf' in the tools directory does not provide the standard quiet output except for bpftool and it also does not respect specifying a build output directory. 'make bpf_install' command neither respects specified destination nor prefix, all from Jiri. In addition, Jakub fixes several other minor issues in the Makefiles on top of that, e.g. fixing dependency paths, phony targets and more. 5) Various doc updates e.g. add a comment for BPF fs about reserved names to make the dentry lookup from there a bit more obvious, and a comment to the bpf_devel_QA file in order to explain the diff between native and bpf target clang usage with regards to pointer size, from Quentin and Daniel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-21 12:08:01 -04:00
Daniel Borkmann	78262f4575	bpf, doc: add description wrt native/bpf clang target and pointer size As this recently came up on netdev [0], lets add it to the BPF devel doc. [0] https://www.spinics.net/lists/netdev/msg489612.html Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2018-03-20 15:47:45 -07:00
David S. Miller	0466080c75	Merge branch 'dsa-mv88e6xxx-some-fixes' Uwe Kleine-König says: ==================== net: dsa: mv88e6xxx: some fixes these patches target net-next and got approved by Andrew Lunn. Compared to (implicit) v1, I dropped the patch that I didn't know if it was right because of missing documentation on my side. But Andrew already cared for that in a patch that is now `adfccf1182` in net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:29:58 -04:00
Uwe Kleine-König	36d6ea94b0	net: dsa: mv88e6xxx: Fix interrupt name for g2 irq This changes the respective line in /proc/interrupts from 49: x x mv88e6xxx-g1 7 Edge mv88e6xxx-g1 to 49: x x mv88e6xxx-g1 7 Edge mv88e6xxx-g2 which makes more sense. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:29:58 -04:00
Uwe Kleine-König	a708767e40	net: dsa: mv88e6xxx: Fix typo in a comment Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:29:57 -04:00
Uwe Kleine-König	79a68b2631	net: dsa: mv88e6xxx: Fix name of switch 88E6141 The switch name is emitted in the kernel log, so having the right name there is nice. Fixes: `1558727a1c` ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:29:57 -04:00
David S. Miller	6349a16962	Merge branch 'mlxsw-Adapt-driver-to-upcoming-firmware-versions' Ido Schimmel says: ==================== mlxsw: Adapt driver to upcoming firmware versions The first two patches make sure that reserved fields are set to zero, as required by the device's programmer's reference manual (PRM). Last two patches prevent the driver from performing an invalid operation that is going to be denied by upcoming firmware versions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:11:03 -04:00
Ido Schimmel	04719507b7	mlxsw: spectrum_acl: Do not invalidate already invalid ACL groups When a new ACL group is created its region (ACL) list is initially empty. Thus, the call to mlxsw_sp_acl_tcam_group_update() would basically invalidate an already invalid (non-existent) group. Remove the unnecessary call and make the function symmetric to its del() counterpart. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:11:02 -04:00
Ido Schimmel	808be37ae3	mlxsw: spectrum_acl: Adapt ACL configuration to new firmware versions The driver currently creates empty ACL groups, binds them to the requested port and then fills them with actual ACLs that point to TCAM regions. However, empty ACL groups are considered invalid and upcoming firmware versions are going to forbid their binding. Work around this limitation by only performing the binding after the first ACL was added to the group. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:11:02 -04:00
Tal Bar	7e8c711661	mlxsw: spectrum: Reserved field in mbox profile shouldn't be set There is no need to set some of the fields within 'mbox_config_profile', since they are reserved and capability mask should be set to zero. Signed-off-by: Tal Bar <talb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:11:02 -04:00
Shalom Toledo	830a8b1b00	mlxsw: pci: Set mbox dma addresses to zero when not used Some of the opcodes don't use in, out or both mboxes. In such cases, the mbox address is a reserved field and FW expects it to be zero. Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 12:11:02 -04:00
Matthew Wilcox	c846d8da56	mlx5: Remove call to ida_pre_get The mlx5 driver calls ida_pre_get() in a loop for no readily apparent reason. The driver uses ida_simple_get() which will call ida_pre_get() by itself and there's no need to use ida_pre_get() unless using ida_get_new(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-20 10:46:01 -04:00
Daniel Borkmann	d48ce3e5ba	Merge branch 'bpf-sockmap-ulp' John Fastabend says: ==================== This series adds a BPF hook for sendmsg and senfile by using the ULP infrastructure and sockmap. A simple pseudocode example would be, // load the programs bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG, &obj, &msg_prog); // lookup the sockmap bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map"); // get fd for sockmap map_fd_msg = bpf_map__fd(bpf_map_msg); // attach program to sockmap bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0); // Add a socket 'fd' to sockmap at location 'i' bpf_map_update_elem(map_fd_msg, &i, fd, BPF_ANY); After the above snippet any socket attached to the map would run msg_prog on sendmsg and sendfile system calls. Three additional helpers are added bpf_msg_apply_bytes(), bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With bpf_msg_apply_bytes BPF programs can tell the infrastructure how many bytes the given verdict should apply to. This has two cases. First, a BPF program applies verdict to fewer bytes than in the current sendmsg/sendfile msg this will apply the verdict to the first N bytes of the message then run the BPF program again with data pointers recalculated to the N+1 byte. The second case is the BPF program applies a verdict to more bytes than the current sendmsg or sendfile system call. In this case the infrastructure will cache the verdict and apply it to future sendmsg/sendfile calls until the byte limit is reached. This avoids the overhead of running BPF programs on large payloads. The helper bpf_msg_cork_bytes() handles a different case where a BPF program can not reach a verdict on a msg until it receives more bytes AND the program doesn't want to forward the packet until it is known to be "good". The example case being a user (albeit a dumb one probably) sends a N byte header in 1B system calls. The BPF program can call bpf_msg_cork_bytes with the required byte limit to reach a verdict and then the program will only be called again once N bytes are received. The last helper added in this series is bpf_msg_pull_data(). It is used to pull data in for modification or reading. Similar to how sk_pull_data() works msg_pull_data can be used to access data not in the initial (data_start, data_end) range. For sendpage() calls this is needed if any data is accessed because the BPF sendpage hook initializes the data_start and data_end pointers to zero. We do this because sendpage data is shared with the user and can be modified during or after the BPF verdict possibly invalidating any verdict the BPF program decides. For sendmsg the data is already copied by the sendmsg bpf infrastructure so we only copy the data if the user request a data range that is not already linearized. This happens if the user requests larger blocks of data that are not in a single scatterlist element. The common case seems to be accessing headers which normally are in the first scatterlist element and already linearized. For more examples please review the sample program. There are examples for all the actions and helpers there. Patches 1-8 implement the above sockmap/BPF infrastructure. The remaining patches flush out some minimal selftests and the sample sockmap program. The sockmap sample program is the main vehicle for testing this infrastructure and will be moved into selftests shortly. The final patch in this series is a simple shell script to run a set of tests. These are the tests I run after any changes to sockmap. The next task on the list after this series is to push those into selftests so we can avoid manually testing. Couple notes on future items in the pipeline, 0. move sample sockmap programs into selftests (noted above) 1. add additional support for tcp flags, most are ignored now. 2. add a Documentation/bpf/sockmap file with these details 3. support stacked ULP types to allow this and ktls to cooperate 4. Ingress flag support, redirect only supports egress here. The other redirect helpers support ingress and egress flags. 5. add optimizations, I cut a few optimizations here in the first iteration of the code for later study/implementation -v3 updates : u32 data pointers in msg_md changed to void * : page_address NULL check and flag verification in msg_pull_data : remove old note in commit msg that is no longer relevant : remove enum sk_msg_action its not used anywhere : fixup test_verifier W -> DW insn to account for data pointers : unintentionally dropped a smap_stop_tx() call in sockmap.c I propagated the ACKs forward because above changes were small one/two line changes. -v2 updates (discussion): Dave noticed that sendpage call was previously (in v1) running on the data directly. This allowed users to potentially modify the data after or during the BPF program. However doing a copy automatically even if the data is not accessed has measurable performance impact. So we added another helper modeled after the existing skb_pull_data() helper to allow users to selectively pull data from the msg. This is also useful in the sendmsg case when users need to access data outside the first scatterlist element or across scatterlist boundaries. While doing this I also unified the sendmsg and sendfile handlers a bit. Originally the sendfile call was optimized for never touching the data. I've decided for a first submission to drop this optimization and we can add it back later. It introduced unnecessary complexity, at least for a first posting, for a use case I have not entirely flushed out yet. When the use case is deployed we can add it back if needed. Then we can review concrete performance deltas as well on real-world use-cases/applications. Lastly, I reorganized the patches a bit. Now all sockmap changes are in a single patch and each helper gets its own patch. This, at least IMO, makes it easier to review because sockmap changes are not spread across the patch series. On the other hand now apply_bytes, cork_bytes logic is only activated later in the series. But that should be OK. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:43 +01:00
John Fastabend	ae30727fa4	bpf: sockmap test script This adds the test script I am currently using to validate the latest sockmap changes. Shortly sockmap will be ported to selftests and these will be run from the infrastructure there. Until then add the script here so we have a coverage checklist when porting into selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:41 +01:00
John Fastabend	0dcbbf6785	bpf: sockmap sample test for bpf_msg_pull_data This adds an option to test the msg_pull_data helper. This uses two options txmsg_start and txmsg_end to let the user specify start and end bytes to pull. The options can be used with txmsg_apply, txmsg_cork options as well as with any of the basic tests, txmsg, txmsg_redir and txmsg_drop (plus noisy variants) to run pull_data inline with those tests. By giving user direct control over the variables we can easily do negative testing as well as positive tests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:41 +01:00
John Fastabend	e6373ce70a	bpf: sockmap add SK_DROP tests Add tests for SK_DROP. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:41 +01:00
John Fastabend	468b3fdea8	bpf: sockmap sample support for bpf_msg_cork_bytes() Add sample application support for the bpf_msg_cork_bytes helper. This lets the user specify how many bytes each verdict should apply to. Similar to apply_bytes() tests these can be run as a stand-alone test when used without other options or inline with other tests by using the txmsg_cork option along with any of the basic tests txmsg, txmsg_redir, txmsg_drop. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:40 +01:00
John Fastabend	1c16c3126a	bpf: sockmap, add sample option to test apply_bytes helper This adds an option to test the apply_bytes helper. This option lets the user specify an int on the command line specifying how much data each verdict should apply to. When this is set a map entry is set with the bytes input by the user and then the specified program --txmsg or --txmsg_redir will use the value and set the applied data. If no other option is set then a default --txmsg_apply program is run. This program will drop pkts if an error is detected on the bytes map lookup. Useful to verify the map lookup and apply helper are working and causing a hard error if it is not. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:40 +01:00
John Fastabend	6bce9d2ca6	bpf: sockmap sample, add data verification option To verify data is not being dropped or corrupted this adds an option to verify test-patterns on recv. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:40 +01:00
John Fastabend	e67463cb5d	bpf: sockmap sample, add sendfile test To exercise TX ULP sendpage implementation we need a test that does a sendfile. Add sendfile test option here. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:40 +01:00
John Fastabend	4c4c3c276c	bpf: sockmap sample, add option to attach SK_MSG program Add sockmap option to use SK_MSG program types. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:40 +01:00
John Fastabend	1acc60b6a4	bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG Test read and writes for BPF_PROG_TYPE_SK_MSG. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2018-03-19 21:14:39 +01:00

1 2 3 4 5 ...

739351 Commits