linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-13 22:53:20 +00:00

Author	SHA1	Message	Date
David Herrmann	b8a943e294	samples/bpf: add lpm-trie benchmark Extend the map_perf_test_{user,kern}.c infrastructure to stress test lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure the latency depending on trie size and lookup count. On my Intel Haswell i7-6400U, a single gettid() syscall with an empty bpf program takes roughly 6.5us on my system. Lookups in empty tries take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192 entries take ~7.1us (on the first _and_ any subsequent try). Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Reviewed-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
David Herrmann	4d3381f5a3	bpf: Add tests for the lpm trie map The first part of this program runs randomized tests against the lpm-bpf-map. It implements a "Trivial Longest Prefix Match" (tlpm) based on simple, linear, single linked lists. The implementation should be pretty straightforward. Based on tlpm, this inserts randomized data into bpf-lpm-maps and verifies the trie-based bpf-map implementation behaves the same way as tlpm. The second part uses 'real world' IPv4 and IPv6 addresses and tests the trie with those. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
Daniel Mack	b95a5c4db0	bpf: add a longest prefix match trie map implementation This trie implements a longest prefix match algorithm that can be used to match IP addresses to a stored set of ranges. Internally, data is stored in an unbalanced trie of nodes that has a maximum height of n, where n is the prefixlen the trie was created with. Tries may be created with prefix lengths that are multiples of 8, in the range from 8 to 2048. The key used for lookup and update operations is a struct bpf_lpm_trie_key, and the value is a uint64_t. The code carries more information about the internal implementation. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 16:10:38 -05:00
Bhumika Goyal	10eeb5e645	net: xilinx: constify net_device_ops structure Declare net_device_ops structure as const as it is only stored in the netdev_ops field of a net_device structure. This field is of type const, so net_device_ops structures having same properties can be made const too. Done using Coccinelle: @r1 disable optional_qualifier@ identifier i; position p; @@ static struct net_device_ops i@p={...}; @ok1@ identifier r1.i; position p; struct net_device ndev; @@ ndev.netdev_ops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct net_device_ops i; File size before: text data bss dec hex filename 6201 744 0 6945 1b21 ethernet/xilinx/xilinx_emaclite.o File size after: text data bss dec hex filename 6745 192 0 6937 1b19 ethernet/xilinx/xilinx_emaclite.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 15:58:49 -05:00
Bhumika Goyal	30bd2f52e5	net: moxa: constify net_device_ops structures Declare net_device_ops structure as const as it is only stored in the netdev_ops field of a net_device structure. This field is of type const, so net_device_ops structures having same properties can be made const too. Done using Coccinelle: @r1 disable optional_qualifier@ identifier i; position p; @@ static struct net_device_ops i@p={...}; @ok1@ identifier r1.i; position p; struct net_device ndev; @@ ndev.netdev_ops=&i@p @bad@ position p!={r1.p,ok1.p}; identifier r1.i; @@ i@p @depends on !bad disable optional_qualifier@ identifier r1.i; @@ +const struct net_device_ops i; File size before: text data bss dec hex filename 4821 744 0 5565 15bd ethernet/moxa/moxart_ether.o File size after: text data bss dec hex filename 5373 192 0 5565 15bd ethernet/moxa/moxart_ether.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 15:58:49 -05:00
Timur Tabi	4404323c6a	net: qcom/emac: claim the irq only when the device is opened During reset, functions emac_mac_down() and emac_mac_up() are called, so we don't want to free and claim the IRQ unnecessarily. Move those operations to open/close. Signed-off-by: Timur Tabi <timur@codeaurora.org> Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 13:03:28 -05:00
Timur Tabi	41c1093f2e	net: qcom/emac: rename emac_phy to emac_sgmii and move it The EMAC has an internal PHY that is often called the "SGMII". This SGMII is also connected to an external PHY, which is managed by phylib. These dual PHYs often cause confusion. In this case, the data structure for managing the SGMII was mis-named and located in the wrong header file. Structure emac_phy is renamed to emac_sgmii to clearly indicate it applies to the internal PHY only. It is also moved from emac_phy.h (which supports the external PHY) to emac_sgmii.h (where it belongs). To keep the changes minimal, only the structure name is changed, not the names of any variables of that type. Signed-off-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 12:54:35 -05:00
Eric Dumazet	b9032741e4	bnx2x: avoid two atomic ops per page on x86 Commit `4cace675d6` ("bnx2x: Alloc 4k fragment for each rx ring buffer element") added extra put_page() and get_page() calls on arches where PAGE_SIZE=4K, like x86. Reorder things to avoid this overhead. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Yuval Mintz <Yuval.Mintz@cavium.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-23 11:16:27 -05:00
David S. Miller	41e8c70ee1	Merge branch 'bcm7278' Florian Fainelli says: ==================== net: dsa: bcm_sf2: Add support for BCM7278 This patch series adds support for the Broadcom BCM7278 integrated switch which is a successor of the BCM7445 switch. We have a little bit of register shuffling going on, which is why most of the functional changes are to deal with that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:59:00 -05:00
Florian Fainelli	039a7b8592	net: phy: bcm7xxx: Implement EGPHY workaround for 7278 Implement the workaround recommended by the HW design team for 7278. Since the GPHY now returns its revision information in MII_PHYS_ID[23] we need to check whether the revision provided in flags is 0 or not. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	582d0ac397	net: phy: bcm7xxx: Add entry for BCM7278 Add support for the BCM7278 28nm process Gigabit Ethernet PHY. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	64ff2aef91	net: dsa: bcm_sf2: Allow non-IMP ports to have Broadcom tags enabled Parse the "brcm,use-bcm-hdr" boolean property during ports identification to fill a bitmask of ports that should have Broadcom tags enabled. This is needed in some configurations where per-packet metadata can be exchanged using Broadcom tags between the switch and an on-chip acceleration device. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	ebb2ac4f32	net: dsa: bcm_sf2: Move code enabling Broadcom tags In preparation for enabling Broadcom tags on different ports based on configuration information, dedicate a function that is responsible for enabling Broadcom tags for a given port and update the IMP port setup to call it. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	0fe9933804	net: dsa: bcm_sf2: Add support for BCM7278 integrated switch Add support for the integrated switch found on BCM7278: - core_reg_align is set to 1, to force a translation into the target address space which is 8 bytes aligned - an alternate SWITCH_REG layout is provided since registers are largely bit/masks compatible but have different offsets - conditional for all CORE_STS_OVERRIDE_{IMP,GMII_P} since those got moved way out of the traditional register space Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	a78e86ed58	net: dsa: bcm_sf2: Prepare for different register layouts In preparation for supporting a new device with a slightly different register layout, affecting the SWITCH_REG and SWITCH_CORE address spaces, perform a few preparatory steps: - allow matching the compatible string against a data description - convert the SWITCH_REG register accesses into an indirection table - prepare for supporting a SWITCH_CORE register alignment requirement Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
Florian Fainelli	329b5c58f8	net: dsa: bcm_sf2: Make SF2_IO64_MACRO() utilize 32-bit macro There is no point inlining the 32-bit direct register read/write part, just infer it from the existing macro. This will make it easier to centralize the address rewriting that we are going to introduce later on. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:58:31 -05:00
David S. Miller	b20b564b95	Merge branch 'systemport-lite' Florian Fainelli says: ==================== net: systemport: Add support for SYSTEMPORT lite This patch series adds support for SYSTEMPORT Lite which is an evolution of the existing SYSTEMPORT adapter. The two generations are largely identical as far as the transmit/receive path are concerned, and there were just a few control path changes here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:07 -05:00
Florian Fainelli	44a4524c54	net: systemport: Add support for SYSTEMPORT Lite Add support for the SYSTEMPORT Lite Ethernet controller, this piece of hardware is largely based on the full-blown SYSTEMPORT and differs in the following: - no full-blown UniMAC, instead we have the MagicPacket matching from UniMAC at same offset, and a GMII Interface Block (GIB) for the MAC-level stuff, since we are always interfaced to an Ethernet switch which is fully Ethernet compliant shortcuts could be made - 16 transmit queues, whose interrupts are moved into the first Level-2 interrupt controller bank - slight TDMA offset change (a register was inserted after TDMA_STATUS, sigh) - 256 RX descriptors (512 words) and 256 TX descriptors (not visible) As a consequence of these two things, update the code paths accordingly to differentiate the full-blown from the light version. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:06 -05:00
Florian Fainelli	7b78be48a8	net: systemport: Dynamically allocate number of TX rings In preparation for adding SYSTEMPORT Lite, which has half as many transmit queues as SYSTEMPORT, make sure we allocate TX rings based on the systemport,txq property to get an appropriate memory footprint. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:56:06 -05:00
Eric Dumazet	9ca677b1bd	ipv6: add NUMA awareness to seg6_hmac_init_algo() Since we allocate per cpu storage, let's also use NUMA hints. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:50:36 -05:00
jpinto	f4ec60644a	net: stmicro: fix LS field mask in EEE configuration This patch fixes the LS mask when setting EEE timer. LS field is 10 bits long and not 11 as currently. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Reported-By: Rayagond Kokatanur <rayagond@vayavyalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:47:36 -05:00
Geliang Tang	3704eb6f6f	net/mlx4: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:46:13 -05:00
Geliang Tang	530cef21d9	6lowpan: use rb_entry() To make the code clearer, use rb_entry() instead of container_of() to deal with rbtree. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-22 16:46:13 -05:00
David S. Miller	9a549c1e35	Merge branch 'dsa-hwmon' Andrew Lunn says: ==================== net: dsa: Move temperature sensor code into PHY. Marvell Ethernet switches contain a temperature sensor. There appears to be one sensor, which is shared by each of the internal PHYs. Each PHY has independent registers to read this sensor, and to set a limit for when an alarm should be raised. Some Marvell discrete PHYs also have the same sensor and registers. Moving the HWMON code from DSA into the PHY makes the sensor available in discrete PHYs, and removes the layering violation, the switch driver poking around in PHY registers. While moving the code into the PHY driver, it has been re-written to use the new HWMON APIs. v2: Better cover note explaining one sensor, but multiple independent registers. Simplify error checking. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 14:42:52 -05:00
Andrew Lunn	cf1a56a4cf	net: dsa: Remove hwmon support Only the Marvell mv88e6xxx DSA driver made use of the HWMON support in DSA. The temperature sensor registers are actually in the embedded PHYs, and the PHY driver now supports it. So remove all HWMON support from DSA and drivers. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 14:42:51 -05:00
Andrew Lunn	0b04680fda	phy: marvell: Add support for temperature sensor Some Marvell PHYs have an inbuilt temperature sensor. Add hwmon support for this sensor. There are two different variants. The simpler, older chips have a 5 degree accuracy. The newer devices have 1 degree accuracy. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 14:42:51 -05:00
Josef Bacik	319554f284	inet: don't use sk_v6_rcv_saddr directly When comparing two sockets we need to use inet6_rcv_saddr so we get a NULL sk_v6_rcv_saddr if the socket isn't AF_INET6, otherwise our comparison function can be wrong. Fixes: `637bc8b` ("inet: reset tb->fastreuseport when adding a reuseport sk") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 14:35:51 -05:00
David S. Miller	7d982567f4	Merge tag 'mlx5-updates-2017-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 and mlx5e updates 2017-01-19 This series includes some updates for mlx5 core and mlx5e netdevice driver. From Leon, a small fix that removes an unnecessary print. From Eli Cohen, a fix to the FW version printout in case of internal error. From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per second) mlx5 infrastructure support and the 2nd adds the necessary bits for mlx5e ptp logic and structures. From Mohamad, add support for s-tagged packet receive when in promiscuous mode. From Gal Pressman, MCAM (Management capabilities mask register) and PCAM (Ports capabilities mask register) registers infrastructure, those registers are needed in order to query the different statistics registers support in FW, in order for the driver to enable/disable query and reporting them back to user. On top of this infrastructure we've exposed a new set of statistics groups: - MPCNT: Physical layer statistical counters (For symbol errors) - PPCNT: PCIe performance counters In addition to the statistics capabilities series we've moved the mlx5 HCA capabilities fields to a dedicated struct under the driver private data. At the end a small patch to update & query statistics in the most desired order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 14:22:27 -05:00
David S. Miller	cc154783e7	Merge branch 'cpsw-common-res-usage' Ivan Khoronzhuk says: ==================== net: ethernet: ti: cpsw: correct common res usage This series is intended to remove unneeded redundancies connected with common resource usage function. Since v1: - changed name to cpsw_get_usage_count() - added comments to open/close for cpsw_get_usage_count() - added patch: net: ethernet: ti: cpsw: clarify ethtool ops changing num of descs Based on net-next/master ==================== Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:11 -05:00
Ivan Khoronzhuk	022d7ad71d	net: ethernet: ti: cpsw: clarify ethtool ops changing num of descs After adding cpsw_set_ringparam ethtool op, better to carry out common parts of similar ops splitting descriptors in runtime. It allows to reuse these parts and shows what the ops actually do. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:10 -05:00
Ivan Khoronzhuk	fe734d0aa9	net: ethernet: ti: cpsw: don't duplicate common res in rx handler No need to duplicate the same function in rx handler to get info if any interface is running. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:10 -05:00
Ivan Khoronzhuk	03fd01ad0e	net: ethernet: ti: cpsw: don't duplicate ndev_running No need to create additional vars to identify if interface is running. So simplify code by removing redundant var and checking usage counter instead. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:09 -05:00
Ivan Khoronzhuk	176b0cbffd	net: ethernet: ti: cpsw: don't disable interrupts in ndo_open No need to disable interrupts if there are no open devices, they are disabled anyway. There is also no need to disable interrupts if some ndev is opened: in this case shared resources are not touched, only parameters of the ndev shell, so there is no reason to disable them either. The removed lines have proved it. So there is no need for the redundant check and interrupt disable. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:09 -05:00
Ivan Khoronzhuk	aafc93a3b6	net: ethernet: ti: cpsw: remove dual check from common res usage function Common res usage is possible only in case an interface is running. When not in dual emac mode there can be only one interface, so during ndo_open in switch mode only one interface can be opened; thus if open is called, no interface is running ... and no common res are used. So remove the check on dual emac, it will simplify code/understanding and will match the name it's called. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:35:09 -05:00
David S. Miller	e9f3a685a2	Merge branch 'rxbusy' Mahesh Bandewar says: ==================== use netdev_is_rx_handler_busy() in few known cases netdev_rx_handler_register() was recently split into two parts - (a) check if the handler is used, (b) register the new handler, parts. This is helpful in scenarios like bonding where at the time of registration there is too much state to unwind and it should check if the device is free before building that state. IPvlan and macvlan drivers don't have this issue however it can make use of the same check instead of using a device specific check. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:22:27 -05:00
Mahesh Bandewar	322dc6e067	macvlan: use netdev_is_rx_handler_busy instead of checking specific type netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port() check and hence should be preferred. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:22:26 -05:00
Mahesh Bandewar	c3262d9dec	ipvlan: use netdev_is_rx_handler_busy instead of checking specific type IPvlan checks if the master device is already used by checking a specific device (here it's macvlan device). This is technically not sufficient and it should just ensure the rx_handler is busy or not. This would be a super check that includes macvlan and any other that has already registered rx-handler. Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:22:26 -05:00
Mahesh Bandewar	1b7cd0044e	net: remove duplicate code. netdev_rx_handler_register() checks to see if the handler is already busy which was recently separated into netdev_is_rx_handler_busy(). So use the same function inside register() to avoid code duplication. Essentially this change should be a no-op Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:22:25 -05:00
Andrew Collins	264b87fa61	fq_codel: Avoid regenerating skb flow hash unless necessary The fq_codel qdisc currently always regenerates the skb flow hash. This wastes some cycles and prevents flow separation in cases where the traffic has been encrypted and can no longer be understood by the flow dissector. Change it to use the preexisting flow hash if one exists, and only regenerate if necessary. Signed-off-by: Andrew Collins <acollins@cradlepoint.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:15:14 -05:00
Lance Richardson	1f6cc07e17	vxlan: preserve type of dst_port parm for encap_bypass_if_local() Eliminate sparse warning by maintaining type of dst_port as __be16. Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:12:14 -05:00
Lance Richardson	22fbece133	csum: eliminate sparse warning in remcsum_unadjust() Cast second parameter of csum_sub() from __sum16 to __wsum. Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:12:13 -05:00
David S. Miller	8d00e20221	Merge branch 'tipc-multicast-through-replication' Jon Maloy says: ==================== tipc: emulate multicast through replication TIPC multicast messages are currently distributed via L2 broadcast or IP multicast to all nodes in the cluster, irrespective of the number of real destinations of the message. In this series we introduce an option to transport messages via replication ("replicast") across a selected number of unicast links, instead of relying on the underlying media. This option is used when true broadcast/multicast is not supported by the media, or when the number of true destinations is much smaller than the cluster size. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:17 -05:00
Jon Paul Maloy	01fd12bb18	tipc: make replicast a user selectable option If the bearer carrying multicast messages supports broadcast, those messages will be sent to all cluster nodes, irrespective of whether these nodes host any actual destinations socket or not. This is clearly wasteful if the cluster is large and there are only a few real destinations for the message being sent. In this commit we extend the eligibility of the newly introduced "replicast" transmit option. We now make it possible for a user to select which method he wants to be used, either as a mandatory setting via setsockopt(), or as a relative setting where we let the broadcast layer decide which method to use based on the ratio between cluster size and the message's actual number of destination nodes. In the latter case, a sending socket must stick to a previously selected method until it enters an idle period of at least 5 seconds. This eliminates the risk of message reordering caused by method change, i.e., when changes to cluster size or number of destinations would otherwise mandate a new method to be used. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:17 -05:00
Jon Paul Maloy	a853e4c6d0	tipc: introduce replicast as transport option for multicast TIPC multicast messages are currently carried over a reliable 'broadcast link', making use of the underlying media's ability to transport packets as L2 broadcast or IP multicast to all nodes in the cluster. When the used bearer is lacking that ability, we can instead emulate the broadcast service by replicating and sending the packets over as many unicast links as needed to reach all identified destinations. We now introduce a new TIPC link-level 'replicast' service that does this. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:17 -05:00
Jon Paul Maloy	2ae0b8af1f	tipc: add functionality to lookup multicast destination nodes As a further preparation for the upcoming 'replicast' functionality, we add some necessary structs and functions for looking up and returning a list of all nodes that host destinations for a given multicast message. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:16 -05:00
Jon Paul Maloy	9999974a83	tipc: add function for checking broadcast support in bearer As a preparation for the 'replicast' functionality we are going to introduce in the next commits, we need the broadcast base structure to store whether bearer broadcast is available at all from the currently used bearer or bearers. We do this by adding a new function tipc_bearer_bcast_support() to the bearer layer, and letting the bearer selection function in bcast.c use this to give a new boolean field, 'bcast_support' the appropriate value. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:10:15 -05:00
Gianluca Borello	a5e8c07059	bpf: add bpf_probe_read_str helper Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); /* consume buf */ } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via bpf_perf_event_output(), but this time we can use res (the string length) as event size, after checking its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 12:08:43 -05:00
David S. Miller	0760462860	Merge branch 'bus-agnostic-num-vf' Phil Sutter says: ==================== Retrieve number of VFs in a bus-agnostic way Previously, it was assumed that only PCI NICs would be capable of having virtual functions - with my proposed enhancement of dummy NIC driver implementing (fake) ones for testing purposes, this is no longer true. Discussion of said patch has led to the suggestion of implementing a bus-agnostic method for VF count retrieval so rtnetlink could work with both real VF-capable PCI NICs as well as my dummy modifications without introducing ugly hacks. The following series tries to achieve just that by introducing a bus type callback to retrieve a device's number of VFs, implementing this callback for PCI bus and finally adjusting rtnetlink to make use of the generalized infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:17 -05:00
Phil Sutter	9af15c3825	device: Implement a bus agnostic dev_num_vf routine Now that pci_bus_type has num_vf callback set, dev_num_vf can be implemented in a bus type independent way and the check for whether a PCI device is being handled in rtnetlink can be dropped. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:17 -05:00
Phil Sutter	02e0bea6c8	PCI: implement num_vf bus type callback Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-01-20 11:43:16 -05:00

649332 Commits