linux

Author	SHA1	Message	Date
Nikolay Aleksandrov	a7854037da	bridge: netlink: add support for vlan_filtering attribute This patch adds the ability to toggle the vlan filtering support via netlink. Since we're already running with rtnl in .changelink() we don't need to take any additional locks. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:36:43 -07:00
David S. Miller	e72ee3ed51	Merge branch 'qlcnic-enhancements' Shahed Shaikh says: ==================== qlcnic: enhancements This series adds a few enhancements. o Patch from Harish reorders the sequence of header files inclusion, keeping kernel's header files on top. o Firmware introduced a new feature which allows the driver to increase the size of the firmware dump of the iSCSI function which is being collected by the NIC driver. o Print buffer address which is holding a firmware dump. o Use vzalloc() instead of kzalloc() for allocating a large chunk of memory, which will avoid potential memory allocation failure. o Add new device ID for 0x8C30 which is a 83xx series based VF function. Please apply this series to net-next. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:29 -07:00
Shahed Shaikh	02509f171d	qlcnic: Update version to 5.3.63 Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:29 -07:00
Shahed Shaikh	0e90ad9bfd	qlcnic: Don't use kzalloc unnecessarily for allocating large chunk of memory Driver allocates a large chunk of temporary buffer using kzalloc to copy FW image. As there is no real need for this memory to be physically contiguous, use vzalloc instead. Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:28 -07:00
Shahed Shaikh	da286a6fd1	qlcnic: Add new VF device ID 0x8C30 This is a 83xx series based VF device Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:28 -07:00
Shahed Shaikh	642de51025	qlcnic: Print firmware minidump buffer and template header addresses Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:28 -07:00
Shahed Shaikh	d01a6d3c8a	qlcnic: Add support to enable capability to extend minidump for iSCSI In some cases it is required to capture minidump for iSCSI functions as part of default minidump collection process. To enable this, firmware exports its capability and the driver needs to enable that capability by issuing a mailbox command. With this feature, firmware can provide additional iSCSI function's minidump with smaller minidump capture mask (0x1f). Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:28 -07:00
Harish Patil	a930a4639d	qlcnic: Rearrange ordering of header files inclusion Include local headers files after kernel's header files. Signed-off-by: Harish Patil <harish.patil@qlogic.com> Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:34:28 -07:00
Madalin Bucur	07151bc9f7	net: phy: select copper mode when Marvell 88e1111 in SGMII For the Marvell 88e1111 PHY only two SGMII modes are available, both allowing only SGMII to copper mode (with or without clock). SGMII to fiber mode is not supported. Make sure the fiber/copper registers selector bits are cleared for selecting copper mode. Signed-off-by: Madalin Bucur <madalin.bucur@freescale.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:31:18 -07:00
Kevin Hao	c4bc44c65b	net: fec: fix the race between xmit and bdp reclaiming path When we transmit a fragmented skb, we may run into a race like the following scenario (assume txq->cur_tx is next to txq->dirty_tx): cpu 0 cpu 1 fec_enet_txq_submit_skb reserve a bdp for the first fragment fec_enet_txq_submit_frag_skb update the bdp for the other fragment update txq->cur_tx fec_enet_tx_queue bdp = fec_enet_get_nextdesc(txq->dirty_tx, fep, queue_id); This bdp is the bdp reserved for the first segment. Given that this bdp BD_ENET_TX_READY bit is not set and txq->cur_tx is already pointed to a bdp beyond this one. We think this is a completed bdp and try to reclaim it. update the bdp for the first segment update txq->cur_tx So we shouldn't update the txq->cur_tx until all the update to the bdps used for fragments are performed. Also add the corresponding memory barrier to guarantee that the update to the bdps, dirty_tx and cur_tx performed in the proper order. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-10 13:28:14 -07:00
David S. Miller	2cf1b5ce16	Merge branch 'mlxsw-fixes' Jiri Pirko says: ==================== mlxsw: Couple of fixes/adjustments Ido Schimmel (5): mlxsw: Call free_netdev when removing port mlxsw: Make system port to local port mapping explicit mlxsw: Simplify mlxsw_sx_port_xmit function mlxsw: Use correct skb length when dumping payload mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit Jiri Pirko (2): mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM mlxsw: Strip FCS from incoming packets ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:31 -07:00
Ido Schimmel	e577516b9d	mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit Store the length of the skb before transmitting it and use it for stats instead of skb->len, since skb might have been freed already. This issue was discovered using the Kernel Address sanitizer (KASan). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:10 -07:00
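The ordering described in the commit above can be modelled in user space. This is only a sketch under stated assumptions: `struct packet`, `struct port_stats`, `transmit_and_free()` and `port_xmit()` are hypothetical stand-ins, not the mlxsw driver's types or functions. It shows the length being read before ownership of the buffer passes to the transmit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and per-port counters. */
struct packet {
	size_t len;
};

struct port_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Models a transmit path that takes ownership of the packet and frees it. */
static int transmit_and_free(struct packet *pkt)
{
	free(pkt);
	return 0;
}

static int port_xmit(struct packet *pkt, struct port_stats *stats)
{
	size_t len;
	int err;

	assert(pkt != NULL && stats != NULL);

	/* Read the length first: pkt may be gone once it has been handed over. */
	len = pkt->len;

	err = transmit_and_free(pkt);
	if (err)
		return err;

	/* Use the saved length, never pkt->len, for accounting. */
	stats->tx_packets++;
	stats->tx_bytes += len;
	return 0;
}
```

Reading `pkt->len` after `transmit_and_free()` would be the use-after-free pattern that the commit message says KASan reported.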
Ido Schimmel	3bfcd34764	mlxsw: Use correct skb length when dumping payload Do not use the length of the transmitted skb (which was freed), but that of the response skb. This issue was discovered using the Kernel Address sanitizer (KASan). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:10 -07:00
Ido Schimmel	d003462a50	mlxsw: Simplify mlxsw_sx_port_xmit function Previously we only checked if the transmission queue is not full in the middle of the xmit function. This led to complex logic due to the fact that sometimes we need to reallocate the headroom for our Tx header. Allow the switch driver to know if the transmission queue is not full before sending the packet and remove this complex logic. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:10 -07:00
Jiri Pirko	7b7b9cff74	mlxsw: Strip FCS from incoming packets FCS of incoming packets is already checked by HW. Just strip it out. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:10 -07:00
Jiri Pirko	74ed207e2a	mlxsw: Make pci module dependent on HAS_DMA and HAS_IOMEM This resolves compile errors on um-allyesconfig. Note that there are many other drivers which have the same issue. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:09 -07:00
Ido Schimmel	e61011b5e0	mlxsw: Make system port to local port mapping explicit System ports are unique identifiers in a multi-ASIC environment that represent all the available ports in the system. Local ports on the other hand, are unique only within the local ASIC. Since system port to local port mapping is not part of the HW-SW contract and since only single-ASIC configurations are currently supported, set an explicit 1:1 mapping by configuring the Switch System Port Record (SSPR) register. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:09 -07:00
Ido Schimmel	26a80f6e54	mlxsw: Call free_netdev when removing port When removing a port's netdevice we should also free the memory allocated by alloc_etherdev(). Do this by calling free_netdev() at the end of the teardown sequence. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:54:09 -07:00
Masanari Iida	ecea49914b	net: ethernet: Fix double word "the the" in eth.c This patch fixes the double word "the the" in Documentation/DocBook/networking/API-eth-get-headlen.html Documentation/DocBook/networking/netdev.html Documentation/DocBook/networking.xml These files are generated from comments in the source, so I have to fix the comment in net/ethernet/eth.c. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:53:00 -07:00
Shaohui Xie	0024f89200	net: phy: add RealTek RTL8211DN phy id RTL8211DN is compatible with RTL8211E. Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:52:15 -07:00
Robert Shearman	118d523463	mpls: Enforce payload type of traffic sent using explicit NULL RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only label on the stack, then after popping the resulting packet must be treated as a IPv4 packet and forwarded based on the IPv4 header. The same is true for IPv6 Explicit NULL with an IPv6 packet following. Therefore, when installing the IPv4/IPv6 Explicit NULL label routes, add an attribute that specifies the expected payload type for use at forwarding time for determining the type of the encapsulated packet instead of inspecting the first nibble of the packet. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:51:42 -07:00
David S. Miller	d74a790d52	Merge branch 'bpf-perf' Kaixu Xia says: ==================== bpf: Introduce the new ability of eBPF programs to access hardware PMU counter This patchset is based on net-next: git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git commit `9dc20a6496`. Previous patch v6 url: https://lkml.org/lkml/2015/8/4/188 changes in V7: - rebase the whole patch set to net-next tree(`9dc20a64`); - split out the core perf APIs into Patch 1/5; - change the return value of function perf_event_attrs() from struct perf_event * to const struct perf_event * in Patch 1/5; - rename the function perf_event_read_internal() to perf_event_read_local() and rewrite it in Patch 1/5; - rename the function check_func_limit() to check_map_func_compatibility() and remove the unnecessary pass pointer to a pointer in Patch 4/5; changes in V6: - make the Patch 1/4 commit message more meaningful and readable; - remove the unnecessary comment in Patch 2/4 and make it clean; - declare the function perf_event_release_kernel() in include/linux/perf_event.h to fix the build error when CONFIG_PERF_EVENTS isn't configured in Patch 2/4; - add function perf_event_attrs() to get the struct perf_event_attr in Patch 2/4. 
- move the related code from kernel/trace/bpf_trace.c to kernel/events/core.c and add function perf_event_read_internal() to avoid poking inside of the event outside of perf code in Patch 3/4; - generalize the func & map match-pair with an array in Patch 3/4; changes in V5: - move struct fd_array_map_ops* fd_ops to bpf_map; - move array perf event decrement refcnt function to map_free; - fix the NULL ptr of perf_event_get(); - move bpf_perf_event_read() to kernel/bpf/bpf_trace.c; - get rid of the remaining struct bpf_prog; - move the unnecessary cast on void *; changes in V4: - make the bpf_prog_array_map more generic; - fix the bug of event refcnt leak; - use more useful errno in bpf_perf_event_read(); changes in V3: - collapse V2 patches 1-3 into one; - drop the function map->ops->map_traverse_elem() and release the struct perf_event in map_free; - only allow to access bpf_perf_event_read() from programs; - update the perf_event_array_map elem via xchg(); - pass index directly to bpf_perf_event_read() instead of MAP_KEY; changes in V2: - put atomic_long_inc_not_zero() between fdget() and fdput(); - limit the event type to PERF_TYPE_RAW and PERF_TYPE_HARDWARE; - Only read the event counter on current CPU or on current process; - add new map type BPF_MAP_TYPE_PERF_EVENT_ARRAY to store the pointer to the struct perf_event; - according to the perf_event_map_fd and key, the function bpf_perf_event_read() can get the Hardware PMU counter value; Patch 5/5 is a simple example and shows how to use this new ability of eBPF programs. The PMU counter data can be found in /sys/kernel/debug/tracing/trace(trace_pipe).(the cycles PMU value when 'kprobe/sys_write' sampling) $ cat /sys/kernel/debug/tracing/trace_pipe $ ./tracex6 ... 
syslog-ng-548 [000] d..1 76.905673: : CPU-0 681765271 syslog-ng-548 [000] d..1 76.905690: : CPU-0 681787855 syslog-ng-548 [000] d..1 76.905707: : CPU-0 681810504 syslog-ng-548 [000] d..1 76.905725: : CPU-0 681834771 syslog-ng-548 [000] d..1 76.905745: : CPU-0 681859519 syslog-ng-548 [000] d..1 76.905766: : CPU-0 681890419 syslog-ng-548 [000] d..1 76.905783: : CPU-0 681914045 syslog-ng-548 [000] d..1 76.905800: : CPU-0 681935950 syslog-ng-548 [000] d..1 76.905816: : CPU-0 681958299 ls-690 [005] d..1 82.241308: : CPU-5 3138451 sh-691 [004] d..1 82.244570: : CPU-4 7324988 <...>-699 [007] d..1 99.961387: : CPU-7 3194027 <...>-695 [003] d..1 99.961474: : CPU-3 288901 <...>-695 [003] d..1 99.961541: : CPU-3 383145 <...>-695 [003] d..1 99.961591: : CPU-3 450365 <...>-695 [003] d..1 99.961639: : CPU-3 515751 <...>-695 [003] d..1 99.961686: : CPU-3 579047 ... The details of the patches are as follows: Patch 1/5 adds the necessary core perf APIs perf_event_attrs(), perf_event_get(), perf_event_read_local() when accessing events counters in eBPF programs Patch 2/5 rewrites part of the bpf_prog_array map code and makes it more generic; Patch 3/5 introduces a new bpf map type. This map only stores the pointer to struct perf_event; Patch 4/5 implements function bpf_perf_event_read() that gets the selected hardware PMU counter; Patch 5/5 gives a simple example. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:06 -07:00
Kaixu Xia	47efb30274	samples/bpf: example of get selected PMU counter value This is a simple example and shows how to use the new ability to get the selected Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:06 -07:00
Kaixu Xia	35578d7984	bpf: Implement function bpf_perf_event_read() that gets the selected hardware PMU counter According to the perf_event_map_fd and index, the function bpf_perf_event_read() can convert the corresponding map value to the pointer to struct perf_event and return the Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:06 -07:00
Kaixu Xia	ea317b267e	bpf: Add new bpf map type to store the pointer to struct perf_event Introduce a new bpf map type 'BPF_MAP_TYPE_PERF_EVENT_ARRAY'. This map only stores the pointer to struct perf_event. The user space event FDs from perf_event_open() syscall are converted to the pointer to struct perf_event and stored in map. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:05 -07:00
Wang Nan	2a36f0b92e	bpf: Make the bpf_prog_array_map more generic All the map backends are of generic nature. In order to avoid adding much special code into the eBPF core, rewrite part of the bpf_prog_array map code and make it more generic. So the new perf_event_array map type can reuse most of code with bpf_prog_array map and add fewer lines of special code. Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:05 -07:00
Kaixu Xia	ffe8690c85	perf: add the necessary core perf APIs when accessing events counters in eBPF programs This patch adds three core perf APIs: - perf_event_attrs(): export the struct perf_event_attr from struct perf_event; - perf_event_get(): get the struct perf_event from the given fd; - perf_event_read_local(): read the events counters active on the current CPU; These APIs are needed when accessing events counters in eBPF programs. The API perf_event_read_local() comes from Peter, so I added the corresponding SOB. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:05 -07:00
David S. Miller	f1d5ca4344	Merge branch 'mv88e6xxx-switchdev-fdb' Vivien Didelot says: ==================== net: dsa: mv88e6xxx: support switchdev FDB objects This patchset refactors the DSA and mv88e6xxx code to use the switchdev FDB objects. The first two patches add minor but necessary changes to switchdev, the third one implements the switchdev glue in DSA for FDB routines, and the remaining ones refactor the FDB access functions in the mv88e6xxx code. Below is a usage example (ports 0-2 belong to br0, ports 3-4 belong to br1): # bridge fdb add 3c:97:0e:11:30:6e dev swp2 # bridge fdb add 3c:97:0e:11:40:78 dev swp3 # bridge fdb add 3c:97:0e:11:50:86 dev swp4 # bridge fdb del 3c:97:0e:11:40:78 dev swp3 # bridge fdb 01:00:5e:00:00:01 dev eth0 self permanent 01:00:5e:00:00:01 dev eth1 self permanent 00:50:d2:10:78:15 dev swp0 master br0 permanent 3c:97:0e:11:30:6e dev swp2 self static 00:50:d2:10:78:15 dev swp3 master br1 permanent 3c:97:0e:11:50:86 dev swp4 self static # cat /sys/kernel/debug/dsa0/atu # DB T/P Vec State Addr # 001 Port 004 e 3c:97:0e:11:30:6e # 004 Port 010 e 3c:97:0e:11:50:86 For the 88E6xxx switches, FIDs 1 to num_ports will be reserved for non-bridged ports and bridge groups, and the remaining will be later used by VLANs. This change is necessary to welcome the support for hardware VLANs (which will follow soon). Changes in v2: - remove ndo_bridge_{get,set,del}link from switchdev/DSA glue code - use ether_addr_copy instead of memcpy for MAC addresses - constify MAC address in port_fdb_{add,del} - split the mv88e6xxx code refactoring into several patches ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:10 -07:00
Vivien Didelot	878205101f	net: dsa: mv88e6xxx: rework FDB add/del operations Add a low level function for the ATU Load operation, and provide FDB add and delete wrappers functions. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
Vivien Didelot	6630e23617	net: dsa: mv88e6xxx: rework FDB getnext operation This commit adds a low level _mv88e6xxx_atu_getnext function and helpers to rewrite the mv88e6xxx_port_fdb_getnext operation. A mv88e6xxx_atu_entry structure is added for convenient access to the hardware, and GLOBAL_ATU_FID is defined instead of the raw 0x01 value. The previous implementation did not handle the eventual trunk mapping. If the related bit is set, then the ATU data register would contain the trunk ID, and not the port vector. Check this in the FDB getnext operation and do not handle it (yet). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
Vivien Didelot	395059fb92	net: dsa: mv88e6xxx: rename ATU MAC accessors Rename the __mv88e6xxx_{read,write}_addr functions to more explicit _mv88e6xxx_atu_mac_{read,write} functions, which also respect the single underscore convention used in the file (meaning SMI lock must be held). In the meantime, define their MAC address parameters as an array of ETH_ALEN bytes instead of a char pointer. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
Vivien Didelot	368b1d9c10	net: dsa: mv88e6xxx: extend fid mask The driver currently manages one FID per port (or bridge group), with a mask of DSA_MAX_PORTS bits, where 0 means that the FID is in use. The Marvell 88E6xxx switches support up to 4094 FIDs (from 1 to 0xfff; FID 0 means that multiple address databases are not being used). This patch replaces the fid_mask with an fid_bitmap of 4096 bits. From now on, FIDs 1 to num_ports are reserved for non-bridged ports and bridge groups (a bridge group gets the FID of its first member). The remaining bits will be reserved for VLAN entries. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
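The FID layout described in the commit above can be illustrated with a small user-space bitmap. This is a sketch, not the driver's code: the `fid_*` helper names are hypothetical, and the polarity (a set bit means "in use") is an assumption chosen for readability. Only the 4096-bit size, the special meaning of FID 0, and the reservation of FIDs 1 to num_ports come from the commit message.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define FID_COUNT	4096	/* bitmap size stated in the commit message */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define FID_WORDS	((FID_COUNT + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Assumption: a set bit marks a FID as in use. */
struct fid_bitmap {
	unsigned long bits[FID_WORDS];
};

static int fid_in_use(const struct fid_bitmap *map, unsigned int fid)
{
	assert(fid < FID_COUNT);
	return (map->bits[fid / BITS_PER_WORD] >> (fid % BITS_PER_WORD)) & 1UL;
}

static void fid_mark(struct fid_bitmap *map, unsigned int fid)
{
	assert(fid < FID_COUNT);
	map->bits[fid / BITS_PER_WORD] |= 1UL << (fid % BITS_PER_WORD);
}

/* Reserve FID 0 (never handed out) and FIDs 1..num_ports. */
static void fid_bitmap_init(struct fid_bitmap *map, unsigned int num_ports)
{
	unsigned int fid;

	memset(map, 0, sizeof(*map));
	fid_mark(map, 0);
	for (fid = 1; fid <= num_ports; fid++)
		fid_mark(map, fid);
}

/* Hand out the lowest free FID, e.g. for a VLAN entry; -1 if none is left. */
static int fid_alloc(struct fid_bitmap *map)
{
	unsigned int fid;

	for (fid = 1; fid < FID_COUNT; fid++) {
		if (!fid_in_use(map, fid)) {
			fid_mark(map, fid);
			return (int)fid;
		}
	}
	return -1;
}
```

With five ports, the first FID handed out after initialisation is 6, since 0 and 1 to 5 are already taken.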
Vivien Didelot	55045ddded	net: dsa: add support for switchdev FDB objects Remove the fdb_{add,del,getnext} function pointer in favor of new port_fdb_{add,del,getnext}. Implement the switchdev_port_obj_{add,del,dump} functions in DSA to support the SWITCHDEV_OBJ_PORT_FDB objects. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
Vivien Didelot	890248261a	net: switchdev: support static FDB addresses This patch adds a is_static boolean to the switchdev_obj_fdb structure, in order to set the ndm_state to either NUD_NOARP or NUD_REACHABLE. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:09 -07:00
Vivien Didelot	1525c386a1	net: switchdev: change fdb addr for a byte array The address in the switchdev_obj_fdb structure is currently represented as a pointer. Replacing it for a 6-byte array allows switchdev to carry addresses directly read from hardware registers, not stored by the switch chip driver (as in Rocker). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:08 -07:00
Masanari Iida	4933d85c51	net:wimax: Fix double word "the the" in networking.xml This patch fixes a double word "the the" in Documentation/DocBook/networking.xml and Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html. These files are generated from comments in the source, so I had to fix the typo in net/wimax/io-rfkill.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:43:52 -07:00
Tom Herbert	10e4ea7514	net: Fix race condition in store_rps_map There is a race condition in store_rps_map that allows the jump label count in rps_needed to go below zero. This can happen when concurrently attempting to set and clear a map. Scenario: 1. rps_needed count is zero 2. New map is assigned by setting thread, but rps_needed count _not_ yet incremented (rps_needed count still zero) 3. Map is cleared by second thread, old_map set to that just assigned 4. Second thread performs static_key_slow_dec, rps_needed count now goes negative Fix is to increment or decrement rps_needed under the spinlock. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 15:56:56 -07:00
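The fix described above keeps the map pointer and the counter in step by updating both inside one critical section. The following user-space model is a sketch under assumptions: `struct rps_state` and `store_map()` are hypothetical names, a pthread mutex stands in for the kernel spinlock, and a plain integer stands in for the static key count.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: one lock guards both the map pointer and the count. */
struct rps_state {
	pthread_mutex_t lock;
	const void *map;	/* current map, or NULL when cleared */
	int needed;		/* models the rps_needed count */
};

/* Install new_map (NULL clears the map) and return the previous map. */
static const void *store_map(struct rps_state *st, const void *new_map)
{
	const void *old_map;

	assert(st != NULL);

	pthread_mutex_lock(&st->lock);
	old_map = st->map;
	st->map = new_map;
	/*
	 * Adjust the count while still holding the lock, so that no other
	 * thread can observe the new map with a stale count.
	 */
	if (new_map)
		st->needed++;
	if (old_map)
		st->needed--;
	pthread_mutex_unlock(&st->lock);

	return old_map;
}
```

If the count were adjusted after dropping the lock, a second caller could clear the freshly installed map and decrement before the first caller had incremented, which is the scenario listed in the commit message.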
Wenyu Zhang	e05176a328	openvswitch: Make 100 percent of packets sampled when sampling rate is 1. When the sampling rate is 1, the sampling probability is UINT32_MAX. The packet should be sampled even if prandom32() generates UINT32_MAX, and no packet needs to be sampled when the probability is 0. Signed-off-by: Wenyu Zhang <wenyuz@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 12:00:38 -07:00
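The two boundary cases named in the commit above can be written as a small predicate. This is a sketch only: `should_sample()` is a hypothetical helper, and the random value is passed in as a parameter (rather than drawn from a kernel RNG) so that the edge cases are easy to check.

```c
#include <stdint.h>

/*
 * Decide whether to sample a packet.
 * probability == UINT32_MAX must always sample, even when the random
 * draw itself is UINT32_MAX; probability == 0 must never sample.
 */
static int should_sample(uint32_t probability, uint32_t random_value)
{
	if (probability == 0)
		return 0;
	/* "<=" rather than "<" so that a draw equal to the probability is sampled. */
	return random_value <= probability;
}
```

A strict `<` comparison would drop the packet whenever the draw equals UINT32_MAX, so a sampling rate of 1 would not quite reach 100 percent.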
Alexei Starovoitov	da8b43c0e1	vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA, so combine them into single IFLA_VXLAN_COLLECT_METADATA flag. 'flowbased' doesn't convey real meaning of the vxlan tunnel mode. This mode can be used by routing, tc+bpf and ovs. Only ovs is strictly flow based, so 'collect metadata' is a better name for this tunnel mode. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:46:34 -07:00
David S. Miller	e03c512841	Merge branch 'rds-tcp-netns' Sowmini Varadhan says: ==================== RDS-TCP: Network namespace support This patch series contains the set of changes to correctly set up the infra for PF_RDS sockets that use TCP as the transport in multiple network namespaces. Patch 1 in the series is the minimal set of changes to allow a single instance of RDS-TCP to run in any (i.e init_net or other) net namespace. The changes in this patch set ensure that the execution of 'modprobe [-r] rds_tcp' sets up the kernel TCP sockets relative to the current netns, so that RDS applications can send/recv packets from that netns, and the netns can later be deleted cleanly. Patch 2 of the series further allows multiple RDS-TCP instances, one per network namespace. The changes in this patch allows dynamic creation/tear-down of RDS-TCP client and server sockets across all current and future namespaces. v2 changes from RFC sent out earlier: David Ahern comments in patch 1, net_device notifier in patch 2, patch 3 broken off and submitted separately. v3: Cong Wang review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:29:58 -07:00
Sowmini Varadhan	467fa15356	RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns. Register pernet subsys init/stop functions that will set up and tear down per-net RDS-TCP listen endpoints. Unregister pernet subsys functions on 'modprobe -r' to clean up these end points. Enable keepalive on both accept and connect socket endpoints. The keepalive timer expiration will ensure that client socket endpoints will be removed as appropriate from the netns when an interface is removed from a namespace. Register a device notifier callback that will clean up all sockets (and thus avoid the need to wait for keepalive timeout) when the loopback device is unregistered from the netns indicating that the netns is getting deleted. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:29:58 -07:00
Sowmini Varadhan	d5a8ac28a7	RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:29:57 -07:00
David S. Miller	1ebd08a7e5	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-08-05 This series contains updates to i40e, i40evf and e1000e. Anjali adds support for x772 devices to i40e and i40evf. With the added support, x772 supports offloading of the outer UDP transmit and receive checksum for tunneled packets. Also supports evicting ATR filters in the hardware, so update the driver with this new feature set. Raanan provides several fixes for e1000e, first rectifies the Energy Efficient Ethernet in Sx code so that it only applies to parts that actually support EEE in Sx. Fix whitespace and moved ICH8 related define to the proper context. Fixed the ASPM locking which was reported by Bjorn Helgaas. Fix a workaround implementation for systime which could experience a large non-linear increment of the systime value when checking for overflow. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 00:19:09 -07:00
Jason A. Donenfeld	d92cff89a0	net_dbg_ratelimited: turn into no-op when !DEBUG The pr_debug family of functions turns into a no-op when -DDEBUG is not specified, opting instead to call "no_printk", which gets compiled to a no-op (but retains gcc's nice warnings about printf-style arguments). The problem with net_dbg_ratelimited is that it is defined to be a variant of net_ratelimited_function, which expands to essentially: if (net_ratelimit()) pr_debug(fmt, ...); When DEBUG is not defined, then this becomes, if (net_ratelimit()) ; This seems benign, except it isn't. Firstly, there's the obvious overhead of calling net_ratelimit needlessly, which does quite some book keeping for the rate limiting. Given that the pr_debug and net_dbg_ratelimited family of functions are sprinkled liberally through performance critical code, with developers assuming they'll be compiled out to a no-op most of the time, we certainly do not want this needless book keeping. Secondly, and most visibly, even though no debug message is printed when DEBUG is not defined, if there is a flood of invocations, dmesg winds up peppered with messages such as "net_ratelimit: 320 callbacks suppressed". This is because our aforementioned net_ratelimit() function actually prints this text in some circumstances. It's especially odd to see this when there isn't any other accompanying debug message. So, in sum, it doesn't make sense to have this function's current behavior, and instead it should match what every other debug family of functions in the kernel does with !DEBUG -- nothing. This patch replaces calls to net_dbg_ratelimited when !DEBUG with no_printk, keeping with the idiom of all the other debug print helpers. Also, though not strictly necessary, it guards the call with an if (0) so that all evaluation of any arguments is sure to be compiled out. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 23:51:30 -07:00
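The pattern described above can be reproduced in user space. The sketch below is not the kernel's macro: `dbg_ratelimited()`, `ratelimit()` and `ratelimit_calls` are hypothetical names, and `fprintf()` stands in for the kernel print helpers. It uses the GNU `##__VA_ARGS__` extension, so it assumes gcc or clang.

```c
#include <stdio.h>

/* Counts how often the (hypothetical) rate limiter is consulted. */
static int ratelimit_calls;

int ratelimit(void)
{
	ratelimit_calls++;
	return 1;	/* always allow in this sketch */
}

#ifdef DEBUG
/* Debug build: consult the rate limiter, then print. */
#define dbg_ratelimited(fmt, ...)				\
	do {							\
		if (ratelimit())				\
			fprintf(stderr, fmt, ##__VA_ARGS__);	\
	} while (0)
#else
/*
 * Non-debug build: the call stays visible to the compiler, so the format
 * string is still type-checked, but "if (0)" guarantees that neither the
 * rate limiter nor the arguments are ever evaluated at run time.
 */
#define dbg_ratelimited(fmt, ...)				\
	do {							\
		if (0)						\
			fprintf(stderr, fmt, ##__VA_ARGS__);	\
	} while (0)
#endif
```

Built without `-DDEBUG`, a call such as `dbg_ratelimited("x=%d\n", ++n)` leaves both `n` and `ratelimit_calls` unchanged, which is the "nothing" behaviour the commit message asks for.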
Roopa Prabhu	3dcb615e68	af_mpls: add null dev check in find_outdev This patch adds null dev check for the 'cfg->rc_via_table == NEIGH_LINK_TABLE or dev_get_by_index() failed' case Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:03:58 -07:00
David S. Miller	3c62181840	Merge branch 'test-bpf-next' Nicolas Schichan says: ==================== test_bpf improvements Please find below the patch series with my latest changes to test_bpf. The first patch checks for unexpected NULL generated skbs before running the filter. The second patch adds the possibility for tests to generate fragmented skbs. The third patch tests LD_ABS and LD_IND on fragmented skbs. The fourth patch adds the possibility to restrict the tests being run by specifying the name/id/range of the test(s) to run via module parameters. The fifth patch tests LD_ABS and LD_IND on non fragmented skbs with various sizes and alignments. The sixth and final patch checks that the interpreter or JIT correctly resets A and X to 0. This series is against today's net-next tree. Changes in V2: * move declaration of 'ptr' in if() block in patch 2/6. * fix various typos in patch 4/6 * rework default init of test_range array and cleanup exclude_test() return condition in patch 4/6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:02:32 -07:00
Nicolas Schichan	86bf1721b2	test_bpf: add tests checking that JIT/interpreter sets A and X to 0. It is mandatory for the JIT or interpreter to reset the A and X registers to 0 before running the filter. Check that it is the case on various ALU and JMP instructions. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:02:32 -07:00
Nicolas Schichan	08fcb08fc0	test_bpf: add more tests for LD_ABS and LD_IND. This exercises the LD_ABS and LD_IND instructions for various sizes and alignments. This also checks that X, when used as an offset to a BPF_IND instruction first in a filter, is correctly set to 0. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:02:32 -07:00
Nicolas Schichan	d2648d4e26	test_bpf: add module parameters to filter the tests to run. When developing on the interpreter or a particular JIT, it can be interesting to restrict the tests list to a specific test or a particular range of tests. This patch adds the following module parameters to the test_bpf module: * test_name=<string>: only the specified named test will be run. * test_id=<number>: only the test with the specified id will be run (see the output of test_bpf without parameters to get the test id). * test_range=<number>,<number>: only the tests within IDs in the specified id range are run (see the output of test_bpf without parameters to get the test ids). Any invalid range, test id or test name will result in -EINVAL being returned and no tests being run. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:02:32 -07:00
Nicolas Schichan	2cf1ad7593	test_bpf: test LD_ABS and LD_IND instructions on fragmented skbs. These new tests exercise various load sizes and offsets crossing the head/fragment boundary. Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:02:32 -07:00

535074 Commits