linux

mirror of https://github.com/torvalds/linux.git synced 2024-12-19 01:23:20 +00:00

Author	SHA1	Message	Date
Gregory CLEMENT	f34dacccb4	net: mvneta: Only disable mvneta_bm for 64-bits Actually only the mvneta_bm support is not 64-bits compatible. The mvneta code itself can run on 64-bits architecture. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:52:01 -05:00
Marcin Wojtas	8d5047cf9c	net: mvneta: Convert to be 64 bits compatible Prepare the mvneta driver in order to be usable on the 64 bits platform such as the Armada 3700. [gregory.clement@free-electrons.com]: this patch was extract from a larger one to ease review and maintenance. Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:52:00 -05:00
Gregory CLEMENT	f88bee1c4b	net: mvneta: Use cacheable memory to store the rx buffer virtual address Until now the virtual address of the received buffer were stored in the cookie field of the rx descriptor. However, this field is 32-bits only which prevents to use the driver on a 64-bits architecture. With this patch the virtual address is stored in an array not shared with the hardware (no more need to use the DMA API). Thanks to this, it is possible to use cache contrary to the access of the rx descriptor member. The change is done in the swbm path only because the hwbm uses the cookie field, this also means that currently the hwbm is not usable in 64-bits. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:52:00 -05:00
Gregory CLEMENT	e9f6499965	net: mvneta: Do not allocate buffer in rxq init with HWBM For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime they are put into descriptors by hardware. There is no need to fill them at this point. Suggested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:52:00 -05:00
Gregory CLEMENT	ac83b7ddf2	net: mvneta: Optimize rx path for small frame For small frame reuse the phys_addr variable instead of accessing the uncacheable value in the rx descriptor. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:52:00 -05:00
Linus Torvalds	ed8d747fd2	Fix up a couple of field names in the CREDITS file Ozgur Karatas reported that the very first entry in the CREDITS file had the wrong tag for name (M: instead of N: - it happened when moving the entry from the MAINTAINERS file, where 'M:' stands for "Maintainer"). And when I went looking, I found a couple of other cases of wrong tagging too. Reported-by: Ozgur Karatas <mueddib@yandex.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-02 10:48:50 -08:00
David S. Miller	b5b5eca9aa	Merge branch 'bpf-support-for-sockets' David Ahern says: ==================== net: Add bpf support for sockets The recently added VRF support in Linux leverages the bind-to-device API for programs to specify an L3 domain for a socket. While SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable program has support for it. Even for those programs that do support it, the API requires processes to be started as root (CAP_NET_RAW) which is not desirable from a general security perspective. This patch set leverages Daniel Mack's work to attach bpf programs to a cgroup to provide a capability to set sk_bound_dev_if for all AF_INET{6} sockets opened by a process in a cgroup when the sockets are allocated. For example: 1. configure vrf (e.g., using ifupdown2) auto eth0 iface eth0 inet dhcp vrf mgmt auto mgmt iface mgmt vrf-table auto 2. configure cgroup mount -t cgroup2 none /tmp/cgroupv2 mkdir /tmp/cgroupv2/mgmt test_cgrp2_sock /tmp/cgroupv2/mgmt 15 3. set shell into cgroup (e.g., can be done at login using pam) echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs At this point all commands run in the shell (e.g, apt) have sockets automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'), including processes not running as root. This capability enables running any program in a VRF context and is key to deploying Management VRF, a fundamental configuration for networking gear, with any Linux OS installation. This patchset also exports the socket family, type and protocol as read-only allowing bpf filters to deny a process in a cgroup the ability to open specific types of AF_INET or AF_INET6 sockets. v7 - comments from Alexei v6 - add export of socket family, type and protocol ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:21 -05:00
David Ahern	554ae6e792	samples/bpf: add userspace example for prohibiting sockets Add examples preventing a process in a cgroup from opening a socket based family, protocol and type. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:09 -05:00
David Ahern	4f2e7ae56e	samples/bpf: Update bpf loader for cgroup section names Add support for section names starting with cgroup/skb and cgroup/sock. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:09 -05:00
David Ahern	aa4c1037a3	bpf: Add support for reading socket family, type, protocol Add socket family, type and protocol to bpf_sock allowing bpf programs read-only access. Add __sk_flags_offset[0] to struct sock before the bitfield to programmtically determine the offset of the unsigned int containing protocol and type. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:09 -05:00
David Ahern	ad2805dc79	samples: bpf: add userspace example for modifying sk_bound_dev_if Add a simple program to demonstrate the ability to attach a bpf program to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they are created. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:08 -05:00
David Ahern	6102365876	bpf: Add new cgroup attach type to enable sock modifications Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:46:08 -05:00
David Ahern	b2cd12574a	bpf: Refactor cgroups code in prep for new type Code move and rename only; no functional change intended. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:44:56 -05:00
Daniele Palmas	9bd813da24	NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040 This patch adds support for PID 0x1040 of Telit LE922A. The qmi adapter requires to have DTR set for proper working, so QMI_WWAN_QUIRK_DTR has been enabled. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:42:12 -05:00
Kristian Evensen	d5c83d0d1d	cdc_ether: Fix handling connection notification Commit `bfe9b9d2df` ("cdc_ether: Improve ZTE MF823/831/910 handling") introduced a work-around in usbnet_cdc_status() for devices that exported cdc carrier on twice on connect. Before the commit, this behavior caused the link state to be incorrect. It was assumed that all CDC Ethernet devices would either export this behavior, or send one off and then one on notification (which seems to be the default behavior). Unfortunately, it turns out multiple devices sends a connection notification multiple times per second (via an interrupt), even when connection state does not change. This has been observed with several different USB LAN dongles (at least), for example 13b1:0041 (Linksys). After `bfe9b9d2df`, the link state has been set as down and then up for each notification. This has caused a flood of Netlink NEWLINK messages and syslog to be flooded with messages similar to: cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped This commit fixes the behavior by reverting usbnet_cdc_status() to how it was before `bfe9b9d2df`. The work-around has been moved to a separate status-function which is only called when a known, affect device is detected. v1->v2: * Do not open-code netif_carrier_ok() (thanks Henning Schild). * Call netif_carrier_off() instead of usb_link_change(). This prevents calling schedule_work() twice without giving the work queue a chance to be processed (thanks Bjørn Mork). Fixes: `bfe9b9d2df` ("cdc_ether: Improve ZTE MF823/831/910 handling") Reported-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:40:09 -05:00
Artem Savkov	6b6ebb6b01	ip6_offload: check segs for NULL in ipv6_gso_segment. segs needs to be checked for being NULL in ipv6_gso_segment() before calling skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference: [ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc [ 97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.825214] PGD 0 [ 97.827047] [ 97.828540] Oops: 0000 [#1] SMP [ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod [ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1 [ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010 [ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000 [ 97.947720] RIP: 0010:[<ffffffff816e52f9>] [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207 [ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594 [ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000 [ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000 [ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3 [ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000 [ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000 [ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0 [ 98.018149] Stack: [ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e [ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3 [ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98 [ 98.042437] Call Trace: [ 98.044879] <IRQ> [ 98.046803] [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3] [ 98.053156] [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130 [ 98.059158] [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110 [ 98.064985] [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0 [ 98.070899] [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70 [ 98.077073] [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0 [ 98.082726] [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690 [ 98.088554] [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50 [ 98.094380] [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20 [ 98.099863] [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge] [ 98.106907] [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge] [ 98.113430] [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60 [ 98.118735] [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0 [ 98.124216] [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge] [ 98.130480] [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge] [ 98.137785] [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge] [ 98.143701] [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge] [ 98.150834] [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge] [ 98.157355] [<ffffffff8102fb89>] ? sched_clock+0x9/0x10 [ 98.162662] [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0 [ 98.168403] [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20 [ 98.174926] [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0 [ 98.180580] [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60 [ 98.186494] [<ffffffff815ee625>] process_backlog+0x95/0x140 [ 98.192145] [<ffffffff815edccd>] net_rx_action+0x16d/0x380 [ 98.197713] [<ffffffff8170cff1>] __do_softirq+0xd1/0x283 [ 98.203106] [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30 [ 98.209107] <EOI> [ 98.211029] [<ffffffff8108a5c0>] do_softirq+0x50/0x60 [ 98.216166] [<ffffffff815ec853>] netif_rx_ni+0x33/0x80 [ 98.221386] [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun] [ 98.227388] [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun] [ 98.233129] [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net] [ 98.239392] [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 98.245916] [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost] [ 98.251919] [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost] [ 98.258440] [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180 [ 98.264094] [<ffffffff810a44d9>] kthread+0xd9/0xf0 [ 98.268965] [<ffffffff810a4400>] ? kthread_park+0x60/0x60 [ 98.274444] [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30 [ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66 [ 98.299425] RIP [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 98.305612] RSP <ffff88012fc43a10> [ 98.309094] CR2: 00000000000000cc [ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]--- Signed-off-by: Artem Savkov <asavkov@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:34:58 -05:00
Eric Dumazet	7f7bf1606f	mlx4: fix use-after-free in mlx4_en_fold_software_stats() My recent commit to get more precise rx/tx counters in ndo_get_stats64() can lead to crashes at device dismantle, as Jesper found out. We must prevent mlx4_en_fold_software_stats() trying to access tx/rx rings if they are deleted. Fix this by adding a test against priv->port_up in mlx4_en_fold_software_stats() Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows us to eventually broadcast the latest/current counters to rtnetlink monitors. Fixes: `40931b8511` ("mlx4: give precise rx/tx bytes/packets counters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:33:32 -05:00
Sunil Goutham	bd3ad7d3a1	net: thunderx: Fix transmit queue timeout issue Transmit queue timeout issue is seen in two cases - Due to a race condition btw setting stop_queue at xmit() and checking for stopped_queue in NAPI poll routine, at times transmission from a SQ comes to a halt. This is fixed by using barriers and also added a check for SQ free descriptors, incase SQ is stopped and there are only CQE_RX i.e no CQE_TX. - Contrary to an assumption, a HW errata where HW doesn't stop transmission even though there are not enough CQEs available for a CQE_TX is not fixed in T88 pass 2.x. This results in a Qset error with 'CQ_WR_FULL' stalling transmission. This is fixed by adjusting RXQ's RED levels for CQ level such that there is always enough space left for CQE_TXs. Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:32:59 -05:00
Sowmini Varadhan	721c7443dc	RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net If some error is encountered in rds_tcp_init_net, make sure to unregister_netdevice_notifier(), else we could trigger a panic later on, when the modprobe from a netns fails. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:29:26 -05:00
David S. Miller	9aac3c1879	Merge branch 'offloading-tc-rules-hw' Hadar Hen Zion says: ==================== Offloading tc rules using underline Hardware device This series adds flower classifier support in offloading tc rules when the Software ingress device is different from the Hardware ingress device, such as when dealing with IP tunnels The first two patches are a small fixes to flower, checking the skip_hw flag wasn't set before calling the Hardware offloading functions which will try to offload the rule. The next two patches are infrastructure patches, a preparation for the fourth patch which is adding support in flower to offload rules when the ingress device is not a Hardware device and therefore can't offload. In this case ndo_setup_tc is called with the mirred (egress) device. The last three patchs are adding mlx5e support to offload rules using the new "egress_device" flag. Thanks, Hadar Changes from v0: - check if CONFIG_NET_CLS_ACT is defined befor calling tc_action_ops get_dev() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:38 -05:00
Hadar Hen Zion	ebe06875ff	net/mlx5e: Support adding ingress tc rule when egress device flag is set When ndo_setup_tc is called with an egress_dev flag set, it means that the ndo call was executed on the mirred action (egress) device and not on the ingress device. In order to support this kind of ndo_setup_tc call, and insert the correct decap rule to the hardware, the uplink device on the same eswitch should be found. Currently, we use this resolution between the mirred device and the uplink on the same eswitch to offload vxlan shared device decap rules. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:38 -05:00
Hadar Hen Zion	726293f1f8	net/mlx5e: Save the represntor netdevice as part of the representor Replace the representor private data to a net_device pointer holding the representor netdevice, instead of void pointer holding mlx5e_priv. It will be used by a new eswitch service function, returning the uplink representor netdevice. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:37 -05:00
Hadar Hen Zion	718f13e72b	net/mlx5e: Bring back representor's ndos that were accidentally removed The VF Representor udp tunnel ndo entries were removed by mistake, return them. Fixes: `370bad0f9a` ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:37 -05:00
Hadar Hen Zion	7091d8c705	net/sched: cls_flower: Add offload support using egress Hardware device In order to support hardware offloading when the device given by the tc rule is different from the Hardware underline device, extract the mirred (egress) device from the tc action when a filter is added, using the new tc_action_ops, get_dev(). Flower caches the information about the mirred device and use it for calling ndo_setup_tc in filter change, update stats and delete. Calling ndo_setup_tc of the mirred (egress) device instead of the ingress device will allow a resolution between the software ingress device and the underline hardware device. The resolution will take place inside the offloading driver using 'egress_device' flag added to tc_to_netdev struct which is provided to the offloading driver. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:37 -05:00
Hadar Hen Zion	255cb30425	net/sched: act_mirred: Add new tc_action_ops get_dev() Adding support to a new tc_action_ops. get_dev is a general option which allows to get the underline device when trying to offload a tc rule. In case of mirred action the returned device is the mirred (egress) device. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:36 -05:00
Hadar Hen Zion	3036dab670	net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions Instead of providing many arguments to fl_hw_{replace/destroy}_filter functions, just provide cls_fl_filter struct that includes all the relevant args. This patches doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:36 -05:00
Hadar Hen Zion	796852197c	net/sched: cls_flower: Try to offload only if skip_hw flag isn't set Check skip_hw flag isn't set before calling fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions. Replace the call to tc_should_offload with tc_can_offload. tc_can_offload only checks if the device supports offloading, the check for skip_hw flag is done earlier in the flow. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:36 -05:00
Hadar Hen Zion	55330f0596	net/sched: Add separate check for skip_hw flag Creating a difference between two possible cases: 1. Not offloading tc rule since the user sets 'skip_hw' flag. 2. Not offloading tc rule since the device doesn't support offloading. This patch doesn't add any new functionality. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 13:28:36 -05:00
Florian Westphal	25429d7b7d	tcp: allow to turn tcp timestamp randomization off Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple flows, we can deduct the relative qdisc delays (think of fq pacing). This should work even if we have one flow per remote peer." Having random per flow (or host) offsets doesn't allow that anymore so add a way to turn this off. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:49:59 -05:00
Florian Westphal	95a22caee3	tcp: randomize tcp timestamp offsets for each connection jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit `ceaa1fef65` ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:49:59 -05:00
David S. Miller	7df5358d47	Merge branch 'qed-iscsi' Manish Rangankar says: ==================== Add QLogic FastLinQ iSCSI (qedi) driver. This series introduces hardware offload iSCSI initiator driver for the 41000 Series Converged Network Adapters (579xx chip) by Qlogic. The overall driver design includes a common module ('qed') and protocol specific dependent modules ('qedi' for iSCSI). This is an open iSCSI driver, modifications to open iSCSI user components 'iscsid', 'iscsiuio', etc. are required for the solution to work. The user space changes are also in the process of being submitted. https://groups.google.com/forum/#!forum/open-iscsi The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is enhanced with functionality required for the iSCSI support. This series is based on: net tree base: Merge of net and net-next as of 11/29/2016 Changes from RFC v2: 1. qedi patches are squashed into single patch to prevent krobot warning. 2. Fixed 'hw_p_cpuq' incompatible pointer type. 3. Fixed sparse incompatible types in comparison expression. 4. Misc fixes with latest 'checkpatch --strict' option. 5. Remove int_mode option from MODULE_PARAM. 6. Prefix all MODULE_PARAM params with qedi_*. 7. Use CONFIG_QED_ISCSI instead of CONFIG_QEDI 8. Added bad task mem access fix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:44:38 -05:00
Yuval Mintz	1d6cff4fca	qed: Add iSCSI out of order packet handling. This patch adds out of order packet handling for hardware offloaded iSCSI. Out of order packet handling requires driver buffer allocation and assistance. Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:44:38 -05:00
Yuval Mintz	fc831825f9	qed: Add support for hardware offloaded iSCSI. This adds the backbone required for the various HW initalizations which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ 4xxxx line of adapters - FW notification, resource initializations, etc. Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:44:37 -05:00
Eli Cooper	80d1106aea	Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" This reverts commit `ae148b0858` ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"). skb->protocol is now set in __ip_local_out() and __ip6_local_out() before dst_output() is called. It is no longer necessary to do it for each tunnel. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:34:22 -05:00
Eli Cooper	b4e479a96f	ipv6: Set skb->protocol properly for local output When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called, fixing a bug where GSO packets sent through an ipip6 tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:34:22 -05:00
Eli Cooper	f418043910	ipv4: Set skb->protocol properly for local output When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IP before dst_output() is called, fixing a bug where GSO packets sent through a sit tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: Eli Cooper <elicooper@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:34:22 -05:00
Philip Pettersson	84ac726023	packet: fix race condition in packet_set_ring When packet_set_ring creates a ring buffer it will initialize a struct timer_list if the packet version is TPACKET_V3. This value can then be raced by a different thread calling setsockopt to set the version to TPACKET_V1 before packet_set_ring has finished. This leads to a use-after-free on a function pointer in the struct timer_list when the socket is closed as the previously initialized timer will not be deleted. The bug is fixed by taking lock_sock(sk) in packet_setsockopt when changing the packet version while also taking the lock at the start of packet_set_ring. Fixes: `f6fb8f100b` ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Philip Pettersson <philip.pettersson@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:16:49 -05:00
Linus Torvalds	4aa675aaf2	KVM fixes for v4.9-rc8 All architectures avoid memory corruption in an error path. ARM prevents bogus acknowledgement of interrupts. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJYQaDPAAoJEED/6hsPKofoq8gH/iJR/fcYg1ovboEaDIDdm/PI XzbNgrYZID8Nk04chU2Dh1eD8k3DG64txuOEs+jf3XBPYNnU8TAlw6qHVMG6kzGJ zA0CLGgH62DKXLuvnDJ75mpiJmzioGd4hdk0G8CIb9W2ySSUgrcmXMI3AoVP44lY LKTCITKq6ePfQ7AIbd3a6YXaR0ZTNP52e1Y4vx+Hsl9WcrMUGKyCmd9IcDI9DrZr ahMn+wx3Wzvb/NzH25OYkMAC9X5C7+b6O0IZm0ie8F8iU+JLlgGNiAHxQ5yAbu28 hzINTTUnwIxgoi/ZN0M8i+fo0RLKq5OCzPMTnUUgdloBL786XREW2t3Ca0kqavg= =XdK2 -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "All architectures avoid memory corruption in an error path. ARM prevents bogus acknowledgement of interrupts" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: use after free in kvm_ioctl_create_device() KVM: arm/arm64: vgic: Don't notify EOI for non-SPIs	2016-12-02 09:15:26 -08:00
Linus Torvalds	3e52d063d8	Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "Here is the revert for the regression of the i2c-octeon driver I mentioned last time. I wished for a bit more feedback, but all people working actively on it are in need of this patch, so here it goes" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: octeon: thunderx: Limit register access retries"	2016-12-02 09:12:44 -08:00
Lino Sanfilippo	2219d5ed77	net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler The driver already uses its private lock for synchronization between xmit and xmit completion handler making the additional use of the xmit_lock unnecessary. Furthermore the driver does not set NETIF_F_LLTX resulting in xmit to be called with the xmit_lock held and then taking the private lock while xmit completion handler does the reverse, first take the private lock, then the xmit_lock. Fix these issues by not taking the xmit_lock in the tx completion handler. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:10:25 -05:00
Lino Sanfilippo	151a14db22	net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers An explicit dma sync for device directly after mapping as well as an explicit dma sync for cpu directly before unmapping is unnecessary and costly on the hotpath. So remove these calls. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:10:24 -05:00
Rasmus Villemoes	b14945ac3e	net: atarilance: use %8ph for printing hex string This is already using the %pM printf extension; might as well also use %ph to make the code smaller. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 12:03:35 -05:00
Arnd Bergmann	d709b2a186	net/mlx5e: skip loopback selftest with !CONFIG_INET When CONFIG_INET is disabled, the new selftest results in a link error: drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback': en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check' en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum' This hides the specific test in that configuration. Fixes: `0952da791c` ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 11:55:57 -05:00
Arnd Bergmann	8ab2ae655b	default exported asm symbols to zero With binutils-2.26 and before, a weak missing symbol was kept during the final link, and a missing CRC for an export would lead to that CRC being treated as zero implicitly. With binutils-2.27, the crc symbol gets dropped, and any module trying to use it will fail to load. This sets the weak CRC symbol to zero explicitly, making it defined in vmlinux, which in turn lets us load the modules referring to that CRC. The comment above the __CRC_SYMBOL macro suggests that this was always the intention, although it also seems that all symbols defined in C have a correct CRC these days, and only the exports that are now done in assembly need this. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Adam Borowski <kilobyte@angband.pl> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-02 08:51:22 -08:00
Sudeep Holla	909e481e24	arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions The core and the cluster sleep state entry latencies can't be same as cluster sleep involves more work compared to core level e.g. shared cache maintenance. Experiments have shown on an average about 100us more latency for the cluster sleep state compared to the core level sleep. This patch fixes the entry latency for the cluster sleep state. Fixes: `28e10a8f3a` ("arm64: dts: juno: Add idle-states to device tree") Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: "Jon Medhurst (Tixy)" <tixy@linaro.org> Reviewed-by: Liviu Dudau <Liviu.Dudau@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2016-12-02 17:28:17 +01:00
Daniel Borkmann	366cbf2f46	bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller After `326fe02d1e` ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers need to hold rcu_read_lock() already to make sure BPF program doesn't get released in the background. Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading. Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping in XDP supported drivers and to keep the typecheck on the context intact. For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock() out of the helper. When the driver gets atomic replace support, this will move to call-sites eventually. mlx5 needs actual fixing as it has the same issue as described already in `326fe02d1e` ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"), that is, we're under RCU bh at this time, BPF programs are released via call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue reset. Fixes: `86994156c7` ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 11:06:24 -05:00
Soheil Hassas Yeganeh	83a1a1a70e	sock: reset sk_err for ICMP packets read from error queue Only when ICMP packets are enqueued onto the error queue, sk_err is also set. Before `f5f99309fa` (sock: do not set sk_err in sock_dequeue_err_skb), a subsequent error queue read would set sk_err to the next error on the queue, or 0 if empty. As no error types other than ICMP set this field, sk_err should not be modified upon dequeuing them. Only for ICMP errors, reset the (racy) sk_err. Some applications, like traceroute, rely on it and go into a futile busy POLLERR loop otherwise. In principle, sk_err has to be set while an ICMP error is queued. Testing is_icmp_err_skb(skb_next) approximates this without requiring a full queue walk. Applications that receive both ICMP and other errors cannot rely on this legacy behavior, as other errors do not set sk_err in the first place. Fixes: `f5f99309fa` (sock: do not set sk_err in sock_dequeue_err_skb) Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 10:55:39 -05:00
David S. Miller	f577e22c73	Merge branch 'lwt-bpf' Thomas Graf says: ==================== bpf: BPF for lightweight tunnel encapsulation This series implements BPF program invocation from dst entries via the lightweight tunnels infrastructure. The BPF program can be attached to lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and see an L3 skb as context. Programs attached to input and output are read-only. Programs attached to lwtunnel_xmit() can modify and redirect, push headers and redirect packets. The facility can be used to: - Collect statistics and generate sampling data for a subset of traffic based on the dst utilized by the packet thus allowing to extend the existing realms. - Apply additional per route/dst filters to prohibit certain outgoing or incoming packets based on BPF filters. In particular, this allows to maintain per dst custom state across multiple packets in BPF maps and apply filters based on statistics and behaviour observed over time. - Attachment of L2 headers at transmit where resolving the L2 address is not required. - Possibly many more. v3 -> v4: - Bumped LWT_BPF_MAX_HEADROOM from 128 to 256 (Alexei) - Renamed bpf_skb_push() helper to bpf_skb_change_head() to relate to existing bpf_skb_change_tail() helper (Alexei/Daniel) - Added check in __bpf_redirect_common() to verify that program added a link header before redirecting to a l2 device. Adding the check to lwt-bpf code was considered but dropped due to massive code required due to retrieval of net_device via per-cpu redirect buffer. A test case was added to cover the scenario when a program directs to an l2 device without adding an appropriate l2 header. (Alexei) - Prohibited access to tc_classid (Daniel) - Collapsed bpf_verifier_ops instance for lwt in/out as they are identical (Daniel) - Some cosmetic changes v2 -> v3: - Added real world sample lwt_len_hist_kern.c which demonstrates how to collect a histogram on packet sizes for all packets flowing through a number of routes. - Restricted output to be read-only. Since the header can no longer be modified, the rerouting functionality has been removed again. - Added test case which cover destructive modification of packet data. v1 -> v2: - Added new BPF_LWT_REROUTE return code for program to indicate that new route lookup should be performed. Suggested by Tom. - New sample to illustrate rerouting - New patch 05: Recursion limit for lwtunnel_output for the case when user creates circular dst redirection. Also resolves the issue for ILA. - Fix to ensure headroom for potential future L2 header is still guaranteed ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 10:52:05 -05:00
Thomas Graf	f74599f7c5	bpf: Add tests and samples for LWT-BPF Adds a series of tests to verify the functionality of attaching BPF programs at LWT hooks. Also adds a sample which collects a histogram of packet sizes which pass through an LWT hook. $ ./lwt_len_hist.sh Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 10.00 39857.69 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 22 \| \| 64 -> 127 : 98 \| \| 128 -> 255 : 213 \| \| 256 -> 511 : 1444251 \|****** \| 512 -> 1023 : 660610 \|* \| 1024 -> 2047 : 535241 \| \| 2048 -> 4095 : 19 \| \| 4096 -> 8191 : 180 \| \| 8192 -> 16383 : `5578023` \|********************************* \| 16384 -> 32767 : 632099 \|* \| 32768 -> 65535 : 6575 \| \| Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 10:52:00 -05:00
Thomas Graf	3a0af8fd61	bpf: BPF for lightweight tunnel infrastructure Registers new BPF program types which correspond to the LWT hooks: - BPF_PROG_TYPE_LWT_IN => dst_input() - BPF_PROG_TYPE_LWT_OUT => dst_output() - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit() The separate program types are required to differentiate between the capabilities each LWT hook allows: * Programs attached to dst_input() or dst_output() are restricted and may only read the data of an skb. This prevent modification and possible invalidation of already validated packet headers on receive and the construction of illegal headers while the IP headers are still being assembled. * Programs attached to lwtunnel_xmit() are allowed to modify packet content as well as prepending an L2 header via a newly introduced helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is invoked after the IP header has been assembled completely. All BPF programs receive an skb with L3 headers attached and may return one of the following error codes: BPF_OK - Continue routing as per nexthop BPF_DROP - Drop skb and return EPERM BPF_REDIRECT - Redirect skb to device as per redirect() helper. (Only valid in lwtunnel_xmit() context) The return codes are binary compatible with their TC_ACT_ relatives to ease compatibility. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-02 10:51:49 -05:00

1 2 3 4 5 ...

636817 Commits