Commit Graph

752444 Commits

Author SHA1 Message Date
Andrew Lunn
9e4d60938a net: phy: mdio-gpio: Remove reset function
The platform data can contain a function to call to reset
the bit banging interface. It is not used, so remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:59:10 -04:00
Andrew Lunn
712e5a5ce7 net: phy_ mdio-gpio: Fixup , which should be ;
Seems like an old typ0.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:59:10 -04:00
Daniel Borkmann
34df9d37ab Merge branch 'bpf-type-format'
Martin KaFai Lau says:

====================
This patch introduces BPF Type Format (BTF).

BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map.  Hence, it basically focus
on the C programming language which the modern BPF is primary
using.  The first use case is to provide a generic pretty print
capability for a BPF map.

A modified pahole that can convert dwarf to BTF is here:

  https://github.com/iamkafai/pahole/tree/btf

Please see individual patch for details.

v5:
- Remove BTF_KIND_FLOAT and BTF_KIND_FUNC which are not
  currently used.  They can be added in the future.
  Some bpf_df_xxx() are removed together.
- Add comment in patch 7 to clarify that the new bpffs_map_fops
  should not be extended further.

v4:
- Fix warning (remove unneeded semicolon)
- Remove a redundant variable (nr_bytes) from btf_int_check_meta() in
  patch 1.  Caught by W=1.

v3:
- Rebase to bpf-next
- Fix sparse warning (by adding static)
- Add BTF header logging: btf_verifier_log_hdr()
- Fix the alignment test on btf->type_off
- Add tests for the BTF header
- Lower the max BTF size to 16MB.  It should be enough
  for some time.  We could raise it later if it would
  be needed.

v2:
- Use kvfree where needed in patch 1 and 2
- Also consider BTF_INT_OFFSET() in the btf_int_check_meta()
  in patch 1
- Fix an incorrect goto target in map_create() during
  the btf-error-path in patch 7
- re-org some local vars to keep the rev xmas tree in btf.c
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:48:35 +02:00
Martin KaFai Lau
c0fa1b6c3e bpf: btf: Add BTF tests
This patch tests the BTF loading, map_create with BTF
and the changes in libbpf.

-r: Raw tests that test raw crafted BTF data
-f: Test LLVM compiled bpf prog with BTF data
-g: Test BPF_OBJ_GET_INFO_BY_FD for btf_fd
-p: Test pretty print

The tools/testing/selftests/bpf/Makefile will probe
for BTF support in llc and pahole before generating
debug info (-g) and convert them to BTF.  You can supply
the BTF supported binary through the following make variables:
LLC, BTF_PAHOLE and LLVM_OBJCOPY.

LLC: The lastest llc with -mattr=dwarfris support for the bpf target.
     It is only in the master of the llvm repo for now.
BTF_PAHOLE: The modified pahole with BTF support:
	    https://github.com/iamkafai/pahole/tree/btf
	    To add a BTF section: "pahole -J bpf_prog.o"
LLVM_OBJCOPY: Any llvm-objcopy should do

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:47:42 +02:00
Martin KaFai Lau
8a138aed4a bpf: btf: Add BTF support to libbpf
If the ".BTF" elf section exists, libbpf will try to create
a btf_fd (through BPF_BTF_LOAD).  If that fails, it will still
continue loading the bpf prog/map without the BTF.

If the bpf_object has a BTF loaded, it will create a map with the btf_fd.
libbpf will try to figure out the btf_key_id and btf_value_id of a map by
finding the BTF type with name "<map_name>_key" and "<map_name>_value".
If they cannot be found, it will continue without using the BTF.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
3bd86a8409 bpf: btf: Sync bpf.h and btf.h to tools/
This patch sync up the bpf.h and btf.h to tools/

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
a26ca7c982 bpf: btf: Add pretty print support to the basic arraymap
This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
	int count_a;
	int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
60197cfb6e bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd
This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
The original BTF data, which was used to create the BTF fd during
the earlier BPF_BTF_LOAD call, will be returned.

The userspace is expected to allocate buffer
to info.info and the buffer size is set to info.info_len before
calling BPF_OBJ_GET_INFO_BY_FD.

The original BTF data is copied to the userspace buffer (info.info).
Only upto the user's specified info.info_len will be copied.

The original BTF data size is set to info.info_len.  The userspace
needs to check if it is bigger than its allocated buffer size.
If it is, the userspace should realloc with the kernel-returned
info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
f56a653c1f bpf: btf: Add BPF_BTF_LOAD command
This patch adds a BPF_BTF_LOAD command which
1) loads and verifies the BTF (implemented in earlier patches)
2) returns a BTF fd to userspace.  In the next patch, the
   BTF fd can be specified during BPF_MAP_CREATE.

It currently limits to CAP_SYS_ADMIN.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
b00b8daec8 bpf: btf: Add pretty print capability for data with BTF type info
This patch adds pretty print capability for data with BTF type info.
The current usage is to allow pretty print for a BPF map.

The next few patches will allow a read() on a pinned map with BTF
type info for its key and value.

This patch uses the seq_printf() infra.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:25 +02:00
Martin KaFai Lau
179cde8cef bpf: btf: Check members of struct/union
This patch checks a few things of struct's members:

1) It has a valid size (e.g. a "const void" is invalid)
2) A member's size (+ its member's offset) does not exceed
   the containing struct's size.
3) The member's offset satisfies the alignment requirement

The above can only be done after the needs_resolve member's type
is resolved.  Hence, the above is done together in
btf_struct_resolve().

Each possible member's type (e.g. int, enum, modifier...) implements
the check_member() ops which will be called from btf_struct_resolve().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:24 +02:00
Martin KaFai Lau
eb3f595dab bpf: btf: Validate type reference
After collecting all btf_type in the first pass in an earlier patch,
the second pass (in this patch) can validate the reference types
(e.g. the referring type does exist and it does not refer to itself).

While checking the reference type, it also gathers other information (e.g.
the size of an array).  This info will be useful in checking the
struct's members in a later patch.  They will also be useful in doing
pretty print later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:24 +02:00
Martin KaFai Lau
69b693f0ae bpf: btf: Introduce BPF Type Format (BTF)
This patch introduces BPF type Format (BTF).

BTF (BPF Type Format) is the meta data format which describes
the data types of BPF program/map.  Hence, it basically focus
on the C programming language which the modern BPF is primary
using.  The first use case is to provide a generic pretty print
capability for a BPF map.

BTF has its root from CTF (Compact C-Type format).  To simplify
the handling of BTF data, BTF removes the differences between
small and big type/struct-member.  Hence, BTF consistently uses u32
instead of supporting both "one u16" and "two u32 (+padding)" in
describing type and struct-member.

It also raises the number of types (and functions) limit
from 0x7fff to 0x7fffffff.

Due to the above changes,  the format is not compatible to CTF.
Hence, BTF starts with a new BTF_MAGIC and version number.

This patch does the first verification pass to the BTF.  The first
pass checks:
1. meta-data size (e.g. It does not go beyond the total btf's size)
2. name_offset is valid
3. Each BTF_KIND (e.g. int, enum, struct....) does its
   own check of its meta-data.

Some other checks, like checking a struct's member is referring
to a valid type, can only be done in the second pass.  The second
verification pass will be implemented in the next patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 21:46:24 +02:00
David S. Miller
3f4fe759f0 Merge branch 'ipv6-followup-to-fib6_info-change'
David Ahern says:

====================
net/ipv6: followup to fib6_info change

Followup to fib change for IPv6.

First 2 patches rename fib6_info struct elements to match its name,
and rename addrconf_dst_alloc to match what it returns.

Patches 3-7 refactor the code to remove the need for fib6_idev reducing
fib6_info by another 8 bytes to 200 bytes.

Patch 8 fixes the gfp flags argument to addrconf_prefix_route in a
couple of places.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:14 -04:00
David Ahern
27b10608a2 net/ipv6: Fix gfp_flags arg to addrconf_prefix_route
Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so
the gfp argument to addrconf_prefix_route can not be GFP_KERNEL.

While scrubbing other calls I noticed addrconf_addr_gen has one
place with GFP_ATOMIC that can be GFP_KERNEL.

Fixes: acb54e3cba ("net/ipv6: Add gfp_flags to route add functions")
Reported-by: syzbot+2add39b05179b31f912f@syzkaller.appspotmail.com
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
dcd1f57295 net/ipv6: Remove fib6_idev
fib6_idev can be obtained from __in6_dev_get on the nexthop device
rather than caching it in the fib6_info. Remove it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
647d4c1363 net/ipv6: Remove compare of fib6_idev from rt6_duplicate_nexthop
After 4832c30d54 ("net: ipv6: put host and anycast routes on device
with address") the comparison of idev does not add value since it
correlates to the nexthop device which is already compared. Remove
the idev comparison.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
6fe7494144 net/ipv6: Change ip6_route_get_saddr to get dev from route
Prior to 4832c30d54 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d54 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change ip6_route_get_saddr can just pass the nexthop
device to ipv6_dev_get_saddr.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
eea68cd371 net/ipv6: Remove unnecessary checks on fib6_idev
Prior to 4832c30d54 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d54 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change a couple of device checks during route lookups
are no longer needed. Remove them.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
9ee8cbb2fd net/ipv6: Remove aca_idev
aca_idev has only 1 user - inet6_fill_ifacaddr - and it only
wants the device index which can be extracted from the fib6_info
nexthop.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
360a9887c8 net/ipv6: Rename addrconf_dst_alloc
addrconf_dst_alloc now returns a fib6_info. Update the name
and its users to reflect the change.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:13 -04:00
David Ahern
93c2fb253d net/ipv6: Rename fib6_info struct elements
Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.

Rename only; not functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:40:12 -04:00
Olivier Gayot
ab913455dd docs: ip-sysctl.txt: fix name of some ipv6 variables
The name of the following proc/sysctl entries were incorrectly
documented:

    /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number
    /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number
    /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length
    /proc/sys/net/ipv6/conf/<interface>/max_hbt_length

Their name was set to the name of the symbol in the .data field of the
control table instead of their .proc name.

Signed-off-by: Olivier Gayot <olivier.gayot@sigexec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 15:20:09 -04:00
Ronak Doshi
65ec0bd1c7 vmxnet3: fix incorrect dereference when rxvlan is disabled
vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, vlan tag is present in the header and the function references
ip header from sizeof(ethhdr) and leads to incorrect pointer reference.

This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if vlan tag is present and correctly references the ip hdr.

Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Louis Luo <llouis@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:59:05 -04:00
Cong Wang
f7e4367268 llc: hold llc_sap before release_sock()
syzbot reported we still access llc->sap in llc_backlog_rcv()
after it is freed in llc_sap_remove_socket():

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
 llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
 llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
 llc_conn_service net/llc/llc_conn.c:400 [inline]
 llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
 llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
 sk_backlog_rcv include/net/sock.h:909 [inline]
 __release_sock+0x12f/0x3a0 net/core/sock.c:2335
 release_sock+0xa4/0x2b0 net/core/sock.c:2850
 llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204

llc->sap is refcount'ed and llc_sap_remove_socket() is paired
with llc_sap_add_socket(). This can be amended by holding its refcount
before llc_sap_remove_socket() and releasing it after release_sock().

Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:54:53 -04:00
Eric Dumazet
88078d98d1 net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:44:11 -04:00
Jonathan Corbet
02b94fc70f MAINTAINERS: Direct networking documentation changes to netdev
Networking docs changes go through the networking tree, so patch the
MAINTAINERS file to direct authors to the right place.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:42:24 -04:00
Colin Ian King
f3335545b3 atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit"
Trivial fix to spelling mistake in message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:41:49 -04:00
Pawel Dembicki
4ec7eb3ff6 net: qmi_wwan: add Wistron Neweb D19Q1
This modem is embedded on dlink dwr-960 router.
The oem configuration states:

T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0
D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1
P: Vendor=1435 ProdID=d191 Rev=ff.ff
S: Manufacturer=Android
S: Product=Android
S: SerialNumber=0123456789ABCDEF
C:* #Ifs= 6 Cfg#= 1 Atr=80 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none)
E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=84(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=(none)
E: Ad=86(I) Atr=03(Int.) MxPS= 10 Ivl=32ms
E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E: Ad=88(I) Atr=03(Int.) MxPS= 8 Ivl=32ms
E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
I:* If#= 5 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=(none)
E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=125us

Tested on openwrt distribution

Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:38:01 -04:00
Colin Ian King
5e84b38b07 net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN"
Trivial fix to spelling mistake

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:37:10 -04:00
Zhao Chen
292eba02db net-next/hinic: add arm64 support
This patch enables arm64 platform support for the HINIC driver.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:34:44 -04:00
Jose Abreu
565020aaee net: stmmac: Disable ACS Feature for GMAC >= 4
ACS Feature is currently enabled for GMAC >= 4 but the llc_snap status
is never checked in descriptor rx_status callback. This will cause
stmmac to always strip packets even that ACS feature is already
stripping them.

Lets be safe and disable the ACS feature for GMAC >= 4 and always strip
the packets for this GMAC version.

Fixes: 477286b53f ("stmmac: add GMAC4 core support")
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:33:44 -04:00
Maxime Chevallier
da42bb2713 net: mvpp2: Fix DMA address mask size
PPv2 TX/RX descriptors uses 40bits DMA addresses, but 41 bits masks were
used (GENMASK_ULL(40, 0)).

This commit fixes that by using the correct mask.

Fixes: e7c5359f2e ("net: mvpp2: introduce PPv2.2 HW descriptors and adapt accessors")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:12:14 -04:00
David S. Miller
50f694a62f Merge branch 'TCP-data-delivery-and-ECN-stats-tracking'
Yuchung Cheng says:

====================
tracking TCP data delivery and ECN stats

This patch series improve tracking the data delivery status
  1. minor improvement on SYN data
  2. accounting bytes delivered with CE marks
  3. exporting the delivery stats to applications

s.t. users can get better sense of TCP performance at per host,
per connection, and even per application message level.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:17 -04:00
Yuchung Cheng
feb5f2ec64 tcp: export packets delivery info
Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS

Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.

These stats help application monitor TCP delivery and ECN status
on per host, per connection, even per message level.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:16 -04:00
Yuchung Cheng
e21db6f69a tcp: track total bytes delivered with ECN CE marks
Introduce a new delivered_ce stat in tcp socket to estimate
number of packets being marked with CE bits. The estimation is
done via ACKs with ECE bit. Depending on the actual receiver
behavior, the estimation could have biases.

Since the TCP sender can't really see the CE bit in the data path,
so the sender is technically counting packets marked delivered with
the "ECE / ECN-Echo" flag set.

With RFC3168 ECN, because the ECE bit is sticky, this count can
drastically overestimate the nummber of CE-marked data packets

With DCTCP-style ECN this should be reasonably precise unless there
is loss in the ACK path, in which case it's not precise.

With AccECN proposal this can be made still more precise, even in
the case some degree of ACK loss.

However this is sender's best estimate of CE information.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:16 -04:00
Yuchung Cheng
a77fa0104a tcp: new helper to calculate newly delivered
Add new helper tcp_newly_delivered() to prepare the ECN accounting change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:16 -04:00
Yuchung Cheng
bef5767f37 tcp: better delivery accounting for SYN-ACK and SYN-data
the tcp_sock:delivered has inconsistent accounting for SYN and FIN.
1. it counts pure FIN
2. it counts pure SYN
3. it counts SYN-data twice
4. it does not count SYN-ACK

For congestion control perspective it does not matter much as C.C. only
cares about the difference not the aboslute value. But the next patch
would export this field to user-space so it's better to report the absolute
value w/o these caveats.

This patch counts SYN, SYN-ACK, or SYN-data delivery once always in
the "delivered" field.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 13:05:16 -04:00
sunlianwen
bb9aaaa184 net: change the comment of dev_mc_init
The comment of dev_mc_init() is wrong. which use dev_mc_flush
instead of dev_mc_init.

Signed-off-by: Lianwen Sun <sunlw.fnst@cn.fujitsu.com
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-19 12:58:20 -04:00
Jesper Dangaard Brouer
97e19cce05 bpf: reserve xdp_frame size in xdp headroom
Commit 6dfb970d3d ("xdp: avoid leaking info stored in frame data on
page reuse") tried to allow user/bpf_prog to (re)use area used by
xdp_frame (stored in frame headroom), by memset clearing area when
bpf_xdp_adjust_head give bpf_prog access to headroom area.

The mentioned commit had two bugs. (1) Didn't take bpf_xdp_adjust_meta
into account. (2) a combination of bpf_xdp_adjust_head calls, where
xdp->data is moved into xdp_frame section, can cause clearing
xdp_frame area again for area previously granted to bpf_prog.

After discussions with Daniel, we choose to implement a simpler
solution to the problem, which is to reserve the headroom used by
xdp_frame info.

This also avoids the situation where bpf_prog is allowed to adjust/add
headers, and then XDP_REDIRECT later drops the packet due to lack of
headroom for the xdp_frame.  This would likely confuse the end-user.

Fixes: 6dfb970d3d ("xdp: avoid leaking info stored in frame data on page reuse")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-19 17:52:12 +02:00
Wolfram Sang
0cbc94daa5 mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs
Early revisions of certain SoCs cannot do multiple DMA RX streams in
parallel. To avoid data corruption, only allow one DMA RX channel and
fall back to PIO, if needed.

Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Nguyen Viet Dung <dung.nguyen.aj@renesas.com>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Cc: stable@vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2018-04-19 14:57:17 +02:00
Jiri Kosina
b658912cb0 HID: i2c-hid: fix inverted return value from i2c_hid_command()
i2c_hid_command() returns non-zero in error cases (the actual
errno). Error handling in for I2C_HID_QUIRK_RESEND_REPORT_DESCR
case in i2c_hid_resume() had the check inverted; fix that.

Fixes: 3e83eda467 ("HID: i2c-hid: Fix resume issue on Raydium touchscreen device")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2018-04-19 09:25:15 +02:00
Michael Ellerman
56376c5864 powerpc/kvm: Fix lockups when running KVM guests on Power8
When running KVM guests on Power8 we can see a lockup where one CPU
stops responding. This often leads to a message such as:

  watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
  Task dump for CPU 72:
  qemu-system-ppc R  running task    10560 20917  20908 0x00040004

And then backtraces on other CPUs, such as:

  Task dump for CPU 48:
  ksmd            R  running task    10032  1519      2 0x00000804
  Call Trace:
    ...
    --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
        LR = smp_call_function_many+0x37c/0x460
    pmdp_invalidate+0x100/0x1b0
    __split_huge_pmd+0x52c/0xdb0
    try_to_unmap_one+0x764/0x8b0
    rmap_walk_anon+0x15c/0x370
    try_to_unmap+0xb4/0x170
    split_huge_page_to_list+0x148/0xa30
    try_to_merge_one_page+0xc8/0x990
    try_to_merge_with_ksm_page+0x74/0xf0
    ksm_scan_thread+0x10ec/0x1ac0
    kthread+0x160/0x1a0
    ret_from_kernel_thread+0x5c/0x78

This is caused by commit 8c1c7fb0b5 ("powerpc/64s/idle: avoid sync
for KVM state when waking from idle"), which added a check in
pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
test of kvm_hstate.hwthread_req.

The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
entering the guest, so it can then come out to cede with
KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
setting hwthread_req to 1, but because hwthread_state is still
KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
wake up from idle and won't go to kvm_start_guest. From there the
thread will return somewhere garbage and crash.

Fix it by skipping the store of hwthread_state, but not the test of
hwthread_req, when coming out of idle. It's OK to skip the sync in
that case because hwthread_req will have been set on the same thread,
so there is no synchronisation required.

Fixes: 8c1c7fb0b5 ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19 16:22:20 +10:00
Eric Dumazet
415787d779 ipv6: frags: fix a lockdep false positive
lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.

It complains because of following chains :

1) sch_direct_xmit()        (lock txq->_xmit_lock)
    dev_hard_start_xmit()
     xmit_one()
      dev_queue_xmit_nit()
       packet_rcv_fanout()
        ip_check_defrag()
         ip_defrag()
          spin_lock()     (lock frag queue spinlock)

2) ip6_input_finish()
    ipv6_frag_rcv()       (lock frag queue spinlock)
     ip6_frag_queue()
      icmpv6_param_prob() (lock txq->_xmit_lock at some point)

We could add lockdep annotations, but we also can make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes frag queue spinlock a leaf in lock hierarchy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 23:19:39 -04:00
Michael Neuling
13a83eac37 powerpc/eeh: Fix enabling bridge MMIO windows
On boot we save the configuration space of PCIe bridges. We do this so
when we get an EEH event and everything gets reset that we can restore
them.

Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence if we have to reset the bridge when we come back
MMIO is not enabled and we end up taking an PE freeze when the driver
starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space writes later, but that will have to come later in
a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:
  echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On boston, this PHB has a PCIe switch on it.  Without this patch,
you'll see two EEH events, 1 expected and 1 the failure we are fixing
here. The second EEH event causes the anything under the PHB to
disappear (i.e. the i40e eth).

With this patch, only 1 EEH event occurs and devices properly recover.

Fixes: 652defed48 ("powerpc/eeh: Check PCIe link after reset")
Cc: stable@vger.kernel.org # v3.11+
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19 13:02:38 +10:00
Subash Abhinov Kasiviswanathan
64e86fec54 net: qualcomm: rmnet: Fix warning seen with fill_info
When the last rmnet device attached to a real device is removed, the
real device is unregistered from rmnet. As a result, the real device
lookup fails resulting in a warning when the fill_info handler is
called as part of the rmnet device unregistration.

Fix this by returning the rmnet flags as 0 when no real device is
present.

WARNING: CPU: 0 PID: 1779 at net/core/rtnetlink.c:3254
rtmsg_ifinfo_build_skb+0xca/0x10d
Modules linked in:
CPU: 0 PID: 1779 Comm: ip Not tainted 4.16.0-11872-g7ce2367 #1
Stack:
 7fe655f0 60371ea3 00000000 00000000
 60282bc6 6006b116 7fe65600 60371ee8
 7fe65660 6003a68c 00000000 900000000
Call Trace:
 [<6006b116>] ? printk+0x0/0x94
 [<6001f375>] show_stack+0xfe/0x158
 [<60371ea3>] ? dump_stack_print_info+0xe8/0xf1
 [<60282bc6>] ? rtmsg_ifinfo_build_skb+0xca/0x10d
 [<6006b116>] ? printk+0x0/0x94
 [<60371ee8>] dump_stack+0x2a/0x2c
 [<6003a68c>] __warn+0x10e/0x13e
 [<6003a82c>] warn_slowpath_null+0x48/0x4f
 [<60282bc6>] rtmsg_ifinfo_build_skb+0xca/0x10d
 [<60282c4d>] rtmsg_ifinfo_event.part.37+0x1e/0x43
 [<60282c2f>] ? rtmsg_ifinfo_event.part.37+0x0/0x43
 [<60282d03>] rtmsg_ifinfo+0x24/0x28
 [<60264e86>] dev_close_many+0xba/0x119
 [<60282cdf>] ? rtmsg_ifinfo+0x0/0x28
 [<6027c225>] ? rtnl_is_locked+0x0/0x1c
 [<6026ca67>] rollback_registered_many+0x1ae/0x4ae
 [<600314be>] ? unblock_signals+0x0/0xae
 [<6026cdc0>] ? unregister_netdevice_queue+0x19/0xec
 [<6026ceec>] unregister_netdevice_many+0x21/0xa1
 [<6027c765>] rtnl_delete_link+0x3e/0x4e
 [<60280ecb>] rtnl_dellink+0x262/0x29c
 [<6027c241>] ? rtnl_get_link+0x0/0x3e
 [<6027f867>] rtnetlink_rcv_msg+0x235/0x274

Fixes: be81a85f5f ("net: qualcomm: rmnet: Implement fill_info")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 21:23:06 -04:00
Haiyang Zhang
0dcec221dd hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation
This patch adds the NetVSP v6 and 6.1 message structures, and includes
these versions into NetVSC/NetVSP version negotiation process.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 21:20:44 -04:00
Stephen Hemminger
0fe554a46a hv_netvsc: propogate Hyper-V friendly name into interface alias
This patch implement the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V on the host through the GUI or PowerShell
it is possible to enable the device naming feature which causes
the host to make available to the guest the name of the device.
This shows up in the RNDIS protocol as the friendly name.

The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.

In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property; where it is visible in ip tools and SNMP.

The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.

Also requires exporting dev_set_alias in netdev core.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 21:19:46 -04:00
David S. Miller
c4ff0b0fc0 Merge branch 'r8169-series-with-further-smaller-improvements'
Heiner Kallweit says:

====================
r8169: series with further smaller improvements

This series includes further smaller improvements.

Then I think the basic cleanup has been done and next step would be
preparing the switch to phylib.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 21:12:00 -04:00
Heiner Kallweit
6ed0e08f9e r8169: remove jumbo_tx_csum from chip config struct
According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.

By implementing this rule we can get rid of configuring tx checksumming
support per chip type.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-18 21:11:59 -04:00