Commit Graph

724497 Commits

Author SHA1 Message Date
John Fastabend
bcecb4bbf8 net: ptr_ring: otherwise safe empty checks can overrun array bounds
When running consumer and/or producer operations and empty checks in
parallel its possible to have the empty check run past the end of the
array. The scenario occurs when an empty check is run while
__ptr_ring_discard_one() is in progress. Specifically after the
consumer_head is incremented but before (consumer_head >= ring_size)
check is made and the consumer head is zeroe'd.

To resolve this, without having to rework how consumer/producer ops
work on the array, simply add an extra dummy slot to the end of the
array. Even if we did a rework to avoid the extra slot it looks
like the normal case checks would suffer some so best to just
allocate an extra pointer.

Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 11:36:35 -05:00
Daniel Borkmann
5620e1a8e2 Merge branch 'bpf-offload-report-dev'
Jakub Kicinski says:

====================
This series is a redo of reporting offload device information to
user space after the first attempt did not take into account name
spaces.  As requested by Kirill offloads are now protected by an
r/w sem.  This allows us to remove the workqueue and free the
offload state fully when device is removed (suggested by Alexei).

Net namespace is reported with a device/inode pair.

The accompanying bpftool support is placed in common code because
maps will have very similar info.  Note that the UAPI information
can't be nicely encapsulated into a struct, because in case we
need to grow the device information the new fields will have to
be added at the end of struct bpf_prog_info, we can't grow
structures in the middle of bpf_prog_info.

v3:
 - use dev_get_by_index();
 - redo ns code (new patch 6).
v2:
 - rework the locking in patch 1 (use RCU instead of locking
   dependencies);
 - grab RTNL for a short time in patch 6;
 - minor update to the test in patch 8.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:24 +01:00
Jakub Kicinski
752d7b4501 selftests/bpf: test device info reporting for bound progs
Check if bound programs report correct device info.  Test
in local namespace, in remote one, back to the local ns,
remove the device and check that information is cleared.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
522622104e tools: bpftool: report device information for offloaded programs
Print the just-exposed device information about device to which
program is bound.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
675fc275a3 bpf: offload: report device information for offloaded programs
Report to the user ifindex and namespace information of offloaded
programs.  If device has disappeared return -ENODEV.  Specify the
namespace using dev/inode combination.

CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
cdab6ba866 nsfs: generalize ns_get_path() for path resolution with a task
ns_get_path() takes struct task_struct and proc_ns_ops as its
parameters.  For path resolution directly from a namespace,
e.g. based on a networking device's net name space, we need
more flexibility.  Add a ns_get_path_cb() helper which will
allow callers to use any method of obtaining the name space
reference.  Convert ns_get_path() to use ns_get_path_cb().

Following patches will bring a networking user.

CC: Eric W. Biederman <ebiederm@xmission.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
ad8ad79f4f bpf: offload: free program id when device disappears
Bound programs are quite useless after their device disappears.
They are simply waiting for reference count to go to zero,
don't list them in BPF_PROG_GET_NEXT_ID by freeing their ID
early.

Note that orphaned offload programs will return -ENODEV on
BPF_OBJ_GET_INFO_BY_FD so user will never see ID 0.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
ce3b9db4db bpf: offload: free prog->aux->offload when device disappears
All bpf offload operations should now be under bpf_devs_lock,
it's safe to free and clear the entire offload structure,
not only the netdev pointer.

__bpf_prog_offload_destroy() will no longer be called multiple
times.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
cae1927c0b bpf: offload: allow netdev to disappear while verifier is running
To allow verifier instruction callbacks without any extra locking
NETDEV_UNREGISTER notification would wait on a waitqueue for verifier
to finish.  This design decision was made when rtnl lock was providing
all the locking.  Use the read/write lock instead and remove the
workqueue.

Verifier will now call into the offload code, so dev_ops are moved
to offload structure.  Since verifier calls are all under
bpf_prog_is_dev_bound() we no longer need static inline implementations
to please builds with CONFIG_NET=n.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:23 +01:00
Jakub Kicinski
9a18eedb14 bpf: offload: don't use prog->aux->offload as boolean
We currently use aux->offload to indicate that program is bound
to a specific device.  This forces us to keep the offload structure
around even after the device is gone.  Add a bool member to
struct bpf_prog_aux to indicate if offload was requested.

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:22 +01:00
Jakub Kicinski
e0d3974ac7 bpf: offload: don't require rtnl for dev list manipulation
We don't need the RTNL lock for all operations on offload state.
We only need to hold it around ndo calls.  The device offload
initialization doesn't require it.  The soon-to-come querying
of the offload info will only need it partially.  We will also
be able to remove the waitqueue in following patches.

Use struct rw_semaphore because map offload will require sleeping
with the semaphore held for read.

Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-31 16:12:22 +01:00
Roman Gushchin
fb982666e3 tools/bpftool: fix bpftool build with bintutils >= 2.9
Bpftool build is broken with binutils version 2.29 and later.
The cause is commit 003ca0fd2286 ("Refactor disassembler selection")
in the binutils repo, which changed the disassembler() function
signature.

Fix this by adding a new "feature" to the tools/build/features
infrastructure and make it responsible for decision which
disassembler() function signature to use.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-30 01:07:36 +01:00
Roman Gushchin
4bfe3bd3cc tools/bpftool: use version from the kernel source tree
Bpftool determines it's own version based on the kernel
version, which is picked from the linux/version.h header.

It's strange to use the version of the installed kernel
headers, and makes much more sense to use the version
of the actual source tree, where bpftool sources are.

Fix this by building kernelversion target and use
the resulting string as bpftool version.

Example:
before:

$ bpftool version
bpftool v4.14.6

after:
$ bpftool version
bpftool v4.15.0-rc3

$bpftool version --json
{"version":"4.15.0-rc3"}

Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-30 01:07:36 +01:00
David S. Miller
6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Linus Torvalds
2758b3e3e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) IPv6 gre tunnels end up with different default features enabled
    depending upon whether netlink or ioctls are used to bring them up.
    Fix from Alexey Kodanev.

 2) Fix read past end of user control message in RDS< from Avinash
    Repaka.

 3) Missing RCU barrier in mini qdisc code, from Cong Wang.

 4) Missing policy put when reusing per-cpu route entries, from Florian
    Westphal.

 5) Handle nested PCI errors properly in bnx2x driver, from Guilherme G.
    Piccoli.

 6) Run nested transport mode IPSEC packets via tasklet, from Herbert
    Xu.

 7) Fix handling poll() for stream sockets in tipc, from Parthasarathy
    Bhuvaragan.

 8) Fix two stack-out-of-bounds issues in IPSEC, from Steffen Klassert.

 9) Another zerocopy ubuf handling fix, from Willem de Bruijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
  strparser: Call sock_owned_by_user_nocheck
  sock: Add sock_owned_by_user_nocheck
  skbuff: in skb_copy_ubufs unclone before releasing zerocopy
  tipc: fix hanging poll() for stream sockets
  sctp: Replace use of sockets_allocated with specified macro.
  bnx2x: Improve reliability in case of nested PCI errors
  tg3: Enable PHY reset in MTU change path for 5720
  tg3: Add workaround to restrict 5762 MRRS to 2048
  tg3: Update copyright
  net: fec: unmap the xmit buffer that are not transferred by DMA
  tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
  tipc: error path leak fixes in tipc_enable_bearer()
  RDS: Check cmsg_len before dereferencing CMSG_DATA
  tcp: Avoid preprocessor directives in tracepoint macro args
  tipc: fix memory leak of group member when peer node is lost
  net: sched: fix possible null pointer deref in tcf_block_put
  tipc: base group replicast ack counter on number of actual receivers
  net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap()
  net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
  ip6_gre: fix device features for ioctl setup
  ...
2017-12-28 23:20:21 -08:00
Linus Torvalds
fd84b751dd nouveau and i915 regression fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaRa4qAAoJEAx081l5xIa+mEYQAIH3TxsQAKU6htpoLySb0FbP
 1svdRSLbJa2dLk2qChMblc+a+GL8HNUmRCp6vNvpmU6lFNfuHpZ4nBx6CR3UuYjx
 7DKiHuF/TeuzCza+StVvFxD6NpxnL4/i5lGopxCspDLujrirj6p4hlTMGF1ZQhLt
 1VYL02IbD2oPacZ/vnT1cgv6EVoNdfJkdGUIGU4O0w+pTTLhnty9RlBNonVDJ7CS
 42KEFld9jEwS0HAd5Sxq28Njt0MSj/ZXPuR92yAm4jGLfRF/+v2pbTa5BUCkcn+R
 58/e1ZgJtofYmz+RujWLDHjxBmVVx856fqN+3Fdi84+rGeO/q8h1hUM+C2W/0+Zs
 626nfNVzRCm9AFIUL8GGzoGchHUFFYWRzXb6ymptK4SZLyjr4VBt5QZxeHA6toyv
 rrvFqTV/pmSF3yEOjdVl1pM69naPGjEpMDK2MkJQZF3g5r9lIXENcvr3QFxsJrlf
 EeQfv9qDK8LOuqk4iq48gmDT90AmZEVErf7j77dKXgpAGN1mnkvYlvwVFORgt5QZ
 rITCyfKpBtW41+cu5dEu7bFvg45JVcLPTctaW5i/IeYC/Rn7Q+1ghr5wIAYsJgiF
 rvlSajrpEGeX1QP6ETyW+XkfL6X225GDnUXmCxbis/C1a7d4J2R2Qi0JkW7nMc3S
 R/98A850ue9zmXB59ZUT
 =uS+n
 -----END PGP SIGNATURE-----

Merge tag 'drm-fixes-for-v4.15-rc6' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "nouveau and i915 regression fixes"

* tag 'drm-fixes-for-v4.15-rc6' of git://people.freedesktop.org/~airlied/linux:
  drm/nouveau: fix race when adding delayed work items
  i915: Reject CCS modifiers for pipe C on Geminilake
  drm/i915/gvt: Fix pipe A enable as default for vgpu
2017-12-28 23:16:24 -08:00
Linus Torvalds
c0208a33cb One more fix for the runtime PM clk patches. We're calling a runtime
PM API that may schedule from somewhere that we can't do that. We
 change to the async version of pm_runtime_put() to fix it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJaRXo1AAoJEK0CiJfG5JUlTooP/jOuAt/qy2USUbzreVngKMcS
 GKU0/yJVigLBGrslZnofgQlN9QwM/CWD1fUHokUsp7aBOlgLsG4C3gr+Sidrnwq+
 XxjwZ0LWk4z+2s8p5BJ1PEEeO3HQDtxqnQVlw2P9ZsBzUCLWF5dF3mY0CCHlJ5tD
 25lQFx4lb+j29gh9gn4abGtJ5D/ZPAreRKqDVl3QYN7+AJCPecM67B1Hb2AivQrm
 v1pBBF0rZZokDjp96A12LSsy7Oq7IcbuXKcEcX0VfgLoWB9bHaAHpI+7XzKsVlHB
 TdrfFfX7Eh8rDSbHmJXeJU6YP8TsdCCfUItkyznVzYt3FUo3PPQBGYUVHSUKR/Qc
 NY7gK/Uq/o4/NrD35zd+Q9I6FBnfuSN/ToJh3D8IJUyx11Dz23AFw0iz2INuwMsN
 WZRz4jld8p+KI51QPGwwOpqYmPEb9GWHVWdfNXcHhkYNOaE7yutLeDCagQZBRKn0
 1ej7gPJVSqfs1RsFetWtxLUtbZc34rIaDdhqDglMhOUwucNq22pH7f+5V0hpnKTz
 iECLiNNubW4Q9E6aQYk+2v0QtrVWH+rn2FodR9Njv++ZDSqoQDaquQco84p0xi/H
 VI5pMo5ynlEZxwQdcvXtj3Ee8vtirURbl0lnyySZptKTEObe02QK2fOsRHwrO7Tn
 W2kllbEkg4+7RSMljxmF
 =cUDR
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fix from Stephen Boyd:
 "One more fix for the runtime PM clk patches. We're calling a runtime
  PM API that may schedule from somewhere that we can't do that. We
  change to the async version of pm_runtime_put() to fix it"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: use atomic runtime pm api in clk_core_is_enabled
2017-12-28 23:14:47 -08:00
Linus Torvalds
4f2382f380 LED fix for 4.15-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJaRVfiAAoJEL1qUBy3i3wmhwwP/iOaRjSGlHLLqXstts2cJMfL
 yu79TzVfndq4eahIP34RyV6Yh/8w59LAD06Mtv5ALP3A/3ENArTmd+FDjMgUI4Lp
 WES2A+f7A2tWOPQGwu3FjHJ7VXSVWxqlcoWdSRiVXWssk1r20eVquKR9pgt/URwW
 FS2i3cLp+lQa8n2LeG1SLNHccPbl3V+ecgIi23RflE1s8ZxLsFFewImhtIOWABcf
 GDg66Vqpg5RHT9dWDsC67IsbyXUWtrkusWq3EzRolS90YnXRPK6nLY4OcbV+7In8
 2WF9lzGZlcnfhgJ2/u/iQPUsEeHPG8v/I6RhKjjOuUd+wnOySAOZQYfTiH7tA1fR
 SbjAohpcpPJycUOwhEkZ0JUXxzXKRyxf+NK59AzZNqezdMrJ7NgWibsYi7UpT6Ge
 mdlph+vVEaovh9KMFyZHWWES85mYfiNn7M3MYZaakUTHJ9z4kKXkRhLxSHCH1Xv5
 +EzEXmlUXWe2Nf/Fgj2B8SKQKTVb4U7cCz1SUA4c7nPZOdGGRakmCDZVlsZRK4j5
 /sn71OL/ES/3dXZK1bHaWmB+VIJLxfGfy/1bmqBX417JFoY3EjeNBPGSHicL3Rdz
 W8ooPeMCE8xd4uB7y+6rsE50h/TG/JKyk9ELtabol9Gnzu3Gt5M+wcSvGrKoXxBi
 0eBvOHx+JM09h+7JnJpe
 =1acw
 -----END PGP SIGNATURE-----

Merge tag 'led_fixes_for_4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds

Pull LED fix from Jacek Anaszewski:
 "A single LED fix for brightness setting when delay_off is 0"

* tag 'led_fixes_for_4.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
  led: core: Fix brightness setting when setting delay_off=0
2017-12-28 23:09:45 -08:00
Linus Torvalds
19286e4a7a Third pull request for 4.15-rc
- cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
   The problem was a driver bug in the handling of shutting down a QP.
 - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
   unwind and a use after free bug
 - Improper congestion counter values on mlx5 when link aggregation is enabled
 - ipoib lockdep regression introduced in this merge window
 - hfi1 regression supporting the device in a VM introduced in a recent patch
 - Typo that breaks future uAPI compatibility in the verbs core
 - More SELinux related oops fixing
 - Fix an oops during error unwind in mlx5
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaRIC/AAoJEDht9xV+IJsaJfQP/1Z97/kDlJGIJQ4vBJ52xdHV
 LfRdmCBqU5nrAihEBpFLRc2S+kaSJbYAY48tRn28Jx6s9dmSvU6v2J2IqhmnM6p6
 ruWLR0Yqjg+xHcw+eaEoscJjRw+jDUEeVOgfbYc0HViWwvMNTrBB32HpAV48HuAl
 aCbM/qrQYXdYuJBImM4glERIpjlvYKoxv4D9xCJhJRRQvTnKOymHzZpKbqNujWxl
 dzCmZeOrw+HVxNW9MHHtUxClBoLNnykfRVKzMcdDjsqJ+Fdo2bY3ksgMvgiatRwY
 NxGfixhouhOz9vjN/ljpWXxTV5TTm6Nrib8XcHuOWjcYn/AFwJMMRsM+1w1AuCKs
 Zviq7QVApZzYuvHw1ewupRGvDX+P13sufD5sbc6cfVUT3w6ZX0Clpspl4++JN4ER
 WvBZikozaviL3w9ir0drlZ6k9BDnjQ6P7wZcBjDZC/j0zXKM65rISZrTsK7TeiTk
 lBNdLCkwZhO0dvafCNwA910tTaXEPhqqAh8Okob2A5U5lUAewd0AEHJusL/iCmSl
 uXnnxu8ik61QzOqwneEHSyVMkOSLEC+kk13fiFAq/LjPUSm9N/MihZd4JNxwSa6W
 4Rah7IKdh9F6qEnaKLPEfHxPhfghhb7O51zCA8mwA/JNCneqc4Gqi0U2JXkuloml
 395aK2aZSShIkZvIwbI8
 =IkGi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is the next batch of for-rc patches from RDMA. It includes the
  fix for the ipoib regression I mentioned last time, and the result of
  a fairly major debugging effort to get iser working reliably on cxgb4
  hardware - it turns out the cxgb4 driver was not handling QP error
  flushing properly causing iser to fail.

   - cxgb4 fix for an iser testing failure as debugged by Steve and
     Sagi. The problem was a driver bug in the handling of shutting down
     a QP.

   - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
     error unwind and a use after free bug

   - Improper congestion counter values on mlx5 when link aggregation is
     enabled

   - ipoib lockdep regression introduced in this merge window

   - hfi1 regression supporting the device in a VM introduced in a
     recent patch

   - Typo that breaks future uAPI compatibility in the verbs core

   - More SELinux related oops fixing

   - Fix an oops during error unwind in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
  IB/mlx5: Fix congestion counters in LAG mode
  RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
  RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
  RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
  iw_cxgb4: when flushing, complete all wrs in a chain
  iw_cxgb4: reflect the original WR opcode in drain cqes
  iw_cxgb4: Only validate the MSN for successful completions
2017-12-28 23:06:01 -08:00
David S. Miller
d367341b25 mlx5-shared-4.16-1
mlx5 shared code for both rdma-next and net-next trees.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaRXPQAAoJEEg/ir3gV/o+H4wH/2CkV3tOLfRekNd4CFSoH78A
 zH0Gjwa3P7aXybTmhXbMNCYLEoVEZ5pSlToOmjz1FrmxhH62JQ80WyKOcYtiHMBg
 3x5tFZboLc9tMGwPhyBJBjyiH+Gh9ZMoD6hBFgSvIG/hNPUb1W48/Pc+R61gOrMw
 6ADU+6mIf5cHNQ4c/V/SBlfiQjSXN4Y38knhTeZy8dLcZZVg1eMn+pj7W/haAyb6
 t3IMEaUmlDYwQmtxTT2snK4VutEPfxYGv1gyKSkZXmY74aRvSzlgV7PqXM3qsV4W
 8ZEhEHZJGi6NXC2hk5FQSSPWhQOhAmpjTHm8aImK0SIf68YajjzaZnT9S+eMmdY=
 =uMjj
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-shared-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 E-Switch updates 2017-12-19

This series includes updates for mlx5 E-Switch infrastructures,
to be merged into net-next and rdma-next trees.

Mark's patches provide E-Switch refactoring that generalize the mlx5
E-Switch vf representors interfaces and data structures. The serious is
mainly focused on moving ethernet (netdev) specific representors logic out
of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
provides better separation and allows future support for other types of vf
representors (e.g. RDMA).

Gal's patches at the end of this serious, provide a simple syntax fix and
two other patches that handles vport ingress/egress ACL steering name
spaces to be aligned with the Firmware/Hardware specs.

V1->V2:
 - Addressed coding style comments in patches #1 and #7
 - The series is still based on rc4, as now I see net-next is also @rc4.

V2->V3:
 - Fixed compilation warning, reported by Dave.

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 19:32:59 -05:00
Gal Pressman
9b93ab981e net/mlx5: Separate ingress/egress namespaces for each vport
Each vport has its own root flow table for the ACL flow tables and root
flow table is per namespace, therefore we should create a namespace for
each vport.

Fixes: efdc810ba3 ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
4484e29948 net/mlx5: Fix ingress/egress naming mistake
The functions names do not represent their actions, switch the mistaken
ingress/egress naming.

Fixes: fba53f7b57 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
18a89ab766 net/mlx5e: E-Switch, Use the name of static array instead of its address
Using the address of a static array is the same as using its name (in
this specific use-case), but it's confusing and makes the code less
readable.

Fixes: 1bd27b11c1 ("net/mlx5: Introduce E-switch QoS management")
Fixes: bd77bf1cb5 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
2c47bf80e8 net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep
Move struct mlx5_esw_sq which keeps send-to-vport rule to from the eswitch
code to mlx5e and rename it to better reflect where it belongs

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
a4b97ab421 net/mlx5: E-Switch, Create generic header struct to be used by representors
Now that we don't store type dependent data in struct mlx5_eswitch_rep
we can create a generic interface, and representor type.

struct mlx5_eswitch_rep will store an array of interfaces, each
interface is used by a different representor type.

Once we moved to a more generic interface, rdma driver representors can
be added and utilize the same mechanism as the Ethernet driver
representors use.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:50 +02:00
David S. Miller
d5902f6d1f Merge branch 'strparser-Fix-lockdep-issue'
Tom Herbert says:

====================
strparser: Fix lockdep issue

When sock_owned_by_user returns true in strparser. Fix is to add and
call sock_owned_by_user_nocheck since the check for owned by user is
not an error condition in this case.
====================

Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-and-tested-by: <syzbot+c91c53af67f9ebe599a337d2e70950366153b295@syzkaller.appspotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:28:23 -05:00
Tom Herbert
d66fa9ec53 strparser: Call sock_owned_by_user_nocheck
strparser wants to check socket ownership without producing any
warnings. As indicated by the comment in the code, it is permissible
for owned_by_user to return true.

Fixes: 43a0c6751a ("strparser: Stream parser for messages")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-and-tested-by: <syzbot+c91c53af67f9ebe599a337d2e70950366153b295@syzkaller.appspotmail.com>
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:28:22 -05:00
Tom Herbert
602f7a2714 sock: Add sock_owned_by_user_nocheck
This allows checking socket lock ownership with producing lockdep
warnings.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:28:22 -05:00
Willem de Bruijn
f72c4ac695 skbuff: in skb_copy_ubufs unclone before releasing zerocopy
skb_copy_ubufs must unclone before it is safe to modify its
skb_shared_info with skb_zcopy_clear.

Commit b90ddd5687 ("skbuff: skb_copy_ubufs must release uarg even
without user frags") ensures that all skbs release their zerocopy
state, even those without frags.

But I forgot an edge case where such an skb arrives that is cloned.

The stack does not build such packets. Vhost/tun skbs have their
frags orphaned before cloning. TCP skbs only attach zerocopy state
when a frag is added.

But if TCP packets can be trimmed or linearized, this might occur.
Tracing the code I found no instance so far (e.g., skb_linearize
ends up calling skb_zcopy_clear if !skb->data_len).

Still, it is non-obvious that no path exists. And it is fragile to
rely on this.

Fixes: b90ddd5687 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 14:26:22 -05:00
David S. Miller
8d1666fdfc Merge branch 'mlx4-misc-for-4.16'
Tariq Toukan says:

====================
mlx4 misc for 4.16

This patchset contains misc cleanups and improvements
to the mlx4 Core and Eth drivers.

In patches 1 and 2 I reduce and reorder the branches in the RX csum flow.
In patch 3 I align the FMR unmapping flow with the device spec, to allow
  a remapping afterwards.
Patch 4 by Moni changes the default QoS settings so that a pause
  frame stops all traffic regardless of its prio.

Series generated against net-next commit:
836df24a70 net: hns3: hns3_get_channels() can be static
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:14 -05:00
Moni Shoua
a42b63c1ac net/mlx4_en: Change default QoS settings
Change the default mapping between TC and TCG as follows:

Prio     |             TC/TCG
         |      from             to
         |    (set by FW)      (set by SW)
---------+-----------------------------------
0        |      0/0              0/7
1        |      1/0              0/6
2        |      2/0              0/5
3        |      3/0              0/4
4        |      4/0              0/3
5        |      5/0              0/2
6        |      6/0              0/1
7        |      7/0              0/0

These new settings cause that a pause frame for any prio stops
traffic for all prios.

Fixes: 564c274c3d ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
fd4a3e2828 net/mlx4_core: Cleanup FMR unmapping flow
Remove redundant and not essential operations in fmr unmap/free.
According to device spec, in FMR unmap it is sufficient to set
ownership bit to SW. This allows remapping afterwards.

Fixes: 8ad11fb6b0 ("IB/mlx4: Implement FMRs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
dc484851ed net/mlx4_en: RX csum, reorder branches
Use early goto commands, and save else branches.
This uses less indentations and brackets, making the code
more readable.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
345ef18c24 net/mlx4_en: RX csum, remove redundant branches and checks
Do not check IPv6 bit in cqe status if CONFIG_IPV6 is not enabled.
Function check_csum() is reached only with IPv4 or IPv6 set (if enabled),
if IPv6 is not set (or is not enabled) it is redundant to test the
IPv4 bit.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Jiri Pirko
8ec6957403 net: sched: don't set extack message in case the qdisc will be created
If the qdisc is not found here, it is going to be created. Therefore,
this is not an error path. Remove the extack message set and don't
confuse user with error message in case the qdisc was created
successfully.

Fixes: 0921559811 ("net: sched: sch_api: handle generic qdisc errors")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:18:31 -05:00
Parthasarathy Bhuvaragan
517d7c79bd tipc: fix hanging poll() for stream sockets
In commit 42b531de17 ("tipc: Fix missing connection request
handling"), we replaced unconditional wakeup() with condtional
wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND.

This breaks the applications which do a connect followed by poll
with POLLOUT flag. These applications are not woken when the
connection is ESTABLISHED and hence sleep forever.

In this commit, we fix it by including the POLLOUT event for
sockets in TIPC_CONNECTING state.

Fixes: 42b531de17 ("tipc: Fix missing connection request handling")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:15:26 -05:00
David S. Miller
d3dbd7a6aa Merge branch 'AVE-ethernet'
Kunihiko Hayashi says:

====================
net: add UniPhier AVE ethernet support

This series adds support for Socionext AVE ethernet controller implemented
on UniPhier SoCs. This driver supports RGMII/RMII modes.

v8: https://www.spinics.net/lists/netdev/msg474374.html

The PHY patch included in v1 has already separated in:
http://www.spinics.net/lists/netdev/msg454595.html

Changes since v8:
- move operators at the beginning of the line to the end of the previous line
- dt-bindings: add blank lines before mdio and phy subnodes

Changes since v7:
- dt-bindings: fix mdio subnode description

Changes since v6:
- sort the order of local variables from longest to shortest line
- fix ave_probe() which calls register_netdev() at the end of initialization
- dt-bindings: remove phy node descriptions in mdio node

Changes since v5:
- replace license boilerplate with SPDX Identifier
- remove inline directives and an unused function

Changes since v4:
- fix larger integer warning on AVE_PFMBYTE_MASK0

Changes since v3:
- remove checking dma address and use dma_set_mask() to restirct address
- replace ave_mdio_busywait() with read_poll_timeout()
- replace functions to access to registers with readl/writel() directly
- replace a function to access to macaddr with ave_hw_write_macaddr()
- change return value of ave_dma_map() to error value
- move mdiobus_unregister() from ave_remove() to ave_uninit()
- eliminate else block at the end of ave_dma_map()
- add mask definitions for packet filter
- sort bitmap definitions in descending order
- add error check to some functions
- rename and sort functions to clear sub-categories
- fix error value consistency
- remove unneeded initializers
- change type of constant arrays

Changes since v2:
- replace clk_get() with devm_clk_get()
- replace reset_control_get() with devm_reset_control_get_optional_shared()
- add error return when the error occurs on the above *_get functions
- sort soc data and compatible strings
- remove clearly obvious comments
- modify dt-bindings document consistent with these modifications

Changes since v1:
- add/remove devicetree properties and sub-node
  - remove "internal-phy-interrupt" and "desc-bits" property
  - add SoC data structures based on compatible strings
  - add node operation to apply "mdio" sub-node
- add support for features
  - add support for {get,set}_pauseparam and pause frame operations
  - add support for ndo_get_stats64 instead of ndo_get_stats
- replace with desiable functions
  - replace check for valid phy_mode with phy_interface{_mode}_is_rgmii()
  - replace phy attach message with phy_attached_info()
  - replace 32bit operation with {upper,lower}_32_bits() on ave_wdesc_addr()
  - replace nway_reset and get_link with generic functions
- move operations to proper functions
  - move phy_start_aneg() to ndo_open,
    and remove unnecessary PHY interrupt operations
    See http://www.spinics.net/lists/netdev/msg454590.html
  - move irq initialization and descriptor memory allocation to ndo_open
  - move initialization of reset and clock and mdiobus to ndo_init
- fix skbuffer operations
  - fix skb alignment operations and add Rx buffer adjustment for descriptor
    See http://www.spinics.net/lists/netdev/msg456014.html
  - add error returns when dma_map_single() failed
- clean up code structures
  - clean up wait-loop and wake-queue conditions
  - add ave_wdesc_addr() and offset definitions
  - add ave_macaddr_init() to clean up mac-address operation
  - fix checking whether Tx entry is not enough
  - fix supported features of phydev
  - add necessary free/disable operations
  - add phydev check on ave_{get,set}_wol()
  - remove netif_carrier functions, phydev initializer, and Tx budget check
- change obsolate codes
  - replace ndev->{base_addr,irq} with the members of ave_private
- rename goto labels and mask definitions, and remove unused codes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:10:40 -05:00
Kunihiko Hayashi
4c270b55a5 net: ethernet: socionext: add AVE ethernet driver
The UniPhier platform from Socionext provides the AVE ethernet
controller that includes MAC and MDIO bus supporting RGMII/RMII
modes. The controller is named AVE.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:10:40 -05:00
Kunihiko Hayashi
c5a9ef30af dt-bindings: net: add DT bindings for Socionext UniPhier AVE
DT bindings for the AVE ethernet controller found on Socionext's
UniPhier platforms.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
Acked-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:10:39 -05:00
Ganesh Goudar
b39ab14097 cxgb4/cxgb4vf: support for XLAUI Port Type
Add support for new Backplane XLAUI port type.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:09:08 -05:00
Ganesh Goudar
b9525301b1 cxgb4: display VNI correctly
Fix incorrect VNI display in mps_tcam

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 11:59:57 -05:00
Mark Bloch
5ed99fb421 net/mlx5e: Move ethernet representors data into separate struct
Ethernet representors have a need to store data which is applicable
only for them. Create a priv void pointer in struct mlx5_eswitch_rep
and move mlx5e to store the relevant data there. As part of this change
we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the
E-Switch code will copy a priv value which is garbage.

We also rename mlx5_eswitch_get_uplink_netdev() to
mlx5_eswitch_get_uplink_priv() and make it return void *.
This way E-Switch code doesn't need to deal with net devices and
we leave the task of getting it to mlx5e.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
159fe63922 net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function
In order for representors to send packets directly to VFs we use an
E-Switch function which insert special rules into the HW. For symmetry
create an E-Switch function that deletes these rules as well.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
f7a68945a5 net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch
In our pursuit to cleanup e-switch sub-module from mlx5e specific code,
we move the functions that insert/remove the flow steering rules that
allow mlx5e representors to send packets directly to VFs into the EN
driver code.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
4c66df01f5 net/mlx5: E-Switch, Simplify representor load/unload callback API
In the load() callback for loading representors we don't really need
struct mlx5_eswitch but struct mlx5_core_dev, pass it directly.

In the unload() callback for unloading representors we don't need the
struct mlx5_eswitch argument, remove it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
6ed1803abe net/mlx5: E-Switch, Refactor load/unload of representors
Refactor the load/unload stages for better code reuse.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
e8d31c4d65 net/mlx5: E-Switch, Refactor vport representors initialization
Refactor the init stage of vport representors registration.
vport number and hw id can be assigned by the E-Switch driver and not by
the netdevice driver. While here, make the error path of mlx5_eswitch_init()
a reverse order of the good path, also use kcalloc to allocate an array
instead of kzalloc.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
kbuild test robot
836df24a70 net: hns3: hns3_get_channels() can be static
Fixes: 482d2e9c1c ("net: hns3: add support to query tqps number")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 20:41:59 -05:00
David S. Miller
fcffe2edbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-28

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Fix incorrect state pruning related to recognition of zero initialized
   stack slots, where stacksafe exploration would mistakenly return a
   positive pruning verdict too early ignoring other slots, from Gianluca.

2) Various BPF to BPF calls related follow-up fixes. Fix an off-by-one
   in maximum call depth check, and rework maximum stack depth tracking
   logic to fix a bypass of the total stack size check reported by Jann.
   Also fix a bug in arm64 JIT where prog->jited_len was uninitialized.
   Addition of various test cases to BPF selftests, from Alexei.

3) Addition of a BPF selftest to test_verifier that is related to BPF to
   BPF calls which demonstrates a late caller stack size increase and
   thus out of bounds access. Fixed above in 2). Test case from Jann.

4) Addition of correlating BPF helper calls, BPF to BPF calls as well
   as BPF maps to bpftool xlated dump in order to allow for better
   BPF program introspection and debugging, from Daniel.

5) Fixing several bugs in BPF to BPF calls kallsyms handling in order
   to get it actually to work for subprogs, from Daniel.

6) Extending sparc64 JIT support for BPF to BPF calls and fix a couple
   of build errors for libbpf on sparc64, from David.

7) Allow narrower context access for BPF dev cgroup typed programs in
   order to adapt to LLVM code generation. Also adjust memlock rlimit
   in the test_dev_cgroup BPF selftest, from Yonghong.

8) Add netdevsim Kconfig entry to BPF selftests since test_offload.py
   relies on netdevsim device being available, from Jakub.

9) Reduce scope of xdp_do_generic_redirect_map() to being static,
   from Xiongwei.

10) Minor cleanups and spelling fixes in BPF verifier, from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 20:40:32 -05:00
Jakub Kicinski
4f83435ad7 nfp: bpf: allocate vNIC priv for keeping track of the offloaded program
After TC offloads were converted to callbacks we have no choice
but keep track of the offloaded filter in the driver.

Since this change came a little late in the release cycle
there were a number of conflicts and allocation of vNIC priv
structure seems to have slipped away in linux-next.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-27 20:37:38 -05:00