Commit Graph

704772 Commits

Author SHA1 Message Date
David S. Miller
ef70f9a22d Merge branch 'bpf-sockmap'
John Fastabend says:

====================
BPF: sockmap and sk redirect support

This series implements a sockmap and socket redirect helper for BPF
using a model similar to XDP netdev redirect. A sockmap is a BPF map
type that holds references to sock structs. Then with a new sk
redirect bpf helper BPF programs can use the map to redirect skbs
between sockets,

      bpf_sk_redirect_map(map, key, flags)

Finally, we need a call site to attach our BPF logic to do socket
redirects. We added hooks to recv_sock using the existing strparser
infrastructure to do this. The call site is added via the BPF attach
map call. To enable users to use this infrastructure a new BPF program
BPF_PROG_TYPE_SK_SKB is created that allows users to reference sock
details, such as port and IP address fields, to build useful socket
layer programs. The sockmap datapath is as follows,

     recv -> strparser -> verdict/action

where this series implements the drop and redirect actions.
Additional actions can be added as needed.

A sample program is provided to illustrate how a sockmap can
be integrated with cgroups and used to add/delete sockets in
a sockmap. The program is simple but should show many of the
key ideas.

To test this work, test_maps in selftests/bpf was leveraged.
We added a set of tests to add sockets and do send/recv ops
on the sockets to ensure correct behavior. Additionally, the
selftests run a series of negative test cases. We can expand
on this in the future.

I also have a basic test program I use with iperf/netperf
clients that could be sent as an additional sample if folks
want this. It needs a bit of cleanup to send to the list and
wasn't included in this series.

For people who prefer git over pulling patches out of their mail
editor I've posted the code here,

https://github.com/jrfastab/linux-kernel-xdp/tree/sockmap

For some background information on the genesis of this work
it might be helpful to review these slides from netconf 2017
by Thomas Graf,

http://vger.kernel.org/netconf2017.html
https://docs.google.com/a/covalent.io/presentation/d/1dwSKSBGpUHD3WO5xxzZWj8awV_-xL-oYhvqQMOBhhtk/edit?usp=sharing

Thanks to Daniel Borkmann for reviewing and providing initial
feedback.
====================

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
6f6d33f3b3 bpf: selftests add sockmap tests
This generates a set of sockets, attaches BPF programs, and sends some
simple traffic using a basic send/recv pattern. Additionally, we run a bunch
of negative tests to ensure that invalid attempts to add/remove socks
to/from the sockmap fail

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
41bc94f535 bpf: selftests: add tests for new __sk_buff members
This adds tests to access new __sk_buff members from sk skb program
type.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
69e8cc134b bpf: sockmap sample program
This program binds a program to a cgroup and then matches hard
coded IP addresses and adds these to a sockmap.

This will receive messages from the backend and send them to
the client.

     client:X <---> frontend:10000 client:X <---> backend:10001

To keep things simple this is only designed for 1:1 connections
using hard coded values. A more complete example would allow many
backends and clients.

To run,

 # sockmap <cgroup2_dir>

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
8a31db5615 bpf: add access to sock fields and pkt data from sk_skb programs
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
174a79ff95 bpf: sockmap with sk redirect support
Recently we added a new map type called dev map used to forward XDP
packets between ports (6093ec2dc3). This patch introduces a
similar notion for sockets.

A sockmap allows users to add participating sockets to a map. When
sockets are added to the map enough context is stored with the
map entry to use the entry with a new helper

  bpf_sk_redirect_map(map, key, flags)

This helper (analogous to bpf_redirect_map in XDP) is given the map
and an entry in the map. When called from a sockmap program, discussed
below, the skb will be sent on the socket using skb_send_sock().

With the above, we need a BPF program from which to call the helper and
which will then implement the send logic. The initial site implemented in
this series is the recv_sock hook. For this to work we implemented a map
attach command to add attributes to a map. In sockmap we add two
programs: a parse program and a verdict program. The parse program
uses strparser to build messages and pass them to the verdict program.
The parse program uses the normal strparser semantics. The verdict
program is of type SK_SKB.
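The parse program follows strparser's usual contract: return the length of the next message once it can be determined, 0 if more bytes are needed first, or a negative errno. A minimal userspace sketch, assuming a hypothetical 4-byte big-endian length-prefix framing (the series itself does not define any wire format) and a hypothetical function name parse_msg():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical framing: 4-byte big-endian payload length, then payload.
 * Returns the total message length (header + payload) as soon as the
 * header is complete, 0 if the header has not fully arrived yet, or
 * -EMSGSIZE if the message would exceed max_len (assumed positive).
 * strparser then waits until that many bytes are buffered. */
static int parse_msg(const uint8_t *buf, size_t avail, int max_len)
{
	uint64_t total;

	if (avail < 4)
		return 0;	/* need more data to read the length */

	total = 4 + (((uint64_t)buf[0] << 24) | ((uint64_t)buf[1] << 16) |
		     ((uint64_t)buf[2] << 8) | (uint64_t)buf[3]);

	if (total > (uint64_t)max_len)
		return -EMSGSIZE;
	return (int)total;
}
```

Only the header has to be inspected; reassembling the rest of the message is left to strparser.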

The verdict program currently returns one of two verdicts, SK_DROP or
SK_REDIRECT. Additional actions may be added later. When SK_REDIRECT is
returned, as expected when the BPF program uses bpf_sk_redirect_map(), the
sockmap logic will consult per-CPU variables set by the helper routine
and pull the sock entry out of the sockmap. This pattern follows the
existing redirect logic in cls and XDP programs.

This gives the flow,

 recv_sock -> str_parser (parse_prog) -> verdict_prog -> skb_send_sock
                                                     \
                                                      -> kfree_skb

As an example use case a message based load balancer may use specific
logic in the verdict program to select the sock to send on.
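To make the split between the helper and the sockmap logic concrete, here is a single-threaded userspace model. All names (toy_sockmap, toy_sk_redirect_map, run_verdict), the first-byte selection rule, and the use of one global instead of per-CPU storage are assumptions for illustration; in the kernel the verdict program is BPF code and the send happens via skb_send_sock().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_verdict { TOY_SK_DROP, TOY_SK_REDIRECT };	/* mirrors SK_DROP/SK_REDIRECT */

#define TOY_MAP_SIZE 4

/* Toy sockmap: each slot holds a socket id, or -1 if empty. */
struct toy_sockmap { int socks[TOY_MAP_SIZE]; };

/* Stand-in for the per-CPU state recorded by the redirect helper. */
static struct { struct toy_sockmap *map; uint32_t key; } redirect_info;

/* Modeled on bpf_sk_redirect_map(): remember the target, ask for redirect. */
static int toy_sk_redirect_map(struct toy_sockmap *map, uint32_t key)
{
	redirect_info.map = map;
	redirect_info.key = key;
	return TOY_SK_REDIRECT;
}

/* Hypothetical load-balancer verdict: pick a backend from the first byte. */
static int verdict_prog(struct toy_sockmap *map, const char *msg, size_t len)
{
	if (len == 0)
		return TOY_SK_DROP;
	return toy_sk_redirect_map(map, (uint32_t)(unsigned char)msg[0] % TOY_MAP_SIZE);
}

/* Sockmap side: run the verdict, then consult the recorded target.
 * Returns the socket id to send on, or -1 for the kfree_skb() path. */
static int run_verdict(struct toy_sockmap *map, const char *msg, size_t len)
{
	redirect_info.map = NULL;
	if (verdict_prog(map, msg, len) != TOY_SK_REDIRECT || !redirect_info.map)
		return -1;
	if (redirect_info.key >= TOY_MAP_SIZE)
		return -1;
	return redirect_info.map->socks[redirect_info.key];	/* -1 if empty */
}
```

Keeping the target in side state lets the verdict program return a single small integer while the map lookup and send stay in kernel code.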

Sample programs are provided in future patches that hopefully illustrate
the user interfaces. Also selftests are in follow-on patches.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
a6f6df69c4 bpf: export bpf_prog_inc_not_zero
bpf_prog_inc_not_zero will be used by upcoming sockmap patches. This
patch simply exports it so we can pull it in.
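The "inc not zero" pattern takes a reference only if the refcount has not already dropped to zero, so a lookup cannot resurrect an object that is being freed. A sketch of the general idea with C11 atomics; toy_prog and its refcnt field are hypothetical, and the kernel function operates on struct bpf_prog and reports failure in its own way:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_prog { atomic_int refcnt; };

/* Take a reference only while the object is still live (refcnt != 0). */
static bool toy_prog_inc_not_zero(struct toy_prog *p)
{
	int old = atomic_load(&p->refcnt);

	do {
		if (old == 0)
			return false;	/* final put already happened */
		/* on failure, old is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&p->refcnt, &old, old + 1));
	return true;
}
```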

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
b005fd189c bpf: introduce new program type for skbs on sockets
A class of programs, run from strparser and soon from a new map type
called sock map, are used with skb as the context but on established
sockets. By creating a specific program type for these we can use
bpf helpers that expect full sockets and get the verifier to ensure
these helpers are not used out of context.
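Roughly, the verifier asks the program type's ops which helpers exist (in the kernel via a per-type get_func_proto callback) and rejects calls for which nothing is returned. A toy model of that gating; the enum values, names, and the two-helper table are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

enum toy_prog_type { TOY_PROG_SOCKET_FILTER, TOY_PROG_SK_SKB };
enum toy_func_id { TOY_FUNC_map_lookup, TOY_FUNC_sk_redirect_map };

struct toy_func_proto { const char *name; };

static const struct toy_func_proto lookup_proto = { "map_lookup_elem" };
static const struct toy_func_proto redirect_proto = { "sk_redirect_map" };

/* Per-program-type helper table: NULL means "not available here". */
static const struct toy_func_proto *get_func_proto(enum toy_prog_type t,
						   enum toy_func_id id)
{
	switch (id) {
	case TOY_FUNC_map_lookup:
		return &lookup_proto;	/* generic helper, any type */
	case TOY_FUNC_sk_redirect_map:
		/* needs a full socket context: SK_SKB programs only */
		return t == TOY_PROG_SK_SKB ? &redirect_proto : NULL;
	}
	return NULL;
}

/* Load-time check: reject the program if any helper call is unavailable. */
static int check_calls(enum toy_prog_type t, const enum toy_func_id *calls,
		       size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!get_func_proto(t, calls[i]))
			return -1;
	return 0;
}
```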

The new type is BPF_PROG_TYPE_SK_SKB. This patch introduces the
infrastructure and type.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
John Fastabend
db5980d804 net: fixes for skb_send_sock
A couple of fixes to the new skb_send_sock infrastructure. However, no
users currently exist for this code (a user is added in the next handful
of patches), so it should not be possible to trigger a panic with
existing in-kernel code.

Fixes: 306b13eb3c ("proto_ops: Add locked held versions of sendmsg and sendpage")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:52 -07:00
John Fastabend
45f91bdcd5 net: add sendmsg_locked and sendpage_locked to af_inet6
To complete the sendmsg_locked and sendpage_locked implementation add
the hooks for af_inet6 as well.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:52 -07:00
John Fastabend
f26de110f4 net: early init support for strparser
It is useful to allow strparser to init sockets before the read_sock
callback has been established.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:52 -07:00
Noralf Trønnes
d956e1293b drm/gem-cma-helper: Remove drm_gem_cma_dumb_map_offset()
There are no more users of drm_gem_cma_dumb_map_offset(), so remove it.

Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-20-git-send-email-noralf@tronnes.org
2017-08-16 20:21:24 +02:00
Noralf Trønnes
dec844bcea drm/virtio: Use the drm_driver.dumb_destroy default
virtio_gpu_mode_dumb_destroy() is the same as drm_gem_dumb_destroy()
which is the drm_driver.dumb_destroy default, so no need to set it.
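These assignments can be dropped because of a core fallback: when a driver leaves the hook NULL, the dumb-destroy ioctl path calls the GEM default itself. A minimal model of that pattern, assuming hypothetical toy_* names in place of struct drm_driver, drm_gem_dumb_destroy() and the ioctl handler:

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;	/* opaque per-client state */

struct toy_driver {
	/* Optional hook: NULL means "use the core default". */
	int (*dumb_destroy)(struct toy_file *file, unsigned int handle);
};

static int default_calls;

/* Stand-in for the core default, drm_gem_dumb_destroy(). */
static int toy_gem_dumb_destroy(struct toy_file *file, unsigned int handle)
{
	(void)file;
	(void)handle;
	default_calls++;
	return 0;
}

/* Core ioctl path: prefer the driver hook, else fall back to the default. */
static int toy_destroy_dumb_ioctl(const struct toy_driver *drv,
				  struct toy_file *file, unsigned int handle)
{
	if (drv->dumb_destroy)
		return drv->dumb_destroy(file, handle);
	return toy_gem_dumb_destroy(file, handle);
}
```

A driver that sets the hook to the default behaves exactly like one that leaves it NULL, which is why removing the assignment changes nothing.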

Cc: David Airlie <airlied@linux.ie>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-19-git-send-email-noralf@tronnes.org
2017-08-16 20:20:00 +02:00
Noralf Trønnes
9b5b5ca5ab drm/bochs: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-17-git-send-email-noralf@tronnes.org
2017-08-16 20:18:55 +02:00
Noralf Trønnes
8d4acc1893 drm/mgag200: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-15-git-send-email-noralf@tronnes.org
2017-08-16 20:18:22 +02:00
Noralf Trønnes
4d12c2335d drm/exynos: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.
Use drm_gem_dumb_map_offset() in exynos_drm_gem_map_ioctl() and
remove exynos_drm_gem_dumb_map_offset().

Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-14-git-send-email-noralf@tronnes.org
2017-08-16 20:17:50 +02:00
Noralf Trønnes
99da7cd668 drm/msm: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Rob Clark <robdclark@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-13-git-send-email-noralf@tronnes.org
2017-08-16 20:17:12 +02:00
Noralf Trønnes
29374feb4f drm/ast: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-11-git-send-email-noralf@tronnes.org
2017-08-16 20:16:10 +02:00
Noralf Trønnes
3164e4e31a drm/qxl: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-10-git-send-email-noralf@tronnes.org
2017-08-16 20:15:38 +02:00
Noralf Trønnes
e966f5df9c drm/udl: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-9-git-send-email-noralf@tronnes.org
2017-08-16 20:14:58 +02:00
Noralf Trønnes
4411971592 drm/cirrus: Use the drm_driver.dumb_destroy default
drm_gem_dumb_destroy() is the drm_driver.dumb_destroy default,
so no need to set it.

Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-8-git-send-email-noralf@tronnes.org
2017-08-16 20:14:22 +02:00
Noralf Trønnes
bcf877181e drm/tegra: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.

Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-7-git-send-email-noralf@tronnes.org
2017-08-16 20:13:48 +02:00
Noralf Trønnes
aacc0b7d76 drm/gma500: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.

Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-6-git-send-email-noralf@tronnes.org
2017-08-16 20:13:11 +02:00
Noralf Trønnes
3c856a7202 drm/mxsfb: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.

Cc: Marek Vasut <marex@denx.de>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-4-git-send-email-noralf@tronnes.org
2017-08-16 20:12:19 +02:00
Colin Ian King
47f078339b Revert "staging: fsl-mc: be consistent when checking strcmp() return"
The previous fix removed the "== 0" comparisons on the strcmp() results,
so the function now always returns true. Revert this change to restore
the original, correctly functioning code.
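Why dropping the comparison makes the result constant: strcmp() returns 0 on a match, and one string cannot match two different literals, so in an OR of bare strcmp() calls at least one operand is non-zero. A sketch assuming a check of this general shape; the function names and the "dprc"/"dpio" literals are illustrative, not copied from the fsl-mc code:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Correct: a match is strcmp() == 0. */
static bool is_known_type(const char *t)
{
	return strcmp(t, "dprc") == 0 || strcmp(t, "dpio") == 0;
}

/* Broken: for any t, at least one strcmp() is non-zero, so always true. */
static bool is_known_type_broken(const char *t)
{
	return strcmp(t, "dprc") || strcmp(t, "dpio");
}
```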

Detected by CoverityScan, CID#1452267 ("Constant expression result")

This reverts commit b93ad9a067.

Fixes: b93ad9a067 ("staging: fsl-mc: be consistent when checking strcmp() return")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-16 11:12:06 -07:00
Noralf Trønnes
1d53275968 drm/meson: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.

Cc: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-3-git-send-email-noralf@tronnes.org
2017-08-16 20:11:43 +02:00
Noralf Trønnes
a3c92b9e4c drm/kirin: Use .dumb_map_offset and .dumb_destroy defaults
This driver can use the drm_driver.dumb_destroy and
drm_driver.dumb_map_offset defaults, so no need to set them.

Cc: Xinliang Liu <z.liuxinliang@hisilicon.com>
Cc: Rongrong Zou <zourongrong@gmail.com>
Cc: Xinwei Kong <kong.kongxinwei@hisilicon.com>
Cc: Chen Feng <puck.chen@hisilicon.com>
Signed-off-by: Noralf Trønnes <noralf@tronnes.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1502034068-51384-2-git-send-email-noralf@tronnes.org
2017-08-16 20:11:06 +02:00
Arvind Yadav
d369bcaf7d net: 3c509: constify pnp_device_id
pnp_device_id structures are not supposed to change at runtime. All
functions working with pnp_device_id provided by <linux/pnp.h> work with
const pnp_device_id. So mark the non-const structs as const.
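The pattern being applied: an ID table that is only ever read is declared const (so it can be placed in read-only data), and the code that walks it takes a pointer to const. A toy sketch; struct toy_pnp_id, the ID strings, and toy_match() are hypothetical stand-ins for struct pnp_device_id and the PnP core's matching:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_pnp_id {
	const char *id;
	unsigned long driver_data;
};

/* Never written at runtime, so it can live in read-only data. */
static const struct toy_pnp_id toy_ids[] = {
	{ "ABC0001", 0 },
	{ "ABC0002", 0 },
	{ "", 0 },		/* empty id terminates the table */
};

/* Matching only reads the table, so it accepts a pointer to const. */
static const struct toy_pnp_id *toy_match(const struct toy_pnp_id *tbl,
					  const char *id)
{
	for (; tbl->id[0]; tbl++)
		if (strcmp(tbl->id, id) == 0)
			return tbl;
	return NULL;
}
```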

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:10:12 -07:00
David Ahern
c7b725be84 net: igmp: Use ingress interface rather than vrf device
Anuradha reported that statically added groups for interfaces enslaved
to a VRF device were not persisting. The problem is that igmp queries
and reports need to use the data in the in_dev for the real ingress
device rather than the VRF device. Update igmp_rcv accordingly.

Fixes: e58e415968 ("net: Enable support for VRF with ipv4 multicast")
Reported-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:08:55 -07:00
Steven Rostedt
9c8783201c sched/completion: Document that reinit_completion() must be called after complete_all()
The complete_all() function modifies the completion's "done" variable to
UINT_MAX, and no other caller (wait_for_completion(), etc) will modify
it back to zero. That means that any call to complete_all() must have a
reinit_completion() before that completion can be used again.

Document this fact next to the complete_all() function.

Also document that completion_done() will always return true once
complete_all() has been called.
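The counter rules described above can be modeled single-threaded: complete() adds one, complete_all() pins done at UINT_MAX, a successful wait consumes one unless done is UINT_MAX, and only reinit_completion() brings it back to zero. The toy_* names and the non-blocking toy_try_wait() are assumptions; the real struct completion also has a wait queue and locking:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

struct toy_completion { unsigned int done; };

static void toy_reinit_completion(struct toy_completion *x)
{
	x->done = 0;
}

static void toy_complete(struct toy_completion *x)
{
	if (x->done != UINT_MAX)
		x->done++;
}

static void toy_complete_all(struct toy_completion *x)
{
	x->done = UINT_MAX;	/* satisfies current and future waiters */
}

/* Non-blocking stand-in for a wait: consume one "done" if available. */
static bool toy_try_wait(struct toy_completion *x)
{
	if (!x->done)
		return false;
	if (x->done != UINT_MAX)
		x->done--;	/* the complete_all() state is never consumed */
	return true;
}

static bool toy_completion_done(const struct toy_completion *x)
{
	return x->done != 0;
}
```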

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170816131202.195c2f4b@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-16 20:08:10 +02:00
Veerasenareddy Burru
251564f601 liquidio: update VF's netdev->max_mtu if there's a change in PF's MTU
A VF's MTU is capped at the parent PF's MTU.  So if there's a change in the
PF's MTU, then update the VF's netdev->max_mtu.

Also remove duplicate log messages for MTU change.

Signed-off-by: Veerasenareddy Burru <veerasenareddy.burru@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:07:55 -07:00
David S. Miller
5022111ecb Merge branch 'net-sizeof-cleanups'
Stephen Hemminger says:

====================
net: various sizeof cleanups

Noticed some places that were using sizeof as a bare operator (without
parentheses). This is legal C but is not the convention used in the kernel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:58 -07:00
stephen hemminger
31975e27a4 mlx4: sizeof style usage
The kernel coding style is to treat sizeof as a function
(i.e. with parentheses), not as an operator.

Also use kcalloc and kmalloc_array.
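Both conventions in a userspace sketch, with calloc() standing in for kcalloc(): sizeof is written with parentheses and applied to the pointed-to object, and the element count and size are passed separately so the allocator can check the multiplication for overflow (glibc's calloc() does this; kcalloc() and kmalloc_array() do it in the kernel). struct toy_entry and alloc_table() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_entry {
	uint32_t key;
	uint64_t val;
};

static struct toy_entry *alloc_table(size_t n)
{
	struct toy_entry *tbl;

	/* sizeof(*tbl), not "sizeof *tbl": kernel style. The size follows
	 * tbl's type, and n * size is checked by the allocator. */
	tbl = calloc(n, sizeof(*tbl));
	return tbl;
}
```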

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:57 -07:00
stephen hemminger
9d2ee98daf skge: add paren around sizeof arg
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:57 -07:00
stephen hemminger
a4a765031d virtio: put paren around sizeof
Kernel coding style is to put parentheses around the operand of sizeof.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:57 -07:00
stephen hemminger
120390468b tun/tap: use paren's with sizeof
Although sizeof is an operator in C, the kernel coding style convention
is to always use it like a function and add parentheses.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:01:57 -07:00
David S. Miller
79db795833 sparc64: Don't clobber fixed registers in __multi3.
%g4 and %g5 are fixed registers used by the kernel for the thread
pointer and the per-cpu offset.  Use %o4 and %g7 instead.

Diagnosis by Anthony Yznaga.

Fixes: 1b4af13ff2 ("sparc64: Add __multi3 for gcc 7.x and later.")
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 10:59:54 -07:00
Konstantin Khlebnikov
6b0355f4a9 net_sched/hfsc: opencode trivial set_active() and set_passive()
Also move the comment about update_vf() into the right place.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 10:55:34 -07:00
Konstantin Khlebnikov
959466588a net_sched: call qlen_notify only if child qdisc is empty
This callback is used for deactivating a class in the parent qdisc.
It is cheaper to test the queue length right here.

This also allows catching a screwed-up backlog while draining and
prevents a second deactivation of an already inactive parent class,
which would certainly crash the kernel. The kernel will print a warning
at destruction of a child qdisc that has no packets but a non-zero backlog.
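The gating idea as a toy model: the parent class is deactivated only when the child's queue has actually become empty, so partially draining the child never triggers the callback. All names are hypothetical, and the assert() in toy_qlen_notify() marks the double deactivation the patch is guarding against:

```c
#include <assert.h>
#include <stdbool.h>

struct toy_class {
	bool active;		/* linked into the parent's active list */
};

struct toy_child_qdisc {
	unsigned int qlen;
	struct toy_class *parent_cl;
};

/* Parent's qlen_notify: deactivate the class exactly once. */
static void toy_qlen_notify(struct toy_class *cl)
{
	assert(cl->active);	/* second deactivation would corrupt the parent */
	cl->active = false;
}

/* Drop n packets from the child; notify the parent only once it is empty. */
static void toy_reduce_backlog(struct toy_child_qdisc *q, unsigned int n)
{
	assert(n <= q->qlen);
	q->qlen -= n;
	if (q->qlen == 0)
		toy_qlen_notify(q->parent_cl);
}
```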

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 10:55:34 -07:00
Maor Gottlieb
870201f95f IB/uverbs: Fix NULL pointer dereference during device removal
As part of ib_uverbs_remove_one, which might be triggered by the
reset flow, we send an IB_EVENT_DEVICE_FATAL event to the userspace
application.
If the device was removed after the uverbs fd was opened but before
ib_uverbs_get_context was called, the event file will be accessed
before it was allocated, resulting in a NULL pointer dereference:

[ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null)
...
[ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 72.327123] Call Trace:
[ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs]
[ 72.327216] ? synchronize_srcu_expedited+0x27/0x30
[ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs]
[ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core]
[ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib]
[ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core]
[ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core]
[ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib]
[ 72.327546] SyS_delete_module+0x155/0x230
[ 72.328472] ? exit_to_usermode_loop+0x70/0xa6
[ 72.329370] do_syscall_64+0x54/0xc0
[ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25

Fix it by checking that the user context was allocated before
triggering the event.

Fixes: 036b106357 ('IB/uverbs: Enable device removal when there are active user space applications')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-08-16 12:53:15 -04:00
Paul Burton
293962d678 PCI: xilinx: Allow build on MIPS platforms
Allow the xilinx-pcie driver to be built on MIPS platforms which make use
of generic PCI drivers rather than legacy MIPS-specific interfaces.  This
is used on the MIPS Boston development board.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bharat Kumar Gogada <bharatku@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Ravikiran Gummaluri <rgummal@xilinx.com>
2017-08-16 11:44:37 -05:00
Paul Burton
aac2e96bf9 PCI: xilinx: Don't enable config completion interrupts
The Xilinx AXI bridge for PCI Express device provides interrupts indicating
the completion of config space accesses. We have previously
enabled/unmasked them but do nothing with them besides acknowledging them.

Leave the interrupts masked in order to avoid servicing a large number of
pointless interrupts during boot.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bharat Kumar Gogada <bharatku@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Ravikiran Gummaluri <rgummal@xilinx.com>
2017-08-16 11:44:37 -05:00
Paul Burton
d0b5dda62e PCI: xilinx: Unify INTx & MSI interrupt decode
The INTx & MSI interrupt decode paths duplicated a fair bit of common
functionality. They also strictly handled interrupts in order of INTx then
MSI, so if both types of interrupt were to be asserted simultaneously and
the MSI interrupt were first in the FIFO then the INTx code would read it &
ignore it before the MSI code then had to read it again, wasting the
original FIFO read.

Unify the INTx & MSI decode in order to reduce that duplication & allow a
single FIFO read to be performed for each interrupt regardless of its type.
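The unified decode reads each FIFO entry once and dispatches on its type, rather than letting an INTx pass consume an MSI entry. A toy sketch; the FIFO bit layout (TOY_FIFO_VALID, TOY_FIFO_IS_MSI, TOY_FIFO_DATA_MASK) and all names are hypothetical, not the Xilinx register definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FIFO entry layout, for illustration only. */
#define TOY_FIFO_VALID		(1u << 31)
#define TOY_FIFO_IS_MSI		(1u << 30)
#define TOY_FIFO_DATA_MASK	0xffu

struct toy_counts {
	unsigned int intx[4];
	unsigned int msi;
};

/* Decode one entry; returns false when the FIFO is empty. */
static bool toy_decode_one(uint32_t entry, struct toy_counts *c)
{
	if (!(entry & TOY_FIFO_VALID))
		return false;
	if (entry & TOY_FIFO_IS_MSI)
		c->msi++;
	else
		c->intx[(entry & TOY_FIFO_DATA_MASK) & 3]++;
	return true;
}

/* Drain loop: exactly one read per interrupt, whatever its type. */
static unsigned int toy_drain(const uint32_t *fifo, size_t n,
			      struct toy_counts *c)
{
	unsigned int handled = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_decode_one(fifo[i], c))
			break;
		handled++;
	}
	return handled;
}
```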

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bharat Kumar Gogada <bharatku@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Ravikiran Gummaluri <rgummal@xilinx.com>
2017-08-16 11:44:37 -05:00
Paul Burton
b8550f11bd PCI: xilinx-nwl: Translate INTx range to hwirqs 0-3
The devicetree binding documentation for the Xilinx NWL PCIe root port
bridge shows an example which uses an interrupt-map property to map PCI
INTx interrupts to hardware IRQ numbers 1-4. The driver creates an IRQ
domain with size 4, which therefore covers the hwirq range 0-3.

This means that if we attempt to make use of the INTD interrupt then we're
likely to hit a WARN() in irq_domain_associate() because INTD, or hwirq=4,
is outside of the range covered by the IRQ domain.  irq_domain_associate()
will then return -EINVAL and we'll be unable to make use of INTD.

Fix this by making use of the pci_irqd_intx_xlate() helper function to
translate the 1-4 range used in the DT to a 0-3 range used within the
driver, and stop adding 1 to decoded hwirq numbers.

Whilst cleaning up INTx handling we make use of the new PCI_NUM_INTX macro
& drop the custom INTX definitions.
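The translation itself is a range check plus subtracting one: PCI numbers INTA..INTD as 1..4 (PCI_INTERRUPT_INTA is 1), while an IRQ domain of size PCI_NUM_INTX (4) covers hwirqs 0..3. A sketch of just that arithmetic with hypothetical toy_* names; it does not reproduce the signature of pci_irqd_intx_xlate(), which plugs into the irq_domain xlate callback:

```c
#include <assert.h>
#include <errno.h>

#define TOY_PCI_INTERRUPT_INTA	1	/* INTA..INTD are numbered 1..4 */
#define TOY_PCI_NUM_INTX	4

/* Map a 1-based INTx number to a 0-based hwirq for a 4-entry domain. */
static int toy_intx_xlate(unsigned int intspec, unsigned long *out_hwirq)
{
	if (intspec < TOY_PCI_INTERRUPT_INTA ||
	    intspec >= TOY_PCI_INTERRUPT_INTA + TOY_PCI_NUM_INTX)
		return -EINVAL;
	*out_hwirq = intspec - TOY_PCI_INTERRUPT_INTA;
	return 0;
}
```

With this mapping INTD lands on hwirq 3, inside the domain, instead of the out-of-range hwirq 4 that triggered the warning.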

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: "Sören Brinkmann" <soren.brinkmann@xilinx.com>
2017-08-16 11:44:37 -05:00
Paul Burton
5c125683fc PCI: xilinx: Translate INTx range to hwirqs 0-3
The pcie-xilinx driver creates an IRQ domain of size 4 for legacy PCI INTx
interrupts, which at first glance seems reasonable since there are 4
possible such interrupts. Unfortunately the driver then proceeds to use the
range 1-4 as the hwirq numbers for INTA-INTD, causing warnings & broken
interrupts when attempting to use INTD/hwirq=4 due to it being beyond the
range of the IRQ domain:

  WARNING: CPU: 0 PID: 1 at kernel/irq/irqdomain.c:365
      irq_domain_associate+0x170/0x220
  error: hwirq 0x4 is too large for dummy
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W
      4.12.0-rc5-00126-g19e1b3a10aad-dirty #427
  Stack : 0000000000000000 0000000000000004 0000000000000006 ffffffff8092c78a
          0000000000000061 ffffffff8018bf60 0000000000000000 0000000000000000
          ffffffff8088c287 ffffffff80811d18 a8000000ffc60000 ffffffff80926678
          0000000000000001 0000000000000000 ffffffff80887880 ffffffff80960000
          ffffffff80920000 ffffffff801e6744 ffffffff80887880 a8000000ffc4f8f8
          000000000000089c ffffffff8018d260 0000000000010000 ffffffff80811d18
          0000000000000000 0000000000000001 0000000000000000 0000000000000000
          0000000000000000 a8000000ffc4f840 0000000000000000 ffffffff8042cf34
          0000000000000000 0000000000000000 0000000000000000 0000000000040c00
          0000000000000000 ffffffff8010d1c8 0000000000000000 ffffffff8042cf34
          ...
  Call Trace:
  [<ffffffff8010d1c8>] show_stack+0x80/0xa0
  [<ffffffff8042cf34>] dump_stack+0xd4/0x110
  [<ffffffff8013ea98>] __warn+0xf0/0x108
  [<ffffffff8013eb14>] warn_slowpath_fmt+0x3c/0x48
  [<ffffffff80196528>] irq_domain_associate+0x170/0x220
  [<ffffffff80196bf0>] irq_create_mapping+0x88/0x118
  [<ffffffff801976a8>] irq_create_fwspec_mapping+0xb8/0x320
  [<ffffffff80197970>] irq_create_of_mapping+0x60/0x70
  [<ffffffff805d1318>] of_irq_parse_and_map_pci+0x20/0x38
  [<ffffffff8049c210>] pci_fixup_irqs+0x60/0xe0
  [<ffffffff8049cd64>] xilinx_pcie_probe+0x28c/0x478
  [<ffffffff804e8ca8>] platform_drv_probe+0x50/0xd0
  [<ffffffff804e73a4>] driver_probe_device+0x2c4/0x3a0
  [<ffffffff804e7544>] __driver_attach+0xc4/0xd0
  [<ffffffff804e5254>] bus_for_each_dev+0x64/0xa8
  [<ffffffff804e5e40>] bus_add_driver+0x1f0/0x268
  [<ffffffff804e8000>] driver_register+0x68/0x118
  [<ffffffff801001a4>] do_one_initcall+0x4c/0x178
  [<ffffffff808d3ca8>] kernel_init_freeable+0x204/0x2b0
  [<ffffffff80730b68>] kernel_init+0x10/0xf8
  [<ffffffff80106218>] ret_from_kernel_thread+0x14/0x1c

Fix this by making use of the new pci_irqd_intx_xlate() helper to translate
the INTx 1-4 range into the 0-3 range suitable for the IRQ domain of size
4, and stop adding 1 to the hwirq number decoded from the interrupt FIFO
which is already in the range 0-3.

Whilst we're here we switch to using PCI_NUM_INTX rather than the magic
number 4, making it clearer what the 4 means.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bharat Kumar Gogada <bharatku@xilinx.com>
Cc: Michal Simek <michal.simek@xilinx.com>
Cc: Ravikiran Gummaluri <rgummal@xilinx.com>
2017-08-16 11:44:36 -05:00
Shawn Lin
2ba5991f34 PCI: rockchip: Factor out rockchip_pcie_get_phys()
We plan to introduce per-lane PHYs, so factor out rockchip_pcie_get_phys()
to make it easier in the future.  No functional change intended.

Tested-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Acked-by: Kishon Vijay Abraham I <kishon@ti.com>
2017-08-16 11:43:59 -05:00
Shawn Lin
b6502e0dcf PCI: rockchip: Control optional 12v power supply
Get vpcie12v from DT and control it if available.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-16 11:43:59 -05:00
Shawn Lin
828bdcfbdb dt-bindings: PCI: rockchip: Add vpcie12v-supply for Rockchip PCIe controller
The PCIe connector provides an optional 12V power supply for high-power
downstream components, so we add it as an optional supply in case we need
to control it.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rob Herring <robh@kernel.org>
2017-08-16 11:43:59 -05:00
Shawn Lin
54f910abe1 PCI: keystone-dw: Remove unused ks_pcie, pci variables
The ks_pcie and pci variables in ks_dw_pcie_msi_irq_mask() and
ks_dw_pcie_msi_irq_unmask() are never used.  Remove them.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-16 11:43:16 -05:00
Paul Burton
341d3299c0 PCI: faraday: Use PCI_NUM_INTX
Use the PCI_NUM_INTX macro to indicate the number of PCI INTx interrupts
rather than the magic number 4. This makes it clearer where the number
comes from & what it relates to.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-08-16 11:42:28 -05:00