Commit Graph

767603 Commits

Author SHA1 Message Date
Amritha Nambiar
fc9bab24e9 net: Enable Tx queue selection based on Rx queues
This patch adds support to pick Tx queue based on the Rx queue(s) map
configuration set by the admin through the sysfs attribute
for each Tx queue. If the user configuration for receive queue(s) map
does not apply, then the Tx queue selection falls back to CPU(s) map
based selection and finally to hashing.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 09:06:24 +09:00
Amritha Nambiar
c6345ce7d3 net: Record receive queue number for a connection
This patch adds a new field to sock_common 'skc_rx_queue_mapping'
which holds the receive queue number for the connection. The Rx queue
is marked in tcp_finish_connect() to allow a client app to do
SO_INCOMING_NAPI_ID after a connect() call to get the right queue
association for a socket. Rx queue is also marked in tcp_conn_request()
to allow syn-ack to go on the right tx-queue associated with
the queue on which syn is received.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 09:06:24 +09:00
Amritha Nambiar
755c31cd85 net: sock: Change tx_queue_mapping in sock_common to unsigned short
Change 'skc_tx_queue_mapping' field in sock_common structure from
'int' to 'unsigned short' type with ~0 indicating unset and
other positive queue values being set. This will accommodate adding
a new 'unsigned short' field in sock_common in the next patch for
rx_queue_mapping.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 09:06:24 +09:00
Amritha Nambiar
04157469b7 net: Use static_key for XPS maps
Use static_key for XPS maps to reduce the cost of extra map checks,
similar to how it is used for RPS and RFS. This includes static_key
'xps_needed' for XPS and another for 'xps_rxqs_needed' for XPS using
Rx queues map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 09:06:24 +09:00
Amritha Nambiar
80d19669ec net: Refactor XPS for CPUs and Rx queues
Refactor XPS code to support Tx queue selection based on
CPU(s) map or Rx queue(s) map.

Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-02 09:06:23 +09:00
Linus Torvalds
021c91791a Linux 4.18-rc3 2018-07-01 16:04:53 -07:00
Linus Torvalds
d3bc0e67f8 for-4.18-rc2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAls4zz0ACgkQxWXV+ddt
 WDs0ZhAAplEAcN1BP986BS7GpjrG20vQtP9AnHlnSEJJJmsnykpspBylOcLRkKjF
 LKBfBPCKqIo7kn5ebKT1Kk7zJPkOOEfmxGW7hffVN/oa/oMtmgJbbHDUgl2TDgdu
 rky1O+Bj+S37s5rhiXAJ4oU9ekdpWIlN30GczfynjiPqGigKh/cINsEEhQIIAiJG
 PRDQfSIJeh67x1AP0KE8sJAYSsaeFxT+kHrT/NPs1NFDSzrQSa/QWPFVjGVVuI/Y
 w84Mo0EqdRV7tap7D3QyWyYea6zdP00PG8TyLl0Kba+LckFbzpNN5hP3SUxleBzL
 0ZBJi7/tOqnrMV3YaGm40dLfgD4B+CFt8zDyg2JvWUxxEzfQfYif7KIT2IV8fSqS
 QrVw2NrzQC7EZ4Zu98wCN7dyyOE8yhqbq805YdG3Nj+zT6DqRu01TBo4Yr/Ek8ux
 +ITAtQVbaOZmTIt/qh/Oxc5jRsurAno1FP3XRH+1hfSlS7xc3LfI1CUbX3jAKzXN
 edxdM4/h+d4nekvROnKBH4EheS6+ZVfgzYlYUW9c2rjcJ1RHhDElbh14+IoM6LKJ
 nJ+Cp+744F6W5jaG3oWElJrdhlY31mWUjiZaj2CHl16EcH3MToytrxKMX+OWo95W
 gChnKicrtpO6+9nbED3Tdhp7SkbDysun6jvEpSgdlm8+2H5Kwrw=
 =KWL7
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "We have a few regression fixes for qgroup rescan status tracking and
  the vm_fault_t conversion that mixed up the error values"

* tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix mount failure when qgroup rescan is in progress
  Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion
  btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
2018-07-01 12:38:16 -07:00
Linus Torvalds
4a770e638f Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Followup to procfs-seq_file series this window"

This fixes a memory leak by making sure that proc seq files release any
private data on close.  The 'proc_seq_open' has to be properly paired
with 'proc_seq_release' that releases the extra private data.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  proc: add proc_seq_release
2018-07-01 12:32:19 -07:00
Linus Torvalds
d7563ca5bf Staging/IIO fixes for 4.18-rc3
Here are a few small staging and IIO driver fixes for 4.18-rc3.
 
 Nothing major or big, all just fixes for reported problems since
 4.18-rc1.  All of these have been in linux-next this week with no
 reported problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziQ7w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynwZACgjpcPVTEiA0W02s4B/GhmVt1NMeUAnjgeLDzY
 yRz6SX18lSmjHqCUXAk1
 =/2QW
 -----END PGP SIGNATURE-----

Merge tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO fixes from Greg KH:
 "Here are a few small staging and IIO driver fixes for 4.18-rc3.

  Nothing major or big, all just fixes for reported problems since
  4.18-rc1. All of these have been in linux-next this week with no
  reported problems"

* tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: android: ion: Return an ERR_PTR in ion_map_kernel
  staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write()
  iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines
  iio: buffer: fix the function signature to match implementation
  iio: mma8452: Fix ignoring MMA8452_INT_DRDY
  iio: tsl2x7x/tsl2772: avoid potential division by zero
  iio: pressure: bmp280: fix relative humidity unit
2018-07-01 12:20:20 -07:00
Linus Torvalds
652788a90d TTY/Serial fixes for 4.18-rc3
Here are 5 fixes for the tty core and some serial drivers.
 
 The tty core one fix some security and other issues reported by the
 syzbot that I have taken too long in responding to (sorry Tetsuo!).  The
 8350 serial driver fix resolves an issue of devices that used to work
 properly stopping working as they shouldn't have been added to a
 blacklist.
 
 All of these have been in linux-next for a few days with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziSHw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ym5LACg0xI1fC7LAKrLSLKglU/H4Wsv6b0AoNIkfbWi
 wxAZZKscwFVKNpv6gN9n
 =YgAj
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are five fixes for the tty core and some serial drivers.

  The tty core ones fix some security and other issues reported by the
  syzbot that I have taken too long in responding to (sorry Tetsuo!).

  The 8350 serial driver fix resolves an issue of devices that used to
  work properly stopping working as they shouldn't have been added to a
  blacklist.

  All of these have been in linux-next for a few days with no reported
  issues"

* tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  vt: prevent leaking uninitialized data to userspace via /dev/vcs*
  serdev: fix memleak on module unload
  serial: 8250_pci: Remove stalled entries in blacklist
  n_tty: Access echo_* variables carefully.
  n_tty: Fix stall at n_tty_receive_char_special().
2018-07-01 12:05:53 -07:00
Linus Torvalds
c2aee376cf USB fixes for 4.18-rc3
Here is a number of USB gadget and other driver fixes for 4.18-rc3.
 
 There's a bunch of them here, most of them being gadget driver and xhci
 host controller fixes for reported issues (as normal), but there are
 also some new device ids, and some fixes for the typec code.
 
 There is an acpi core patch in here that was acked by the acpi
 maintainer as it is needed for the typec fixes in order to properly
 solve a problem in that driver.
 
 All of these have been in linux-next this week with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWziS+Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn8QQCfU8PnLo+uPoV+DU7Nm1486mHTP4YAoLU3Q0nE
 oRmnzA3TnxytluWgbC7M
 =H8MG
 -----END PGP SIGNATURE-----

Merge tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here is a number of USB gadget and other driver fixes for 4.18-rc3.

  There's a bunch of them here, most of them being gadget driver and
  xhci host controller fixes for reported issues (as normal), but there
  are also some new device ids, and some fixes for the typec code.

  There is an acpi core patch in here that was acked by the acpi
  maintainer as it is needed for the typec fixes in order to properly
  solve a problem in that driver.

  All of these have been in linux-next this week with no reported
  issues"

* tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
  usb: chipidea: host: fix disconnection detect issue
  usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered
  typec: tcpm: Fix a msecs vs jiffies bug
  NFC: pn533: Fix wrong GFP flag usage
  usb: cdc_acm: Add quirk for Uniden UBC125 scanner
  staging/typec: fix tcpci_rt1711h build errors
  usb: typec: ucsi: Fix for incorrect status data issue
  usb: typec: ucsi: acpi: Workaround for cache mode issue
  acpi: Add helper for deactivating memory region
  usb: xhci: increase CRS timeout value
  usb: xhci: tegra: fix runtime PM error handling
  usb: xhci: remove the code build warning
  xhci: Fix kernel oops in trace_xhci_free_virt_device
  xhci: Fix perceived dead host due to runtime suspend race with event handler
  dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation
  usb: gadget: dwc2: fix memory leak in gadget_init()
  usb: gadget: composite: fix delayed_status race condition when set_interface
  usb: dwc2: fix isoc split in transfer with no data
  usb: dwc2: alloc dma aligned buffer for isoc split in
  usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub
  ...
2018-07-01 11:50:16 -07:00
Linus Torvalds
c350d6d1d7 Add a missing export required by riscv and unicore
-----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAls4b+ILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNrVw//SiXXoTDekxaQmyExQcEhac4ZzTjERoDFaE9zo14D
 x7iwdOXFQN9pRJwtUX70QDxQxc8iJT1j6CXMRUJZL9ZKdEOVbqfX2k7P8W4TJkhU
 id2PeZTxmreWOKL8EJB1mEjckZxP/qQLKBp9qwNET6rPf18WmSDb6wQuUcaLgDhl
 n4UJrzXuiCaEQNor30BvGDM4jQ4IOFh2JVjWqwPubcgPgOoopZ0TWSPeAxRo38MH
 2hDTSMFEV/QqM/slDhHxJacwFyMc2mWxyHg66/YGzTmX/ZciABq2eMiTwhSMumSp
 wDP/4/9TvaM6QGPkWblTFr18sRECBaNo59e5Lz0g/KtJCXwIeAiBqCKaIct6CYAf
 hHdSaxvYzPgIOs91uSgGbhtlweg3TdjSRJIbB8k1xkub7uAnT0hpeHnqPy4DUfjy
 DB25YTt0Qabnmfyz37UpbCQN0PPYY4TLvZc9N81WXiKW2BOIogvNeiCNbS06rvXa
 wxn/haocbMSbh3k41jxbkrFBKiALiz1QaZRgH0omdOvXSN//WgI8qAt6F8begDc8
 dBu5aTu50oDv6kLoe4aONTH72Jrt/q6xBI9dN5cD73luBm1p+QyoYhekHUPNMef7
 7ek/lBJLo1Xq+bEuqgnaPZRC7mWYGTvnEe+r0uYW3cOw+PGx/VNfPc6B43J/YaCS
 Hv4=
 =IUWC
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping

Pull dma mapping fixlet from Christoph Hellwig:
 "Add a missing export required by riscv and unicore"

* tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: export swiotlb_dma_ops
2018-07-01 10:45:13 -07:00
Ilpo Järvinen
1236f22fba tcp: prevent bogus FRTO undos with non-SACK flows
If SACK is not enabled and the first cumulative ACK after the RTO
retransmission covers more than the retransmitted skb, a spurious
FRTO undo will trigger (assuming FRTO is enabled for that RTO).
The reason is that any non-retransmitted segment acknowledged will
set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is
no indication that it would have been delivered for real (the
scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK
case so the check for that bit won't help like it does with SACK).
Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo
in tcp_process_loss.

We need to use more strict condition for non-SACK case and check
that none of the cumulatively ACKed segments were retransmitted
to prove that progress is due to original transmissions. Only then
keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in
non-SACK case.

(FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS
to better indicate its purpose but to keep this change minimal, it
will be done in another patch).

Besides burstiness and congestion control violations, this problem
can result in RTO loop: When the loss recovery is prematurely
undoed, only new data will be transmitted (if available) and
the next retransmission can occur only after a new RTO which in case
of multiple losses (that are not for consecutive packets) requires
one RTO per loss to recover.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-01 19:23:13 +09:00
Stafford Horne
ae15a41a64 openrisc: entry: Fix delay slot exception detection
Originally in patch e6d20c55a4 ("openrisc: entry: Fix delay slot
detection") I fixed delay slot detection, but only for QEMU.  We missed
that hardware delay slot detection using delay slot exception flag (DSX)
was still broken.  This was because QEMU set the DSX flag in both
pre-exception supervision register (ESR) and supervision register (SR)
register, but on real hardware the DSX flag is only set on the SR
register during exceptions.

Fix this by carrying the DSX flag into the SR register during exception.
We also update the DSX flag read locations to read the value from the SR
register not the pt_regs SR register which represents ESR.  The ESR
should never have the DSX flag set.

In the process I updated/removed a few comments to match the current
state.  Including removing a comment saying that the DSX detection logic
was inefficient and needed to be rewritten.

I have tested this on QEMU with a patch ensuring it matches the hardware
specification.

Link: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg00000.html
Fixes: e6d20c55a4 ("openrisc: entry: Fix delay slot detection")
Signed-off-by: Stafford Horne <shorne@gmail.com>
2018-07-01 16:48:24 +09:00
David S. Miller
271b955e52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-07-01

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) A bpf_fib_lookup() helper fix to change the API before freeze to
   return an encoding of the FIB lookup result and return the nexthop
   device index in the params struct (instead of device index as return
   code that we had before), from David.

2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
   reject progs when set_memory_*() fails since it could still be RO.
   Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
   an issue, and a memory leak in s390 JIT found during review, from
   Daniel.

3) Multiple fixes for sockmap/hash to address most of the syzkaller
   triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
   a missing sock_map_release() routine to hook up to callbacks, and a
   fix for an omitted bucket lock in sock_close(), from John.

4) Two bpftool fixes to remove duplicated error message on program load,
   and another one to close the libbpf object after program load. One
   additional fix for nfp driver's BPF offload to avoid stopping offload
   completely if replace of program failed, from Jakub.

5) Couple of BPF selftest fixes that bail out in some of the test
   scripts if the user does not have the right privileges, from Jeffrin.

6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
   where we need to set the flag that some of the test cases are expected
   to fail, from Kleber.

7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
   since it has no relation to it and lirc2 users often have configs
   without cgroups enabled and thus would not be able to use it, from Sean.

8) Fix a selftest failure in sockmap by removing a useless setrlimit()
   call that would set a too low limit where at the same time we are
   already including bpf_rlimit.h that does the job, from Yonghong.

9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-01 09:27:44 +09:00
Daniel Borkmann
bf2b866a2f Merge branch 'bpf-sockmap-fixes'
John Fastabend says:

====================
This addresses two syzbot issues that lead to identifying (by Eric and
Wei) a class of bugs where we don't correctly check for IPv4/v6
sockets and their associated state. The second issue was a locking
omission in sockhash.

The first patch addresses IPv6 socks and fixing an error where
sockhash would overwrite the prot pointer with IPv4 prot. To fix
this build similar solution to TLS ULP. Although we continue to
allow socks in all states not just ESTABLISH in this patch set
because as Martin points out there should be no issue with this
on the sockmap ULP because we don't use the ctx in this code. Once
multiple ULPs coexist we may need to revisit this. However we
can do this in *next trees.

The other issue syzbot found that the tcp_close() handler missed
locking the hash bucket lock which could result in corrupting the
sockhash bucket list if delete and close ran at the same time.
And also the smap_list_remove() routine was not working correctly
at all. This was not caught in my testing because in general my
tests (to date at least lets add some more robust selftest in
bpf-next) do things in the "expected" order, create map, add socks,
delete socks, then tear down maps. The tests we have that do the
ops out of this order where only working on single maps not multi-
maps so we never saw the issue. Thanks syzbot. The fix is to
restructure the tcp_close() lock handling. And fix the obvious
bug in smap_list_remove().

Finally, during review I noticed the release handler was omitted
from the upstream code (patch 4) due to an incorrect merge conflict
fix when I ported the code to latest bpf-next before submitting.
This would leave references to the map around if the user never
closes the map.

v3: rework patches, dropping ESTABLISH check and adding rcu
    annotation along with the smap_list_remove fix

v4: missed one more case where maps was being accessed without
    the sk_callback_lock, spoted by Martin as well.

v5: changed to use a specific lock for maps and reduced callback
    lock so that it is only used to gaurd sk callbacks. I think
    this makes the logic a bit cleaner and avoids confusion
    ovoer what each lock is doing.

Also big thanks to Martin for thorough review he caught at least
one case where I missed a rcu_call().
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:33 +02:00
John Fastabend
caac76a517 bpf: sockhash, add release routine
Add map_release_uref pointer to hashmap ops. This was dropped when
original sockhash code was ported into bpf-next before initial
commit.

Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:32 +02:00
John Fastabend
e9db4ef6bf bpf: sockhash fix omitted bucket lock in sock_close
First the sk_callback_lock() was being used to protect both the
sock callback hooks and the psock->maps list. This got overly
convoluted after the addition of sockhash (in sockmap it made
some sense because masp and callbacks were tightly coupled) so
lets split out a specific lock for maps and only use the callback
lock for its intended purpose. This fixes a couple cases where
we missed using maps lock when it was in fact needed. Also this
makes it easier to follow the code because now we can put the
locking closer to the actual code its serializing.

Next, in sock_hash_delete_elem() the pattern was as follows,

  sock_hash_delete_elem()
     [...]
     spin_lock(bucket_lock)
     l = lookup_elem_raw()
     if (l)
        hlist_del_rcu()
        write_lock(sk_callback_lock)
         .... destroy psock ...
        write_unlock(sk_callback_lock)
     spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after
dereferencing the hash table which we can't do unless we have the
bucket lock held. Once we have the bucket lock and the psock element
it is deleted from the hashmap to ensure any other path doing a lookup
will fail. Finally, the refcnt is decremented and if zero the psock
is destroyed.

In parallel with the above (or free'ing the map) a tcp close event
may trigger tcp_close(). Which at the moment omits the bucket lock
altogether (oops!) where the flow looks like this,

  bpf_tcp_close()
     [...]
     write_lock(sk_callback_lock)
     for each psock->maps // list of maps this sock is part of
         hlist_del_rcu(ref_hash_node);
         .... destroy psock ...
     write_unlock(sk_callback_lock)

Obviously, and demonstrated by syzbot, this is broken because
we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a
bucket lock but that would create a lock inversion problem. In
summary to follow locking rules the psocks maps list needs the
sk_callback_lock (after this patch maps_lock) but we need the bucket
lock to do the hlist_del_rcu.

To resolve the lock inversion problem pop the head of the maps list
repeatedly and remove the reference until no more are left. If a
delete happens in parallel from the BPF API that is OK as well because
it will do a similar action, lookup the lock in the map/hash, delete
it from the map/hash, and dec the refcnt. We check for this case
before doing a destroy on the psock to ensure we don't have two
threads tearing down a psock. The new logic is as follows,

  bpf_tcp_close()
  e = psock_map_pop(psock->maps) // done with map lock
  bucket_lock() // lock hash list bucket
  l = lookup_elem_raw(head, hash, key, key_size);
  if (l) {
     //only get here if elmnt was not already removed
     hlist_del_rcu()
     ... destroy psock...
  }
  bucket_unlock()

And finally for all the above to work add missing locking around  map
operations per above. Then add RCU annotations and use
rcu_dereference/rcu_assign_pointer to manage values relying on RCU so
that the object is not free'd from sock_hash_free() while it is being
referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:32 +02:00
John Fastabend
54fedb42c6 bpf: sockmap, fix smap_list_map_remove when psock is in many maps
If a hashmap is free'd with open socks it removes the reference to
the hash entry from the psock. If that is the last reference to the
psock then it will also be free'd by the reference counting logic.
However the current logic that removes the hash reference from the
list of references is broken. In smap_list_remove() we first check
if the sockmap entry matches and then check if the hashmap entry
matches. But, the sockmap entry sill always match because its NULL in
this case which causes the first entry to be removed from the list.
If this is always the "right" entry (because the user adds/removes
entries in order) then everything is OK but otherwise a subsequent
bpf_tcp_close() may reference a free'd object.

To fix this create two list handlers one for sockmap and one for
sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 8111038444 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
John Fastabend
9901c5d77e bpf: sockmap, fix crash when ipv6 sock is added
This fixes a crash where we assign tcp_prot to IPv6 sockets instead
of tcpv6_prot.

Previously we overwrote the sk->prot field with tcp_prot even in the
AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot
are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as
'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously
crashing case here.

Fixes: 174a79ff95 ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:21:31 +02:00
Daniel Borkmann
0b9e3d543f Merge branch 'bpf-bpftool-libbpf-improvements'
Jakub Kicinski says:

====================
Set of random updates to bpftool and libbpf.  I'm preparing for
extending bpftool prog load, but there is a good number of
improvements that can be made before bpf -> bpf-next merge
helping to keep the later patch set to a manageable size as well.

First patch is a bpftool build speed improvement.  Next missing
program types are added to libbpf program type detection by section
name.  The ability to load programs from '.text' section is restored
when ELF file doesn't contain any pseudo calls.

In bpftool I remove my Author comments as unnecessary sign of vanity.
Last but not least missing option is added to bash completions and
processing of options in bash completions is improved.
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:52 +02:00
Jakub Kicinski
121c58bed0 tools: bpftool: deal with options upfront
Remove options (in getopt() sense, i.e. starting with a dash like
-n or --NAME) while parsing arguments for bash completions.  This
allows us to refer to position-dependent parameters better, and
complete options at any point.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
ef347a340b tools: bpftool: add missing --bpffs to completions
--bpffs is not suggested by bash completions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
71e07ddcdc tools: bpftool: drop unnecessary Author comments
Drop my author comments, those are from the early days of
bpftool and make little sense in tree, where we have quite
a few people contributing and git to attribute the work.

While at it bump some copyrights.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
eac7d84519 tools: libbpf: don't return '.text' as a program for multi-function programs
Make bpf_program__next() skip over '.text' section if object file
has pseudo calls.  The '.text' section is hardly a program in that
case, it's more of a storage for code of functions other than main.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
9a94f277c4 tools: libbpf: restore the ability to load programs from .text section
libbpf used to be able to load programs from the default section
called '.text'.  It's not very common to leave sections unnamed,
but if it happens libbpf will fail to load the programs reporting
-EINVAL from the kernel.  The -EINVAL comes from bpf_obj_name_cpy()
because since 48cca7e44f ("libbpf: add support for bpf_call")
libbpf does not resolve program names for programs in '.text',
defaulting to '.text'.  '.text', however, does not pass the
(isalnum(*src) || *src == '_') check in bpf_obj_name_cpy().

With few extra lines of code we can limit the pseudo call
assumptions only to objects which actually contain code relocations.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
9aba36139a tools: libbpf: allow setting ifindex for programs and maps
Users of bpf_object__open()/bpf_object__load() APIs may want to
load the programs and maps onto a device for offload.  Allow
setting ifindex on those sub-objects.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
d9b683d746 tools: libbpf: add section names for missing program types
Specify default section names for BPF_PROG_TYPE_LIRC_MODE2
and BPF_PROG_TYPE_LWT_SEG6LOCAL, these are the only two
missing right now.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Jakub Kicinski
c256429fbd tools: bpftool: use correct make variable type to improve compilation time
Commit 4bfe3bd3cc ("tools/bpftool: use version from the kernel
source tree") added version to bpftool.  The version used is
equal to the kernel version and obtained by running make kernelversion
against kernel source tree.  Version is then communicated
to the sources with a command line define set in CFLAGS.

Use a simply expanded variable for the version, otherwise the
recursive make will run every time CFLAGS are used.

This brings the single-job compilation time for me from almost
16 sec down to less than 4 sec.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-01 01:01:50 +02:00
Linus Torvalds
883c9ab9eb Merge branch 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes and cleanups from Helge Deller:
 "Nothing exiting in this patchset, just

   - small cleanups of header files

   - default to 4 CPUs when building a SMP kernel

   - mark 16kB and 64kB page sizes broken

   - addition of the new io_pgetevents syscall"

* 'parisc-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Build kernel without -ffunction-sections
  parisc: Reduce debug output in unwind code
  parisc: Wire up io_pgetevents syscall
  parisc: Default to 4 SMP CPUs
  parisc: Convert printk(KERN_LEVEL) to pr_lvl()
  parisc: Mark 16kB and 64kB page sizes BROKEN
  parisc: Drop struct sigaction from not exported header file
2018-06-30 14:16:30 -07:00
Linus Torvalds
08af78d7a5 ARM: SoC fixes for 4.18-rc
A smaller batch for the end of the week (let's see if I can keep the
 weekly cadence going for once).
 
 All medium-grade fixes here, nothing worrisome:
  - Fixes for some fairly old bugs around SD card write-protect detection
    and GPIO interrupt assignments on Davinci.
  - Wifi module suspend fix for Hikey.
  - Minor DT tweaks to fix inaccuracies for Amlogic platforms, on of
    which solves booting with third-party u-boot.
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAls32nkPHG9sb2ZAbGl4
 b20ubmV0AAoJEIwa5zzehBx381gP/ihYGEiM1iSp1+WJaR3YaVHIt4VbnZV76A/T
 oCeX/9X11o1tundMbyX5iBY30SlHA+GrGEEETQyGDJ+an2hBxfJVJzG+u0AFVtkr
 orf0v5UbUJZqxsU3cnzB508wuIgpdouZ60cqXT0HJOfC6NV9oL5yV1ZWKguWWuWL
 KgRqavz/9QyTaUiphwdhG+n1Ey+EVH1uPUqRxh3Md8jMKscMWcd36D2OsMmu3AbZ
 O73KRoIr4SgXwnk6V2q/xoAHyshURhnVDHmEuyO1fJh9b7OZMEJiMcFmr8RC6SLr
 /ooc0nAtJyCdyJl2h9+XGONLB+pxDVL9O9dWU21YrCdGMPAjBY1e9Ppeus+u+Zzt
 H1bk2bDTZe5Oybx1M5xCgMtc7Snar+F1kUySFS7JXwEWHUwbEVpiSz9s0IRnpRgD
 yQJn3ybxMHHFpJba3VFZeg7+cmNMq5n+XilZDmTp+mCcdRlnX+3HMt2tgf9WZJgq
 MwkVNdHykHzs7Uw0IaLFDfdvUbMnjn/4iHoBdfWpQPjoDBpXcSmo6rhpi1WUbKnW
 LF4zTywaaCifwfuvb4p2K6ByRg2zUwrqrlYtx6og5D0ARhI6Izqv6YEjoY/d5+nl
 NeC/whEFFG0O5lFH32Oy8XuhPwLLOTW5wXd0vYlFWTy9YuO5GZ3nlqb73v4cPvsC
 +34hp/7x
 =55JJ
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "A smaller batch for the end of the week (let's see if I can keep the
  weekly cadence going for once).

  All medium-grade fixes here, nothing worrisome:

   - Fixes for some fairly old bugs around SD card write-protect
     detection and GPIO interrupt assignments on Davinci.

   - Wifi module suspend fix for Hikey.

   - Minor DT tweaks to fix inaccuracies for Amlogic platforms, one
     of which solves booting with third-party u-boot"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: hikey960: Define wl1837 power capabilities
  arm64: dts: hikey: Define wl1835 power capabilities
  ARM64: dts: meson-gxl: fix Mali GPU compatible string
  ARM64: dts: meson-axg: fix ethernet stability issue
  ARM64: dts: meson-gx: fix ATF reserved memory region
  ARM64: dts: meson-gxl-s905x-p212: Add phy-supply for usb0
  ARM64: dts: meson: fix register ranges for SD/eMMC
  ARM64: dts: meson: disable sd-uhs modes on the libretech-cc
  ARM: dts: da850: Fix interrups property for gpio
  ARM: davinci: board-da850-evm: fix WP pin polarity for MMC/SD
2018-06-30 14:08:06 -07:00
Linus Torvalds
22d3e0c36e Kbuild fixes for v4.18
- introduce __diag_* macros and suppress -Wattribute-alias warnings from GCC 8
 
 - fix stack protector test script for x86_64
 
 - fix line number handling in Kconfig
 
 - document that '#' starts a comment in Kconfig
 
 - handle P_SYMBOL property in dump debugging of Kconfig
 
 - correct help message of LD_DEAD_CODE_DATA_ELIMINATION
 
 - fix occasional segmentation faults in Kconfig
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJbN450AAoJED2LAQed4NsGFdAP/0fc2NhkzQMvz1EBEc2n93LC
 FUXew75tsX2ZewssoLzb4Iepkb/mHU+fjhBaE65S+Xu2/6mNfId9a7HAtywvFyO2
 ZUQPXHjMHnLEPRKuzQy34uCy9/wWCiqi8rpWUsOEohmNIcLaF0vMZf5Ifod7wIr7
 pnix3b9Q+dY+l49TSsSv4MX7F9qs5fXRhEarcQ3jYEb3yRUEXgmli3hV1wRita/n
 tJhFDiIdJDeISDkgmHUuOhjFnv5Yf3WJTXi/ILZ2zvpGjjqNDAwxtyzGnPMShQEc
 fxk3/1nkg9h/ScVAaGavrYYmiiH8XsqWY2q6p52jTK3kD+yTXaVakPSmxw8UHImh
 aNWQutzMF8GYEsb+ld1ncsNrwfgd40mA25mEyb/ZPSw2IdNBrXtIVbw7XiBLi8eH
 recAlRN0MouzD7+sXafgtoKopqanQbB/rMqDO4ULfnVvZLWDmZVbfreCc+qrJtiJ
 mqydBMUVxrvB+qf5SHQ7WlDmXWHY1xQuxXzS0gRVGT14EsyD6yhC2D62pEHnB7uG
 zE1pGemOCzOlGY6nDAbtQVR1n5AAWEZYveZXUuFn+vuqR7ZtYxCFUFOS0u621zFI
 HMI9B81ifdNV2efT2VTVi6Tnnvn44sAXOYjaULX6566EyX0/mOL5CWZlTqn5SKOn
 PwNxc7ZeCylTbkZww2c4
 =oABr
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - introduce __diag_* macros and suppress -Wattribute-alias warnings
   from GCC 8

 - fix stack protector test script for x86_64

 - fix line number handling in Kconfig

 - document that '#' starts a comment in Kconfig

 - handle P_SYMBOL property in dump debugging of Kconfig

 - correct help message of LD_DEAD_CODE_DATA_ELIMINATION

 - fix occasional segmentation faults in Kconfig

* tag 'kbuild-fixes-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: loop boundary condition fix
  kbuild: reword help of LD_DEAD_CODE_DATA_ELIMINATION
  kconfig: handle P_SYMBOL in print_symbol()
  kconfig: document Kconfig source file comments
  kconfig: fix line numbers for if-entries in menu tree
  stack-protector: Fix test with 32-bit userland and CONFIG_64BIT=y
  powerpc: Remove -Wattribute-alias pragmas
  disable -Wattribute-alias warning for SYSCALL_DEFINEx()
  kbuild: add macro for controlling warnings to linux/compiler.h
2018-06-30 13:05:30 -07:00
Linus Torvalds
0fbc4aeabc Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "The biggest diffstat comes from self-test updates, plus there's entry
  code fixes, 5-level paging related fixes, console debug output fixes,
  and misc fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Clean up the printk()s in show_fault_oops()
  x86/mm: Drop unneeded __always_inline for p4d page table helpers
  x86/efi: Fix efi_call_phys_epilog() with CONFIG_X86_5LEVEL=y
  selftests/x86/sigreturn: Do minor cleanups
  selftests/x86/sigreturn/64: Fix spurious failures on AMD CPUs
  x86/entry/64/compat: Fix "x86/entry/64/compat: Preserve r8-r11 in int $0x80"
  x86/mm: Don't free P4D table when it is folded at runtime
  x86/entry/32: Add explicit 'l' instruction suffix
  x86/mm: Get rid of KERN_CONT in show_fault_oops()
2018-06-30 11:42:14 -07:00
Linus Torvalds
d7d5388679 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Tooling fixes mostly, plus a build warning fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  perf/core: Move inline keyword at the beginning of declaration
  tools/headers: Pick up latest kernel ABIs
  perf tools: Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE]
  perf script: Fix crash because of missing evsel->priv
  perf script: Add missing output fields in a hint
  perf bench: Fix numa report output code
  perf stat: Remove duplicate event counting
  perf alias: Rebuild alias expression string to make it comparable
  perf alias: Remove trailing newline when reading sysfs files
  perf tools: Fix a clang 7.0 compilation error
  tools include uapi: Synchronize bpf.h with the kernel
  tools include uapi: Update if_link.h to pick IFLA_{BRPORT_ISOLATED,VXLAN_TTL_INHERIT}
  tools include powerpc: Update arch/powerpc/include/uapi/asm/unistd.h copy to get 'rseq' syscall
  perf tools: Update x86's syscall_64.tbl, adding 'io_pgetevents' and 'rseq'
  tools headers uapi: Synchronize drm/drm.h
  perf intel-pt: Fix packet decoding of CYC packets
  perf tests: Add valid callback for parse-events test
  perf tests: Add event parsing error handling to parse events test
  perf report powerpc: Fix crash if callchain is empty
  perf test session topology: Fix test on s390
  ...
2018-06-30 11:26:25 -07:00
Linus Torvalds
34a484d58c selinux/stable-4.18 PR 20180629
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEcQCq365ubpQNLgrWVeRaWujKfIoFAls2gk0UHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQVeRaWujKfIqEvRAAgjtXjU7cN9Vj7GZpwgSjJKyXSruR
 RDM19CRDKIo27UoxaqD92hFnmDreoysdzLi9cunDxshsUbGdHeyXvaOkY2apqpkn
 msPIKO3pmoiq0Umze1/smgl6g0ruPxc+ZSslfuHLUjephogfuDGKgfTeLN60z6up
 KaiVPoaSni+DZSsiFOkzUHwQFfhTRqONBx/tfPo2H1K0bP2dy9YuOCdvpZEqpc7P
 8tx/uF3pYrUBuq9ufEgWt2VC0fU1uZJEI21BQqKcrXLYlmVr73pWYLXnV1NWck0A
 rY5DABxbDf1sXCAIhYRJJDiM51uPltmFfntGF3sS1OKYOgyxIxf51xwgl9dXsGOA
 jOFFwUuXeHJpWICTm1PQCdpA/mzLVgrPzt8ULPE6zYnP+LbBId6RSrPz6Irhswal
 /wiq1mlhdAdeBf2r/tY3VXUy8dMNLjfeHLD3scx07hfFuswXCPTC6xwlyiKPsgKD
 1hGCQazZZTNAcMVI4A00SpPZVGx1yU/Yu/+8vMKyV5BmT55TPcQElcQy+ZZxuX1a
 711B9U7P6w7k8UwL+hVSncgwCLI1vb4MKnmrGZRH9wXiUdaKcgivaDDoeoLTtqBF
 dcIP84OtigAHbJHxWXoTmuRX9KGmtgF5sUBgFv2kg8R8EgnmVKaWJxyyttRg4awo
 RZXmTh2eC3p3IGw=
 =fzCd
 -----END PGP SIGNATURE-----

Merge tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fix from Paul Moore:
 "One fairly straightforward patch to fix a longstanding issue where a
  process could stall while accessing files in selinuxfs and block
  everyone else due to a held mutex.

  The patch passes all our tests and looks to apply cleanly to your
  current tree"

* tag 'selinux-pr-20180629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: move user accesses in selinuxfs out of locked regions
2018-06-30 11:15:12 -07:00
Linus Torvalds
e6e5bec43c for-linus-20180629
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAls2oVcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpr89D/9PaJCtLpCEU1HdaohIDXDyJKBdtCllIsqJ
 YAJTlUkFRkMvsbdEA3U43rVRVlu7zxP3aRg79VRI+kdZ8y8XSiisAF/QUbG/Bwd3
 NuFqKj4Hm5xFTtLMKIUlumR2/exD4ZpPHKnWmD4idp1PZRoTUwxeIjqVv77L6Nfy
 c4/GVlz2t9buWxH/5PSa//rsT49xM6CLpyIwO2lFsNEk2HYE/uAWya5KZVJ1YKVw
 mSjUdyFtJ//lFK9lVU4YkdNF0d5w2PcEoCfNyPe0n9F43iITACo2om7rkSkTgciS
 izPAs1DkqNdWPta2twULt656WlUoGlwnWyFchU34N/I/pDiww4kdCQFfNIOi6qBW
 7rPQ4xpywiw/U93C2GLEeE09++T8+yO4KuELhIcN5+D3D2VGRIuaGcf4xeY0CX3A
 a335EZIxBGOfxyejknBDg3BsoUcLvejbANKD8ltis3zciGf/QyLt+FFqx25WCzkN
 N028ppml8WPSiqhSlFDMyfscO1dx/WSYh1q7ANrBiPPaEjFYmakzEDT0cZW4JsUh
 av4jqYt3cqrFIXyhRHJM2wjFymQv6aVnmLa4zOKCFbwm9KzMWUUFjc4R962T/sQK
 0g+eCSFGidii4JZ4ghOAQk2dqCTIZqV72pmLlpJBGu+SAIQxkh+29h0LOi5U3v1Y
 FnTtJ1JhCg==
 =0uqV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20180629' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Small set of fixes for this series. Mostly just minor fixes, the only
  oddball in here is the sg change.

  The sg change came out of the stall fix for NVMe, where we added a
  mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
  sort-of ruins that, since we'd need to account for that. That's
  actually a generic problem, since lots of drivers need to allocate SG
  lists. So this just removes support for CONFIG_SG_DEBUG, which I added
  back in 2007 and to my knowledge it was never useful.

  Anyway, outside of that, this pull contains:

   - clone of request with special payload fix (Bart)

   - drbd discard handling fix (Bart)

   - SATA blk-mq stall fix (me)

   - chunk size fix (Keith)

   - double free nvme rdma fix (Sagi)"

* tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
  sg: remove ->sg_magic member
  drbd: Fix drbd_request_prepare() discard handling
  blk-mq: don't queue more if we get a busy return
  block: Fix cloning of requests with a special payload
  nvme-rdma: fix possible double free of controller async event buffer
  block: Fix transfer when chunk sectors exceeds max
2018-06-30 10:47:46 -07:00
Roopa Prabhu
35e8c7ba08 net: fib_rules: bring back rule_exists to match rule during add
After commit f9d4b0c1e9 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
for existing rule lookup in both the add and del paths. While this
is good for the delete path, it solves a few problems but opens up
a few invalid key matches in the add path.

$ip -4 rule add table main tos 10 fwmark 1
$ip -4 rule add table main tos 10
RTNETLINK answers: File exists

The problem here is rule_find does not check if the key masks in
the new and old rule are the same and hence ends up matching a more
secific rule. Rule key masks cannot be easily compared today without
an elaborate if-else block. Its best to introduce key masks for easier
and accurate rule comparison in the future. Until then, due to fear of
regressions this patch re-introduces older loose rule_exists during add.
Also fixes both rule_exists and rule_find to cover missing attributes.

Fixes: f9d4b0c1e9 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:11:13 +09:00
David S. Miller
1a84d7fdb5 Merge branch 'mlxsw-Add-resource-scale-tests'
Petr Machata says:

====================
mlxsw: Add resource scale tests

There are a number of tests that check features of the Linux networking
stack. By running them on suitable interfaces, one can exercise the
mlxsw offloading code. However none of these tests attempts to push
mlxsw to the limits supported by the ASIC.

As an additional wrinkle, the "limits supported by the ASIC" themselves
may not be a set of fixed numbers, but rather depend on a profile that
determines how the ASIC resources are allocated for different purposes.

This patchset introduces several tests that verify capability of mlxsw
to offload amounts of routes, flower rules, and mirroring sessions that
match predicted ASIC capacity, at different configuration profiles.
Additionally they verify that amounts exceeding the predicted capacity
can *not* be offloaded.

These are not generic tests, but ones that are tailored for mlxsw
specifically. For that reason they are not added to net/forwarding
selftests subdirectory, but rather to a newly-added drivers/net/mlxsw.

Patches #1, #2 and #3 tweak the generic forwarding/lib.sh to support the
new additions.

In patches #4 and #5, new libraries for interfacing with devlink are
introduced, first a generic one, then a Spectrum-specific one.

In patch #6, a devlink resource test is introduced.

Patches #7 and #8, #9 and #10, and #11 and #12 introduce three scale
tests: router, flower and mirror-to-gretap. The first of each pair of
patches introduces a generic portion of the test (mlxsw-specific), the
second introduces a Spectrum-specific wrapper.

Patch #13 then introduces a scale test driver that runs (possibly a
subset of) the tests introduced by patches from previous paragraph.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Yuval Mintz
1b6130df62 selftests: mlxsw: Add scale test for resources
Add a scale test capable of validating that offloaded network
functionality is indeed functional at scale when configured to
the different KVD profiles available.

Start by testing offloaded routes are functional at scale by
passing traffic on each one of them in turn.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
9136074d56 selftests: mlxsw: Add target for mirror-to-gretap test on spectrum
Add a wrapper around mlxsw/mirror_gre_scale.sh that parameterized number
of offloadable mirrors on Spectrum machines.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
b973b78aae selftests: mlxsw: Add scale test for mirror-to-gretap
Test that it's possible to offload a given number of mirrors.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:16 +09:00
Petr Machata
741a7661f0 selftests: mlxsw: Add target for tc flower test on spectrum
Add a wrapper around mlxsw/tc_flower_scale.sh that parameterizes the
generic tc flower scale test template with Spectrum-specific target
values.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
d67a94e81f selftests: mlxsw: Add tc flower scale test
Add test of capacity to offload flower.

This is a generic portion of the test that is meant to be called from a
driver that supplies a particular number of rules to be tested with.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Yuval Mintz
c51a744a28 selftests: mlxsw: Add target for router test on spectrum
IPv4 routes in Spectrum are based on the kvd single-hash, but as it's
a hash we need to assume we cannot reach 100% of its capacity.

Add a wrapper that provides us with good/bad target numbers for the
Spectrum ASIC.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
[petrm@mellanox.com: Drop shebang.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Arkadi Sharshevsky
d98307c52b selftests: mlxsw: Add router test
This test aims for both stand alone and internal usage by the resource
infra. The test receives the number routes to offload and checks:
- The routes were offloaded correctly
- Traffic for each route.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Yuval Mintz
b030c33811 selftests: mlxsw: Add devlink KVD resource test
Add a selftest that can be used to perform basic sanity of the devlink
resource API as well as test the behavior of KVD manipulation in the
driver.

This is the first case of a HW-only test - in order to test the devlink
resource a driver capable of exposing resources has to be provided
first.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Extracted two patches out of this patch. Tweaked
commit message.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
5aeba3e89b selftests: mlxsw: Add devlink_lib_spectrum.sh
This library builds on top of devlink_lib.sh and contains functionality
specific to Spectrum ASICs, e.g., re-partitioning the various KVD
sub-parts.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Split this out from another patch. Fix line length
in devlink_sp_read_kvd_defaults().]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
bc7cbb1e9f selftests: forwarding: Add devlink_lib.sh
This helper library contains wrappers to devlink functionality agnostic
to the underlying device.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
[petrm@mellanox.com: Split this out from another patch.]
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
68d9cea594 selftests: forwarding: lib: Parameterize NUM_NETIFS in two functions
setup_wait() and tc_offload_check() both assume that all NUM_NETIFS
interfaces are relevant for a given test. However, the scale test script
acts as an umbrella for a number of sub-tests, some of which may not
require all the interfaces.

Thus it's suboptimal for tc_offload_check() to query all the interfaces.
In case of setup_wait() it's incorrect, because the sub-test in question
of course doesn't configure any interfaces beyond what it needs, and
setup_wait() then ends up waiting indefinitely for the extraneous
interfaces to come up.

For that reason, give setup_wait() and tc_offload_check() an optional
parameter with a number of interfaces to probe. Fall back to global
NUM_NETIFS if the parameter is not given.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00
Petr Machata
96fa91d281 selftests: forwarding: lib: Add check_err_fail()
In the scale testing scenarios, one usually has a condition that is
expected to either fail, or pass, depending on which side of the scale
is being tested.

To capture this logic, add a function check_err_fail(), which dispatches
either to check_err() or check_fail(), depending on the value of the
first argument, should_fail.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 22:06:15 +09:00