Commit Graph

10907 Commits

Author SHA1 Message Date
Linus Torvalds
fe0142df64 Merge tag 'xfs-4.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pul xfs updates from Dave Chinner:
 "There's not a huge amount of change in this cycle - Darrick has been
  out of action for a couple of months (hence me sending the last few
  pull requests), so we decided a quiet cycle mainly focussed on bug
  fixes was a good idea. Darrick will take the helm again at the end of
  this merge window.

  FYI, I may be sending another update later in the cycle - there's a
  pending rework of the clone/dedupe_file_range code that fixes numerous
  bugs that is spread amongst the VFS, XFS and ocfs2 code. It has been
  reviewed and tested, Al and I just need to work out the details of the
  merge, so it may come from him rather than me.

  Summary:

   - only support filesystems with unwritten extents

   - add definition for statfs XFS magic number

   - remove unused parameters around reflink code

   - more debug for dangling delalloc extents

   - cancel COW extents on extent swap targets

   - fix quota stats output and clean up the code

   - refactor some of the attribute code in preparation for parent
     pointers

   - fix several buffer handling bugs"

* tag 'xfs-4.20-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (21 commits)
  xfs: cancel COW blocks before swapext
  xfs: clear ail delwri queued bufs on unmount of shutdown fs
  xfs: use offsetof() in place of offset macros for __xfsstats
  xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
  xfs: fix use-after-free race in xfs_buf_rele
  xfs: Add attibute remove and helper functions
  xfs: Add attibute set and helper functions
  xfs: Add helper function xfs_attr_try_sf_addname
  xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
  xfs: issue log message on user force shutdown
  xfs: fix buffer state management in xrep_findroot_block
  xfs: always assign buffer verifiers when one is provided
  xfs: xrep_findroot_block should reject root blocks with siblings
  xfs: add a define for statfs magic to uapi
  xfs: print dangling delalloc extents
  xfs: fix fork selection in xfs_find_trim_cow_extent
  xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared
  xfs: remove the unused shared argument to xfs_reflink_reserve_cow
  xfs: handle zeroing in xfs_file_iomap_begin_delay
  xfs: remove suport for filesystems without unwritten extent flag
  ...
2018-10-24 17:36:12 +01:00
Linus Torvalds
638820d8da Merge branch 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "In this patchset, there are a couple of minor updates, as well as some
  reworking of the LSM initialization code from Kees Cook (these prepare
  the way for ordered stackable LSMs, but are a valuable cleanup on
  their own)"

* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  LSM: Don't ignore initialization failures
  LSM: Provide init debugging infrastructure
  LSM: Record LSM name in struct lsm_info
  LSM: Convert security_initcall() into DEFINE_LSM()
  vmlinux.lds.h: Move LSM_TABLE into INIT_DATA
  LSM: Convert from initcall to struct lsm_info
  LSM: Remove initcall tracing
  LSM: Rename .security_initcall section to .lsm_info
  vmlinux.lds.h: Avoid copy/paste of security_init section
  LSM: Correctly announce start of LSM initialization
  security: fix LSM description location
  keys: Fix the use of the C++ keyword "private" in uapi/linux/keyctl.h
  seccomp: remove unnecessary unlikely()
  security: tomoyo: Fix obsolete function
  security/capabilities: remove check for -EINVAL
2018-10-24 11:49:35 +01:00
Linus Torvalds
ba9f6f8954 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
 "I have been slowly sorting out siginfo and this is the culmination of
  that work.

  The primary result is in several ways the signal infrastructure has
  been made less error prone. The code has been updated so that manually
  specifying SEND_SIG_FORCED is never necessary. The conversion to the
  new siginfo sending functions is now complete, which makes it
  difficult to send a signal without filling in the proper siginfo
  fields.

  At the tail end of the patchset comes the optimization of decreasing
  the size of struct siginfo in the kernel from 128 bytes to about 48
  bytes on 64bit. The fundamental observation that enables this is by
  definition none of the known ways to use struct siginfo uses the extra
  bytes.

  This comes at the cost of a small user space observable difference.
  For the rare case of siginfo being injected into the kernel only what
  can be copied into kernel_siginfo is delivered to the destination, the
  rest of the bytes are set to 0. For cases where the signal and the
  si_code are known this is safe, because we know those bytes are not
  used. For cases where the signal and si_code combination is unknown
  the bits that won't fit into struct kernel_siginfo are tested to
  verify they are zero, and the send fails if they are not.

  I made an extensive search through userspace code and I could not find
  anything that would break because of the above change. If it turns out
  I did break something it will take just the revert of a single change
  to restore kernel_siginfo to the same size as userspace siginfo.

  Testing did reveal dependencies on preferring the signo passed to
  sigqueueinfo over si->signo, so bit the bullet and added the
  complexity necessary to handle that case.

  Testing also revealed bad things can happen if a negative signal
  number is passed into the system calls. Something no sane application
  will do but something a malicious program or a fuzzer might do. So I
  have fixed the code that performs the bounds checks to ensure negative
  signal numbers are handled"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
  signal: Guard against negative signal numbers in copy_siginfo_from_user32
  signal: Guard against negative signal numbers in copy_siginfo_from_user
  signal: In sigqueueinfo prefer sig not si_signo
  signal: Use a smaller struct siginfo in the kernel
  signal: Distinguish between kernel_siginfo and siginfo
  signal: Introduce copy_siginfo_from_user and use it's return value
  signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
  signal: Fail sigqueueinfo if si_signo != sig
  signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
  signal/unicore32: Use force_sig_fault where appropriate
  signal/unicore32: Generate siginfo in ucs32_notify_die
  signal/unicore32: Use send_sig_fault where appropriate
  signal/arc: Use force_sig_fault where appropriate
  signal/arc: Push siginfo generation into unhandled_exception
  signal/ia64: Use force_sig_fault where appropriate
  signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
  signal/ia64: Use the generic force_sigsegv in setup_frame
  signal/arm/kvm: Use send_sig_mceerr
  signal/arm: Use send_sig_fault where appropriate
  signal/arm: Use force_sig_fault where appropriate
  ...
2018-10-24 11:22:39 +01:00
Linus Torvalds
50b825d7e8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add VF IPSEC offload support in ixgbe, from Shannon Nelson.

 2) Add zero-copy AF_XDP support to i40e, from Björn Töpel.

 3) All in-tree drivers are converted to {g,s}et_link_ksettings() so we
    can get rid of the {g,s}et_settings ethtool callbacks, from Michal
    Kubecek.

 4) Add software timestamping to veth driver, from Michael Walle.

 5) More work to make packet classifiers and actions lockless, from Vlad
    Buslov.

 6) Support sticky FDB entries in bridge, from Nikolay Aleksandrov.

 7) Add ipv6 version of IP_MULTICAST_ALL sockopt, from Andre Naujoks.

 8) Support batching of XDP buffers in vhost_net, from Jason Wang.

 9) Add flow dissector BPF hook, from Petar Penkov.

10) i40e vf --> generic iavf conversion, from Jesse Brandeburg.

11) Add NLA_REJECT netlink attribute policy type, to signal when users
    provide attributes in situations which don't make sense. From
    Johannes Berg.

12) Switch TCP and fair-queue scheduler over to earliest departure time
    model. From Eric Dumazet.

13) Improve guest receive performance by doing rx busy polling in tx
    path of vhost networking driver, from Tonghao Zhang.

14) Add per-cgroup local storage to bpf

15) Add reference tracking to BPF, from Joe Stringer. The verifier can
    now make sure that references taken to objects are properly released
    by the program.

16) Support in-place encryption in TLS, from Vakul Garg.

17) Add new taprio packet scheduler, from Vinicius Costa Gomes.

18) Lots of selftests additions, too numerous to mention one by one here
    but all of which are very much appreciated.

19) Support offloading of eBPF programs containing BPF to BPF calls in
    nfp driver, frm Quentin Monnet.

20) Move dpaa2_ptp driver out of staging, from Yangbo Lu.

21) Lots of u32 classifier cleanups and simplifications, from Al Viro.

22) Add new strict versions of netlink message parsers, and enable them
    for some situations. From David Ahern.

23) Evict neighbour entries on carrier down, also from David Ahern.

24) Support BPF sk_msg verdict programs with kTLS, from Daniel Borkmann
    and John Fastabend.

25) Add support for filtering route dumps, from David Ahern.

26) New igc Intel driver for 2.5G parts, from Sasha Neftin et al.

27) Allow vxlan enslavement to bridges in mlxsw driver, from Ido
    Schimmel.

28) Add queue and stack map types to eBPF, from Mauricio Vasquez B.

29) Add back byte-queue-limit support to r8169, with all the bug fixes
    in other areas of the driver it works now! From Florian Westphal and
    Heiner Kallweit.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2147 commits)
  tcp: add tcp_reset_xmit_timer() helper
  qed: Fix static checker warning
  Revert "be2net: remove desc field from be_eq_obj"
  Revert "net: simplify sock_poll_wait"
  net: socionext: Reset tx queue in ndo_stop
  net: socionext: Add dummy PHY register read in phy_write()
  net: socionext: Stop PHY before resetting netsec
  net: stmmac: Set OWN bit for jumbo frames
  arm64: dts: stratix10: Support Ethernet Jumbo frame
  tls: Add maintainers
  net: ethernet: ti: cpsw: unsync mcast entries while switch promisc mode
  octeontx2-af: Support for NIXLF's UCAST/PROMISC/ALLMULTI modes
  octeontx2-af: Support for setting MAC address
  octeontx2-af: Support for changing RSS algorithm
  octeontx2-af: NIX Rx flowkey configuration for RSS
  octeontx2-af: Install ucast and bcast pkt forwarding rules
  octeontx2-af: Add LMAC channel info to NIXLF_ALLOC response
  octeontx2-af: NPC MCAM and LDATA extract minimal configuration
  octeontx2-af: Enable packet length and csum validation
  octeontx2-af: Support for VTAG strip and capture
  ...
2018-10-24 06:47:44 +01:00
Lionel Landwerlin
cd956bfcd0 drm/i915/perf: add a parameter to control the size of OA buffer
The way our hardware is designed doesn't seem to let us use the
MI_RECORD_PERF_COUNT command without setting up a circular buffer.

In the case where the user didn't request OA reports to be available
through the i915 perf stream, we can set the OA buffer to the minimum
size to avoid consuming memory which won't be used by the driver.

v2: Simplify oa buffer size exponent selection (Chris)
    Reuse vma size field (Lionel)

v3: Restrict size opening parameter to values supported by HW (Chris)

v4: Drop out of date comment (Matt)
    Add debug message when buffer size is rejected (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-5-lionel.g.landwerlin@intel.com
2018-10-23 15:09:25 +01:00
Jiri Kosina
276e722761 Merge branch 'for-4.20/logitech-highres' into for-linus
High-resolution support for hid-logitech
2018-10-23 13:35:22 +02:00
Linus Torvalds
114b5f8f7e Merge tag 'gpio-v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for the v4.20 series:

  Core changes:

   - A patch series from Hans Verkuil to make it possible to
     enable/disable IRQs on a GPIO line at runtime and drive GPIO lines
     as output without having to put/get them from scratch.

     The irqchip callbacks have been improved so that they can use only
     the fastpatch callbacks to enable/disable irqs like any normal
     irqchip, especially the gpiod_lock_as_irq() has been improved to be
     callable in fastpath context.

     A bunch of rework had to be done to achieve this but it is a big
     win since I never liked to restrict this to slowpath. The only call
     requireing slowpath was try_module_get() and this is kept at the
     .request_resources() slowpath callback. In the GPIO CEC driver this
     is a big win sine a single line is used for both outgoing and
     incoming traffic, and this needs to use IRQs for incoming traffic
     while actively driving the line for outgoing traffic.

   - Janusz Krzysztofik improved the GPIO array API to pass a "cookie"
     (struct gpio_array) and a bitmap for setting or getting multiple
     GPIO lines at once.

     This improvement orginated in a specific need to speed up an OMAP1
     driver and has led to a much better API and real performance gains
     when the state of the array can be used to bypass a lot of checks
     and code when we want things to go really fast.

     The previous code would minimize the number of calls down to the
     driver callbacks assuming the CPU speed was orders of magnitude
     faster than the I/O latency, but this assumption was wrong on
     several platforms: what we needed to do was to profile and improve
     the speed on the hot path of the array functions and this change is
     now completed.

   - Clean out the painful and hard to grasp BNF experiments from the
     device tree bindings. Future approaches are looking into using JSON
     schema for this purpose. (Rob Herring is floating a patch series.)

  New drivers:

   - The RCAR driver now supports r8a774a1 (RZ/G2M).

   - Synopsys GPIO via CREGs driver.

  Major improvements:

   - Modernization of the EP93xx driver to use irqdomain and other
     contemporary concepts.

   - The ingenic driver has been merged into the Ingenic pin control
     driver and removed from the GPIO subsystem.

   - Debounce support in the ftgpio010 driver"

* tag 'gpio-v4.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (116 commits)
  gpio: Clarify kerneldoc on gpiochip_set_chained_irqchip()
  gpio: Remove unused 'irqchip' argument to gpiochip_set_cascaded_irqchip()
  gpio: Drop parent irq assignment during cascade setup
  mmc: pwrseq_simple: Fix incorrect handling of GPIO bitmap
  gpio: fix SNPS_CREG kconfig dependency warning
  gpiolib: Initialize gdev field before is used
  gpio: fix kernel-doc after devres.c file rename
  gpio: fix doc string for devm_gpiochip_add_data() to not talk about irq_chip
  gpio: syscon: Fix possible NULL ptr usage
  gpiolib: Show correct direction from the beginning
  pinctrl: msm: Use init_valid_mask exported function
  gpiolib: Add init_valid_mask exported function
  GPIO: add single-register GPIO via CREG driver
  dt-bindings: Document the Synopsys GPIO via CREG bindings
  gpio: mockup: use device properties instead of platform_data
  gpio: Slightly more helpful debugfs
  gpio: omap: Remove set but not used variable 'dev'
  gpio: omap: drop omap_gpio_list
  Accept partial 'gpio-line-names' property.
  gpio: omap: get rid of the conditional PM runtime calls
  ...
2018-10-23 08:45:05 +01:00
Takashi Iwai
5e3cdecf78 Merge tag 'asoc-v5.0' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v5.0/v4.20

As ever there's a lot of small and driver specific changes going on
here, but we do also have some relatively large changes in the core
thanks to the hard work of Charles and Morimoto-san:

 - More component transitions from Morimoto-san, I think we're about
   finished with this.  Thanks for all the hard work!
 - Morimoto-san also added a bunch of for_each_foo macros
 - A bunch of cleanups and fixes for DAPM from Charles.
 - MCLK support for several different devices, including CS42L51, STM32
   SAI, and MAX98373.
 - Support for Allwinner A64 CODEC analog, Intel boards with DA7219 and
   MAX98927, Meson AXG PDM inputs, Nuvoton NAU8822, Renesas R8A7744 and
   TI PCM3060.
2018-10-22 23:26:37 +02:00
David S. Miller
a19c59cc10 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-10-21

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Implement two new kind of BPF maps, that is, queue and stack
   map along with new peek, push and pop operations, from Mauricio.

2) Add support for MSG_PEEK flag when redirecting into an ingress
   psock sk_msg queue, and add a new helper bpf_msg_push_data() for
   insert data into the message, from John.

3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use
   direct packet access for __skb_buff, from Song.

4) Use more lightweight barriers for walking perf ring buffer for
   libbpf and perf tool as well. Also, various fixes and improvements
   from verifier side, from Daniel.

5) Add per-symbol visibility for DSO in libbpf and hide by default
   global symbols such as netlink related functions, from Andrey.

6) Two improvements to nfp's BPF offload to check vNIC capabilities
   in case prog is shared with multiple vNICs and to protect against
   mis-initializing atomic counters, from Jakub.

7) Fix for bpftool to use 4 context mode for the nfp disassembler,
   also from Jakub.

8) Fix a return value comparison in test_libbpf.sh and add several
   bpftool improvements in bash completion, documentation of bpf fs
   restrictions and batch mode summary print, from Quentin.

9) Fix a file resource leak in BPF selftest's load_kallsyms()
   helper, from Peng.

10) Fix an unused variable warning in map_lookup_and_delete_elem(),
    from Alexei.

11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc,
    from Nicolas.

12) Add missing executables to .gitignore in BPF selftests, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-21 21:11:46 -07:00
John Fastabend
6fff607e2f bpf: sk_msg program helper bpf_msg_push_data
This allows user to push data into a msg using sk_msg program types.
The format is as follows,

	bpf_msg_push_data(msg, offset, len, flags)

this will insert 'len' bytes at offset 'offset'. For example to
prepend 10 bytes at the front of the message the user can,

	bpf_msg_push_data(msg, 0, 10, 0);

This will invalidate data bounds so BPF user will have to then recheck
data bounds after calling this. After this the msg size will have been
updated and the user is free to write into the added bytes. We allow
any offset/len as long as it is within the (data, data_end) range.
However, a copy will be required if the ring is full and its possible
for the helper to fail with ENOMEM or EINVAL errors which need to be
handled by the BPF program.

This can be used similar to XDP metadata to pass data between sk_msg
layer and lower layers.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-10-20 21:37:11 +02:00
David S. Miller
a4efbaf622 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree:

1) Use lockdep_is_held() in ipset_dereference_protected(), from Lance Roy.

2) Remove unused variable in cttimeout, from YueHaibing.

3) Add ttl option for nft_osf, from Fernando Fernandez Mancera.

4) Use xfrm family to deal with IPv6-in-IPv4 packets from nft_xfrm,
   from Florian Westphal.

5) Simplify xt_osf_match_packet().

6) Missing ct helper alias definition in snmp_trap helper, from Taehee Yoo.

7) Remove unnecessary parameter in nf_flow_table_cleanup(), from Taehee Yoo.

8) Remove unused variable definitions in nft_{dup,fwd}, from Weongyo Jeong.

9) Remove empty net/netfilter/nfnetlink_log.h file, from Taehee Yoo.

10) Revert xt_quota updates remain option due to problems in the listing
    path for 32-bit arches, from Maze.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-20 12:32:44 -07:00
Mauricio Vasquez B
bd513cd08f bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall
The previous patch implemented a bpf queue/stack maps that
provided the peek/pop/push functions.  There is not a direct
relationship between those functions and the current maps
syscalls, hence a new MAP_LOOKUP_AND_DELETE_ELEM syscall is added,
this is mapped to the pop operation in the queue/stack maps
and it is still to implement in other kind of maps.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19 13:24:31 -07:00
Mauricio Vasquez B
f1a2e44a3a bpf: add queue and stack maps
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs.
These maps support peek, pop and push operations that are exposed to eBPF
programs through the new bpf_map[peek/pop/push] helpers.  Those operations
are exposed to userspace applications through the already existing
syscalls in the following way:

BPF_MAP_LOOKUP_ELEM            -> peek
BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop
BPF_MAP_UPDATE_ELEM            -> push

Queue/stack maps are implemented using a buffer, tail and head indexes,
hence BPF_F_NO_PREALLOC is not supported.

As opposite to other maps, queue and stack do not use RCU for protecting
maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE
argument that is a pointer to a memory zone where to save the value of a
map.  Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not
be passed as an extra argument.

Our main motivation for implementing queue/stack maps was to keep track
of a pool of elements, like network ports in a SNAT, however we forsee
other use cases, like for exampling saving last N kernel events in a map
and then analysing from userspace.

Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19 13:24:31 -07:00
David S. Miller
2e2d6f0342 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.

net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'.  Thanks to David
Ahern for the heads up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 11:03:06 -07:00
Paolo Bonzini
e42b4a507e Merge tag 'kvmarm-for-v4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 4.20

- Improved guest IPA space support (32 to 52 bits)
- RAS event delivery for 32bit
- PMU fixes
- Guest entry hardening
- Various cleanups
2018-10-19 15:24:24 +02:00
Pablo Neira Ayuso
af510ebd89 Revert "netfilter: xt_quota: fix the behavior of xt_quota module"
This reverts commit e9837e55b0.

When talking to Maze and Chenbo, we agreed to keep this back by now
due to problems in the ruleset listing path with 32-bit arches.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-19 14:00:34 +02:00
Chunming Zhou
48197bc564 drm: add syncobj timeline support v9
This patch is for VK_KHR_timeline_semaphore extension, semaphore is called syncobj in kernel side:
This extension introduces a new type of syncobj that has an integer payload
identifying a point in a timeline. Such timeline syncobjs support the
following operations:
   * CPU query - A host operation that allows querying the payload of the
     timeline syncobj.
   * CPU wait - A host operation that allows a blocking wait for a
     timeline syncobj to reach a specified value.
   * Device wait - A device operation that allows waiting for a
     timeline syncobj to reach a specified value.
   * Device signal - A device operation that allows advancing the
     timeline syncobj to a specified value.

v1:
Since it's a timeline, that means the front time point(PT) always is signaled before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline value, then wait PT will be
signaled, otherwise keep in list. syncobj wait operation can wait on any point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need a sumission fence to
perform that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed denitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implemenation. (Vetter and Christian)
    a. normal syncobj signal op will create a signal PT to tail of signal pt list.
    b. normal syncobj wait op will create a wait pt with last signal point, and this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.

v4:
1. fix RB tree loop with while(node=rb_first(...)). (Christian)
2. fix syncobj lifecycle. (Christian)
3. only enable_signaling when there is wait_pt. (Christian)
4. fix timeline path issues.
5. write a timeline test in libdrm

v5: (Christian)
1. semaphore is called syncobj in kernel side.
2. don't need 'timeline' characters in some function name.
3. keep syncobj cb.

v6: (Christian)
1. merge syncobj_timeline to syncobj structure.
2. simplify some check sentences.
3. some misc change.
4. fix CTS failed issue.

v7: (Christian)
1. error handling when creating signal pt.
2. remove timeline naming in func.
3. export flags in find_fence.
4. allow reset timeline.

v8:
1. use wait_event_interruptible without timeout
2. rename _TYPE_INDIVIDUAL to _TYPE_BINARY

v9:
1. rename signal_pt->base to signal_pt->fence_array to avoid misleading
2. improve kerneldoc

individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore*
timeline syncobj is tested by ./amdgpu_test -s 9

Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Cc: Christian Konig <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Daniel Rakos <Daniel.Rakos@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/257258/
2018-10-18 13:46:48 +02:00
Adam Borowski
dddde68b8f xfs: add a define for statfs magic to uapi
Needed by userspace programs that call fstatfs().

It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
have identical values, they have different semantic meaning: one is
an enum cookie meant for statfs, the other a signature of the
on-disk format.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:19 +11:00
Nicolas Dichtel
b55cbc8d9b bpf: fix doc of bpf_skb_adjust_room() in uapi
len_diff is signed.

Fixes: fa15601ab3 ("bpf: add documentation for eBPF helpers (33-41)")
CC: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-17 21:45:50 -07:00
David Howells
f366d322ae UAPI: ndctl: Remove use of PAGE_SIZE
The macro PAGE_SIZE isn't valid outside of the kernel, so it should not
appear in UAPI headers.

Furthermore, the actual machine page size could theoretically change from
an application's point of view if it's running in a container that gets
migrated to another machine (say 4K/ppc64 to 64K/ppc64).

Fixes: f2ba5a5bae ("libnvdimm, namespace: make min namespace size 4K")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-17 13:56:58 -07:00
David Howells
9607871f37 UAPI: ndctl: Fix g++-unsupported initialisation in headers
The following code in the linux/ndctl header file:

	static inline const char *nvdimm_bus_cmd_name(unsigned cmd)
	{
		static const char * const names[] = {
			[ND_CMD_ARS_CAP] = "ars_cap",
			[ND_CMD_ARS_START] = "ars_start",
			[ND_CMD_ARS_STATUS] = "ars_status",
			[ND_CMD_CLEAR_ERROR] = "clear_error",
			[ND_CMD_CALL] = "cmd_call",
		};

		if (cmd < ARRAY_SIZE(names) && names[cmd])
			return names[cmd];
		return "unknown";
	}

is broken in a number of ways:

 (1) ARRAY_SIZE() is not generally defined.

 (2) g++ does not support "non-trivial" array initialisers fully yet.

 (3) Every file that calls this function will acquire a copy of names[].

The same goes for nvdimm_cmd_name().

Fix all three by converting to a switch statement where each case returns a
string.  That way if cmd is a constant, the compiler can trivially reduce it
and, if not, the compiler can use a shared lookup table if it thinks that is
more efficient.

A better way would be to remove these functions and their arrays from the
header entirely.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-10-17 13:56:54 -07:00
Jim Mattson
c4f55198c7 kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
This is a per-VM capability which can be enabled by userspace so that
the faulting linear address will be included with the information
about a pending #PF in L2, and the "new DR6 bits" will be included
with the information about a pending #DB in L2. With this capability
enabled, the L1 hypervisor can now intercept #PF before CR2 is
modified. Under VMX, the L1 hypervisor can now intercept #DB before
DR6 and DR7 are modified.

When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
generally provide an appropriate payload when injecting a #PF or #DB
exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
checkpoints, this payload is not required.

Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
region while advanced debugging of RTM transactional regions was
enabled. This is the reverse of DR6.RTM, which is cleared in this
scenario.

This capability also enables exception.pending in struct
kvm_vcpu_events, which allows userspace to distinguish between pending
and injected exceptions.

Reported-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17 19:07:44 +02:00
Yonatan Cohen
6f4bc0ea68 IB/mlx5: Allow scatter to CQE without global signaled WRs
Requester scatter to CQE is restricted to QPs configured to signal
all WRs.

This patch adds ability to enable scatter to cqe (force enable)
in the requester without sig_all, for users who do not want all WRs
signaled but rather just the ones whose data found in the CQE.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-17 11:25:41 -04:00
Vitaly Kuznetsov
57b119da35 KVM: nVMX: add KVM_CAP_HYPERV_ENLIGHTENED_VMCS capability
Enlightened VMCS is opt-in. The current version does not contain all
fields supported by nested VMX so we must not advertise the
corresponding VMX features if enlightened VMCS is enabled.

Userspace is given the enlightened VMCS version supported by KVM as
part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to
be advertised to the nested hypervisor, currently done via a cpuid
leaf for Hyper-V.

Suggested-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17 00:30:14 +02:00
Peng Hao
0804c849f1 kvm/x86 : add coalesced pio support
Coalesced pio is based on coalesced mmio and can be used for some port
like rtc port, pci-host config port and so on.

Specially in case of rtc as coalesced pio, some versions of windows guest
access rtc frequently because of rtc as system tick. guest access rtc like
this: write register index to 0x70, then write or read data from 0x71.
writing 0x70 port is just as index and do nothing else. So we can use
coalesced pio to handle this scene to reduce VM-EXIT time.

When starting and closing a virtual machine, it will access pci-host config
port frequently. So setting these port as coalesced pio can reduce startup
and shutdown time.

without my patch, get the vm-exit time of accessing rtc 0x70 and piix 0xcf8
using perf tools: (guest OS : windows 7 64bit)
IO Port Access  Samples Samples%  Time%  Min Time  Max Time  Avg time
0x70:POUT        86     30.99%    74.59%   9us      29us    10.75us (+- 3.41%)
0xcf8:POUT     1119     2.60%     2.12%   2.79us    56.83us 3.41us (+- 2.23%)

with my patch
IO Port Access  Samples Samples%  Time%   Min Time  Max Time   Avg time
0x70:POUT       106    32.02%    29.47%    0us      10us     1.57us (+- 7.38%)
0xcf8:POUT      1065    1.67%     0.28%   0.41us    65.44us   0.66us (+- 10.55%)

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17 00:30:11 +02:00
Vitaly Kuznetsov
214ff83d44 KVM: x86: hyperv: implement PV IPI send hypercalls
Using hypercall for sending IPIs is faster because this allows to specify
any number of vCPUs (even > 64 with sparse CPU set), the whole procedure
will take only one VMEXIT.

Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
hypercall can't be 'fast' (passing parameters through registers) but
apparently this is not true, Windows always uses it as 'fast' so we need
to support that.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-17 00:29:47 +02:00
Leon Romanovsky
05d940d3a3 RDMA/nldev: Allow IB device rename through RDMA netlink
Provide an option to rename IB device name through RDMA netlink and
limit it to users with ADMIN capability only.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16 13:37:16 -04:00
Xin Long
0ac1077e3a sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead
According to rfc7496 section 4.3 or 4.4:

   sprstat_policy:  This parameter indicates for which PR-SCTP policy
      the user wants the information.  It is an error to use
      SCTP_PR_SCTP_NONE in sprstat_policy.  If SCTP_PR_SCTP_ALL is used,
      the counters provided are aggregated over all supported policies.

We change to dump pr_assoc and pr_stream all status by SCTP_PR_SCTP_ALL
instead, and return error for SCTP_PR_SCTP_NONE, as it also said "It is
an error to use SCTP_PR_SCTP_NONE in sprstat_policy. "

Fixes: 826d253d57 ("sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt")
Fixes: d229d48d18 ("sctp: add SCTP_PR_STREAM_STATUS sockopt for prsctp")
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-16 09:58:49 -07:00
Fernando Fernandez Mancera
a218dc82f0 netfilter: nft_osf: Add ttl option support
Add ttl option support to the nftables "osf" expression.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-10-16 10:01:48 +02:00
Mark Bloch
ba4a411983 RDMA/mlx5: Add support for flow tag to raw create flow
A user can provide a hint which will be attached to the packet and written
to the CQE on receive. This can be used as a way to offload operations
into the HW, for example parsing a packet which is a tunneled packet, and
if so, pass 0x1 as the hint. The software can use that hint to decapsulate
the packet and parse only the inner headers thus saving CPU cycles.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:24:45 -06:00
Justin.Lee1@Dell.com
9771b8ccdf net/ncsi: Extend NC-SI Netlink interface to allow user space to send NC-SI command
The new command (NCSI_CMD_SEND_CMD) is added to allow user space application
to send NC-SI command to the network card.
Also, add a new attribute (NCSI_ATTR_DATA) for transferring request and response.

The work flow is as below.

Request:
User space application
	-> Netlink interface (msg)
	-> new Netlink handler - ncsi_send_cmd_nl()
	-> ncsi_xmit_cmd()

Response:
Response received - ncsi_rcv_rsp()
	-> internal response handler - ncsi_rsp_handler_xxx()
	-> ncsi_rsp_handler_netlink()
	-> ncsi_send_netlink_rsp ()
	-> Netlink interface (msg)
	-> user space application

Command timeout - ncsi_request_timeout()
	-> ncsi_send_netlink_timeout ()
	-> Netlink interface (msg with zero data length)
	-> user space application

Error:
Error detected
	-> ncsi_send_netlink_err ()
	-> Netlink interface (err msg)
	-> user space application

Signed-off-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 22:00:59 -07:00
Maciej W. Rozycki
61414f5ec9 FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter
Add support for the DEC FDDIcontroller 700 (DEFZA), Digital Equipment
Corporation's first-generation FDDI network interface adapter, made for
TURBOchannel and based on a discrete version of what eventually became
Motorola's widely used CAMEL chipset.

The CAMEL chipset is present for example in the DEC FDDIcontroller
TURBOchannel, EISA and PCI adapters (DEFTA/DEFEA/DEFPA) that we support
with the `defxx' driver, however the host bus interface logic and the
firmware API are different in the DEFZA and hence a separate driver is
required.

There isn't much to say about the driver except that it works, but there
is one peculiarity to mention.  The adapter implements two Tx/Rx queue
pairs.

Of these one pair is the usual network Tx/Rx queue pair, in this case
used by the adapter to exchange frames with the ring, via the RMC (Ring
Memory Controller) chip.  The Tx queue is handled directly by the RMC
chip and resides in onboard packet memory.  The Rx queue is maintained
via DMA in host memory by adapter's firmware copying received data
stored by the RMC in onboard packet memory.

The other pair is used to communicate SMT frames with adapter's
firmware.  Any SMT frame received from the RMC via the Rx queue must be
queued back by the driver to the SMT Rx queue for the firmware to
process.  Similarly the firmware uses the SMT Tx queue to supply the
driver with SMT frames that must be queued back to the Tx queue for the
RMC to send to the ring.

This solution was chosen because the designers ran out of PCB space and
could not squeeze in more logic onto the board that would be required to
handle this SMT frame traffic without the need to involve the driver, as
with the later DEFTA/DEFEA/DEFPA adapters.

Finally the driver does some Frame Control byte decoding, so to avoid
magic numbers some macros are added to <linux/if_fddi.h>.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-15 21:46:06 -07:00
Arnd Bergmann
664e68bcab scsi: ufs: fix integer type usage in uapi header
We get a warning from 'make headers_check' about a newly introduced usage
of integer types in the scsi/scsi_bsg_ufs.h uapi header:

usr/include/scsi/scsi_bsg_ufs.h:18: found __[us]{8,16,32,64} type without #include <linux/types.h>

Aside from the missing linux/types.h inclusion, I also noticed that it
uses the wrong types: 'u32' is not available at all in user space, and
'uint32_t' depends on the inclusion of a standard header that we should
not include from kernel headers.

Change the all to __u32 and similar types here.

I also note the usage of '__be32' and '__be16' that seems unfortunate for
a user space API. I wonder if it would be better to define the interface
in terms of a CPU-endian structure and convert it in kernel space.

Fixes: e77044c5a8 ("scsi: ufs-bsg: Add support for uic commands in ufs_bsg_request()")
Fixes: df032bf27a ("scsi: ufs: Add a bsg endpoint that supports UPIUs")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-15 22:44:30 -04:00
Eric Anholt
4fa825bf40 drm/v3d: Add some better documentation of the in_sync arguments.
Since this is UAPI, it's good to document what exactly the guarantees
we're providing are.

Signed-off-by: Eric Anholt <eric@anholt.net>
Link: https://patchwork.freedesktop.org/patch/msgid/20180928232126.4332-3-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-15 13:10:50 -07:00
Dan Schatzberg
5571f1e654 fuse: enable caching of symlinks
FUSE file reads are cached in the page cache, but symlink reads are
not. This patch enables FUSE READLINK operations to be cached which
can improve performance of some FUSE workloads.

In particular, I'm working on a FUSE filesystem for access to source
code and discovered that about a 10% improvement to build times is
achieved with this patch (there are a lot of symlinks in the source
tree).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-10-15 15:43:07 +02:00
David S. Miller
d864991b22 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were easy to resolve using immediate context mostly,
except the cls_u32.c one where I simply too the entire HEAD
chunk.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12 21:38:46 -07:00
David S. Miller
e32cf9a386 Merge tag 'mac80211-next-for-davem-2018-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
Highlights:
 * merge net-next, so I can finish the hwsim workqueue removal
 * fix TXQ NULL pointer issue that was reported multiple times
 * minstrel cleanups from Felix
 * simplify lib80211 code by not using skcipher, note that this
   will conflict with the crypto tree (and this new code here
   should be used)
 * use new netlink policy validation in nl80211
 * fix up SAE (part of WPA3) in client-mode
 * FTM responder support in the stack
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12 10:56:56 -07:00
Nikolay Aleksandrov
9163a0fc1f net: bridge: add support for per-port vlan stats
This patch adds an option to have per-port vlan stats instead of the
default global stats. The option can be set only when there are no port
vlans in the bridge since we need to allocate the stats if it is set
when vlans are being added to ports (and respectively free them
when being deleted). Also bump RTNL_MAX_TYPE as the bridge is the
largest user of options. The current stats design allows us to add
these without any changes to the fast-path, it all comes down to
the per-vlan stats pointer which, if this option is enabled, will
be allocated for each port vlan instead of using the global bridge-wide
one.

CC: bridge@lists.linux-foundation.org
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-12 10:18:58 -07:00
Ankita Bajaj
0d4e14a32d nl80211: Add per peer statistics to compute FCS error rate
Add support for drivers to report the total number of MPDUs received
and the number of MPDUs received with an FCS error from a specific
peer. These counters will be incremented only when the TA of the
frame matches the MAC address of the peer irrespective of FCS
error.

It should be noted that the TA field in the frame might be corrupted
when there is an FCS error and TA matching logic would fail in such
cases. Hence, FCS error counter might not be fully accurate, but it can
provide help in detecting bad RX links in significant number of cases.
This FCS error counter without full accuracy can be used, e.g., to
trigger a kick-out of a connected client with a bad link in AP mode to
force such a client to roam to another AP.

Signed-off-by: Ankita Bajaj <bankita@codeaurora.org>
Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-10-12 12:56:34 +02:00
Gerd Hoffmann
3cdf752506 vfio: add edid api for display (vgpu) devices.
This allows to set EDID monitor information for the vgpu display, for a
more flexible display configuration, using a special vfio region.  Check
the comment describing struct vfio_region_gfx_edid for more details.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-11 10:22:35 -06:00
David S. Miller
49b538e79b Merge tag 'rxrpc-fixes-20181008' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:

====================
rxrpc: Fix packet reception code

Here are a set of patches that prepares for and fix problems in rxrpc's
package reception code.  There serious problems are:

 (A) There's a window between binding the socket and setting the data_ready
     hook in which packets can find their way into the UDP socket's receive
     queues.

 (B) The skb_recv_udp() will return an error (and clear the error state) if
     there was an error on the Tx side.  rxrpc doesn't handle this.

 (C) The rxrpc data_ready handler doesn't fully drain the UDP receive
     queue.

 (D) The rxrpc data_ready handler assumes it is called in a non-reentrant
 state.

The second patch fixes (A) - (C); the third patch renders (B) and (C)
non-issues by using the recap_rcv hook instead of data_ready - and the
final patch fixes (D).  That last is the most complex.

The preparatory patches are:

 (1) Fix some places that are doing things in the wrong net namespace.

 (2) Stop taking the rcu read lock as it's held by the IP input routine in
     the call chain.

 (3) Only end the Tx phase if *we* rotated the final packet out of the Tx
     buffer.

 (4) Don't assume that the call state won't change after dropping the
     call_state lock.

 (5) Only take receive window and MTU suze parameters from an ACK packet if
     it's the latest ACK packet.

 (6) Record connection-level abort information correctly.

 (7) Fix a trace line.

And then there are three main patches - note that these are mixed in with
the preparatory patches somewhat:

 (1) Fix the setup window (A), skb_recv_udp() error check (B) and packet
     drainage (C).

 (2) Switch to using the encap_rcv instead of data_ready to cut out the
     effects of the UDP read queues and get the packets delivered directly.

 (3) Add more locking into the various packet input paths to defend against
     re-entrance (D).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-10 22:27:38 -07:00
Avri Altman
e77044c5a8 scsi: ufs-bsg: Add support for uic commands in ufs_bsg_request()
Make ufshcd_send_uic_cmd() public for that.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-10 23:09:47 -04:00
Avri Altman
df032bf27a scsi: ufs: Add a bsg endpoint that supports UPIUs
For now, just provide an API to allocate and remove ufs-bsg node. We
will use this framework to manage ufs devices by sending UPIU
transactions.

For the time being, implements an empty bsg_request() - will add some
more functionality in coming patches.

Nonetheless, we reveal here the protocol we are planning to use: UFS
Transport Protocol Transactions. UFS transactions consist of packets
called UFS Protocol Information Units (UPIU).

There are UPIU’s defined for UFS SCSI commands, responses, data in and
data out, task management, utility functions, vendor functions,
transaction synchronization and control, and more.

By using UPIUs, we get access to the most fine-grained internals of this
protocol, and able to communicate with the device in ways, that are
sometimes beyond the capacity of the ufs driver.

Moreover and as a result, our core structure - ufs_bsg_node has a pretty
lean structure: using upiu transactions that contains the outmost
detailed info, so we don't really need complex constructs to support it.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-10 23:09:46 -04:00
Avri Altman
a851b2bd36 scsi: uapi: ufs: Make utp_upiu_req visible to user space
in preparation to send UPIU requests via bsg.

Signed-off-by: Avri Altman <avri.altman@wdc.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-10-10 23:09:46 -04:00
David S. Miller
071a234ad7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2018-10-08

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) sk_lookup_[tcp|udp] and sk_release helpers from Joe Stringer which allow
BPF programs to perform lookups for sockets in a network namespace. This would
allow programs to determine early on in processing whether the stack is
expecting to receive the packet, and perform some action (eg drop,
forward somewhere) based on this information.

2) per-cpu cgroup local storage from Roman Gushchin.
Per-cpu cgroup local storage is very similar to simple cgroup storage
except all the data is per-cpu. The main goal of per-cpu variant is to
implement super fast counters (e.g. packet counters), which don't require
neither lookups, neither atomic operations in a fast path.
The example of these hybrid counters is in selftests/bpf/netcnt_prog.c

3) allow HW offload of programs with BPF-to-BPF function calls from Quentin Monnet

4) support more than 64-byte key/value in HW offloaded BPF maps from Jakub Kicinski

5) rename of libbpf interfaces from Andrey Ignatov.
libbpf is maturing as a library and should follow good practices in
library design and implementation to play well with other libraries.
This patch set brings consistent naming convention to global symbols.

6) relicense libbpf as LGPL-2.1 OR BSD-2-Clause from Alexei Starovoitov
to let Apache2 projects use libbpf

7) various AF_XDP fixes from Björn and Magnus
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 23:42:44 -07:00
Paul Mackerras
901f8c3f6f KVM: PPC: Book3S HV: Add NO_HASH flag to GET_SMMU_INFO ioctl result
This adds a KVM_PPC_NO_HASH flag to the flags field of the
kvm_ppc_smmu_info struct, and arranges for it to be set when
running as a nested hypervisor, as an unambiguous indication
to userspace that HPT guests are not supported.  Reporting the
KVM_CAP_PPC_MMU_HASH_V3 capability as false could be taken as
indicating only that the new HPT features in ISA V3.0 are not
supported, leaving it ambiguous whether pre-V3.0 HPT features
are supported.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-09 16:14:54 +11:00
Paul Mackerras
aa069a9969 KVM: PPC: Book3S HV: Add a VM capability to enable nested virtualization
With this, userspace can enable a KVM-HV guest to run nested guests
under it.

The administrator can control whether any nested guests can be run;
setting the "nested" module parameter to false prevents any guests
becoming nested hypervisors (that is, any attempt to enable the nested
capability on a guest will fail).  Guests which are already nested
hypervisors will continue to be so.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-10-09 16:14:47 +11:00
David S. Miller
9000a457a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for your net-next tree:

1) Support for matching on ipsec policy already set in the route, from
   Florian Westphal.

2) Split set destruction into deactivate and destroy phase to make it
   fit better into the transaction infrastructure, also from Florian.
   This includes a patch to warn on imbalance when setting the new
   activate and deactivate interfaces.

3) Release transaction list from the workqueue to remove expensive
   synchronize_rcu() from configuration plane path. This speeds up
   configuration plane quite a bit. From Florian Westphal.

4) Add new xfrm/ipsec extension, this new extension allows you to match
   for ipsec tunnel keys such as source and destination address, spi and
   reqid. From Máté Eckl and Florian Westphal.

5) Add secmark support, this includes connsecmark too, patches
   from Christian Gottsche.

6) Allow to specify remaining bytes in xt_quota, from Chenbo Feng.
   One follow up patch to calm a clang warning for this one, from
   Nathan Chancellor.

7) Flush conntrack entries based on layer 3 family, from Kristian Evensen.

8) New revision for cgroups2 to shrink the path field.

9) Get rid of obsolete need_conntrack(), as a result from recent
   demodularization works.

10) Use WARN_ON instead of BUG_ON, from Florian Westphal.

11) Unused exported symbol in nf_nat_ipv4_fn(), from Florian.

12) Remove superfluous check for timeout netlink parser and dump
    functions in layer 4 conntrack helpers.

13) Unnecessary redundant rcu read side locks in NAT redirect,
    from Taehee Yoo.

14) Pass nf_hook_state structure to error handlers, patch from
    Florian Westphal.

15) Remove ->new() interface from layer 4 protocol trackers. Place
    them in the ->packet() interface. From Florian.

16) Place conntrack ->error() handling in the ->packet() interface.
    Patches from Florian Westphal.

17) Remove unused parameter in the pernet initialization path,
    also from Florian.

18) Remove additional parameter to specify layer 3 protocol when
    looking up for protocol tracker. From Florian.

19) Shrink array of layer 4 protocol trackers, from Florian.

20) Check for linear skb only once from the ALG NAT mangling
    codebase, from Taehee Yoo.

21) Use rhashtable_walk_enter() instead of deprecated
    rhashtable_walk_init(), also from Taehee.

22) No need to flush all conntracks when only one single address
    is gone, from Tan Hu.

23) Remove redundant check for NAT flags in flowtable code, from
    Taehee Yoo.

24) Use rhashtable_lookup() instead of rhashtable_lookup_fast()
    from netfilter codebase, since rcu read lock side is already
    assumed in this path.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 21:28:55 -07:00
David Ahern
89d35528d1 netlink: Add new socket option to enable strict checking on dumps
Add a new socket option, NETLINK_DUMP_STRICT_CHK, that userspace
can use via setsockopt to request strict checking of headers and
attributes on dump requests.

To get dump features such as kernel side filtering based on data in
the header or attributes appended to the dump request, userspace
must call setsockopt() for NETLINK_DUMP_STRICT_CHK and a non-zero
value. Since the netlink sock and its flags are private to the
af_netlink code, the strict checking flag is passed to dump handlers
via a flag in the netlink_callback struct.

For old userspace on new kernel there is no impact as all of the data
checks in later patches are wrapped in a check on the new strict flag.

For new userspace on old kernel, the setsockopt will fail and even if
new userspace sets data in the headers and appended attributes the
kernel will silently ignore it. Moving forward when the setsockopt
succeeds, the new userspace on old kernel means the dump request can
pass an attribute the kernel does not understand. The dump will then
fail as the older kernel does not understand it.

New userspace on new kernel setting the socket option gets the benefit
of the improved data dump.

Kernel side the NETLINK_DUMP_STRICT_CHK uapi is converted to a generic
NETLINK_F_STRICT_CHK flag which can potentially be leveraged for tighter
checking on the NEW, DEL, and SET commands.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-08 10:39:04 -07:00
David Howells
5271953cad rxrpc: Use the UDP encap_rcv hook
Use the UDP encap_rcv hook to cut the bit out of the rxrpc packet reception
in which a packet is placed onto the UDP receive queue and then immediately
removed again by rxrpc.  Going via the queue in this manner seems like it
should be unnecessary.

This does, however, require the invention of a value to place in encap_type
as that's one of the conditions to switch packets out to the encap_rcv
hook.  Possibly the value doesn't actually matter for anything other than
sockopts on the UDP socket, which aren't accessible outside of rxrpc
anyway.

This seems to cut a bit of time out of the time elapsed between each
sk_buff being timestamped and turning up in rxrpc (the final number in the
following trace excerpts).  I measured this by making the rxrpc_rx_packet
trace point print the time elapsed between the skb being timestamped and
the current time (in ns), e.g.:

	... 424.278721: rxrpc_rx_packet: ...  ACK 25026

So doing a 512MiB DIO read from my test server, with an unmodified kernel:

	N       min     max     sum		mean    stddev
	27605   2626    7581    7.83992e+07     2840.04 181.029

and with the patch applied:

	N       min     max     sum		mean    stddev
	27547   1895    12165   6.77461e+07     2459.29 255.02

Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-08 15:45:18 +01:00