Commit Graph

2550 Commits

Author SHA1 Message Date
Linus Torvalds
a1e16bc7d5 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Linus Torvalds
fad70111d5 Merge tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull afs updates from David Howells:
 "A collection of fixes to fix afs_cell struct refcounting, thereby
  fixing a slew of related syzbot bugs:

   - Fix the cell tree in the netns to use an rwsem rather than RCU.

     There seem to be some problems deriving from the use of RCU and a
     seqlock to walk the rbtree, but it's not entirely clear what since
     there are several different failures being seen.

     Changing things to use an rwsem instead makes it more robust. The
     extra performance derived from using RCU isn't necessary in this
     case since the only time we're looking up a cell is during mount or
     when cells are being manually added.

   - Fix the refcounting by splitting the usage counter into a memory
     refcount and an active users counter. The usage counter was doing
     double duty, keeping track of whether a cell is still in use and
     keeping track of when it needs to be destroyed - but this makes the
     clean up tricky. Separating these out simplifies the logic.

   - Fix purging a cell that has an alias. A cell alias pins the cell
     it's an alias of, but the alias is always later in the list. Trying
     to purge in a single pass causes rmmod to hang in such a case.

   - Fix cell removal. If a cell's manager is requeued whilst it's
     removing itself, the manager will run again and re-remove itself,
     causing problems in various places. Follow Hillf Danton's
     suggestion to insert a more terminal state that causes the manager
     to do nothing post-removal.

  In additional to the above, two other changes:

   - Add a tracepoint for the cell refcount and active users count. This
     helped with debugging the above and may be useful again in future.

   - Downgrade an assertion to a print when a still-active server is
     seen during purging. This was happening as a consequence of
     incomplete cell removal before the servers were cleaned up"

* tag 'afs-fixes-20201016' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Don't assert on unpurgeable server records
  afs: Add tracing for cell refcount and active user count
  afs: Fix cell removal
  afs: Fix cell purging with aliases
  afs: Fix cell refcounting by splitting the usage counter
  afs: Fix rapid cell addition/removal by not using RCU on cells tree
2020-10-16 15:22:41 -07:00
Linus Torvalds
7a3dadedc8 Merge tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've added new features such as zone capacity for ZNS
  and a new GC policy, ATGC, along with in-memory segment management. In
  addition, we could improve the decompression speed significantly by
  changing virtual mapping method. Even though we've fixed lots of small
  bugs in compression support, I feel that it becomes more stable so
  that I could give it a try in production.

  Enhancements:
   - suport zone capacity in NVMe Zoned Namespace devices
   - introduce in-memory current segment management
   - add standart casefolding support
   - support age threshold based garbage collection
   - improve decompression speed by changing virtual mapping method

  Bug fixes:
   - fix condition checks in some ioctl() such as compression, move_range, etc
   - fix 32/64bits support in data structures
   - fix memory allocation in zstd decompress
   - add some boundary checks to avoid kernel panic on corrupted image
   - fix disallowing compression for non-empty file
   - fix slab leakage of compressed block writes

  In addition, it includes code refactoring for better readability and
  minor bug fixes for compression and zoned device support"

* tag 'f2fs-for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
  f2fs: code cleanup by removing unnecessary check
  f2fs: wait for sysfs kobject removal before freeing f2fs_sb_info
  f2fs: fix writecount false positive in releasing compress blocks
  f2fs: introduce check_swap_activate_fast()
  f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case
  f2fs: handle errors of f2fs_get_meta_page_nofail
  f2fs: fix to set SBI_NEED_FSCK flag for inconsistent inode
  f2fs: reject CASEFOLD inode flag without casefold feature
  f2fs: fix memory alignment to support 32bit
  f2fs: fix slab leak of rpages pointer
  f2fs: compress: fix to disallow enabling compress on non-empty file
  f2fs: compress: introduce cic/dic slab cache
  f2fs: compress: introduce page array slab cache
  f2fs: fix to do sanity check on segment/section count
  f2fs: fix to check segment boundary during SIT page readahead
  f2fs: fix uninit-value in f2fs_lookup
  f2fs: remove unneeded parameter in find_in_block()
  f2fs: fix wrong total_sections check and fsmeta check
  f2fs: remove duplicated code in sanity_check_area_boundary
  f2fs: remove unused check on version_bitmap
  ...
2020-10-16 15:14:43 -07:00
David Howells
7530d3eb3d afs: Don't assert on unpurgeable server records
Don't give an assertion failure on unpurgeable afs_server records - which
kills the thread - but rather emit a trace line when we are purging a
record (which only happens during network namespace removal or rmmod) and
print a notice of the problem.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-16 14:39:34 +01:00
David Howells
dca54a7bbb afs: Add tracing for cell refcount and active user count
Add a tracepoint to log the cell refcount and active user count and pass in
a reason code through various functions that manipulate these counters.

Additionally, a helper function, afs_see_cell(), is provided to log
interesting places that deal with a cell without actually doing any
accounting directly.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-10-16 14:39:21 +01:00
Linus Torvalds
9ff9b0d392 Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
2020-10-15 18:42:13 -07:00
Linus Torvalds
c48b75b727 Merge tag 'sound-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "The amount of changes is smaller at this round (what a surprise), but
  lots of activity is seen. Most of changes are about ASoC driver
  development, especially Intel platforms. Here are some highlights:

  General:
   - Replace all tasklet usages with other alternatives
   - Cleanup of the ASoC error unwinding code
   - Fixes for trivial issues caught by static checker
   - Spell fixes allover the places

  ALSA Core:
   - Lockdep fix for control devices
   - Fix for potential OSS sequencer mutex stalls

  HD-audio and USB-audio:
   - SoundBlaster AE-7 support
   - Changes in quirk table for the rename handling
   - Quirks for HP and ASUS machines, Pioneer DJ DJM-250MK2.

  ASoC:
   - Lots of updates for Intel SOF and SoundWire enablement
   - Replacement of the DSP driver for some older x86 systems; the new
     code was written from scratch, better maintenance expected
   - Helpers for parsing auxiluary devices from the device tree
   - New support for AllWinner A64, Cirrus Logic CS4234, Mediatek MT6359
     Microchip S/PDIF TX and RX controllers, Realtek RT1015P, and Texas
     Instruments J721E, TAS2110, TAS2564 and TAS2764"

* tag 'sound-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (498 commits)
  ALSA: hda/hdmi: fix incorrect locking in hdmi_pcm_close
  ALSA: hda: fix jack detection with Realtek codecs when in D3
  ALSA: fireworks: use semicolons rather than commas to separate statements
  ALSA: hda: use semicolons rather than commas to separate statements
  ALSA: hda/i915 - fix list corruption with concurrent probes
  ASoC: dmaengine: Document support for TX only or RX only streams
  ASoC: mchp-spdiftx: remove 'TX' from playback stream name
  ASoC: ti: davinci-mcasp: Use &pdev->dev for early dev_warn
  ASoC: tas2764: Add the driver for the TAS2764
  dt-bindings: tas2764: Add the TAS2764 binding doc
  ASoC: Intel: catpt: Add explicit DMADEVICES kconfig dependency
  ASoC: Intel: catpt: Fix compilation when CONFIG_MODULES is disabled
  ASoC: stm32: dfsdm: add actual resolution trace
  ASoC: stm32: dfsdm: change rate limits
  ASoC: qcom: sc7180: Add support for audio over DP
  Asoc: qcom: lpass-platform : Increase buffer size
  ASoC: qcom: Add support for lpass hdmi driver
  Asoc: qcom: lpass:Update lpaif_dmactl members order
  Asoc:qcom:lpass-cpu:Update dts property read API
  ASoC: dt-bindings: Add dt binding for lpass hdmi
  ...
2020-10-15 11:07:44 -07:00
Linus Torvalds
55e0500eb5 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
  hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.

  There are only three core changes: adding sense codes, cleaning up
  noretry and adding an option for limitless retries"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
  scsi: hisi_sas: Recover PHY state according to the status before reset
  scsi: hisi_sas: Filter out new PHY up events during suspend
  scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
  scsi: hisi_sas: Add check for methods _PS0 and _PR0
  scsi: hisi_sas: Add controller runtime PM support for v3 hw
  scsi: hisi_sas: Switch to new framework to support suspend and resume
  scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
  scsi: qedf: Remove redundant assignment to variable 'rc'
  scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
  scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
  scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
  scsi: sun_esp: Use module_platform_driver to simplify the code
  scsi: sun3x_esp: Use module_platform_driver to simplify the code
  scsi: sni_53c710: Use module_platform_driver to simplify the code
  scsi: qlogicpti: Use module_platform_driver to simplify the code
  scsi: mac_esp: Use module_platform_driver to simplify the code
  scsi: jazz_esp: Use module_platform_driver to simplify the code
  scsi: mvumi: Fix error return in mvumi_io_attach()
  scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
  scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
  ...
2020-10-14 15:15:35 -07:00
Linus Torvalds
7b540812cc Merge tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
 "A decent number of SELinux patches for v5.10, twenty two in total. The
  highlights are listed below, but all of the patches pass our test
  suite and merge cleanly.

   - A number of changes to how the SELinux policy is loaded and managed
     inside the kernel with the goal of improving the atomicity of a
     SELinux policy load operation.

     These changes account for the bulk of the diffstat as well as the
     patch count. A special thanks to everyone who contributed patches
     and fixes for this work.

   - Convert the SELinux policy read-write lock to RCU.

   - A tracepoint was added for audited SELinux access control events;
     this should help provide a more unified backtrace across kernel and
     userspace.

   - Allow the removal of security.selinux xattrs when a SELinux policy
     is not loaded.

   - Enable policy capabilities in SELinux policies created with the
     scripts/selinux/mdp tool.

   - Provide some "no sooner than" dates for the SELinux checkreqprot
     sysfs deprecation"

* tag 'selinux-pr-20201012' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: (22 commits)
  selinux: provide a "no sooner than" date for the checkreqprot removal
  selinux: Add helper functions to get and set checkreqprot
  selinux: access policycaps with READ_ONCE/WRITE_ONCE
  selinux: simplify away security_policydb_len()
  selinux: move policy mutex to selinux_state, use in lockdep checks
  selinux: fix error handling bugs in security_load_policy()
  selinux: convert policy read-write lock to RCU
  selinux: delete repeated words in comments
  selinux: add basic filtering for audit trace events
  selinux: add tracepoint on audited events
  selinux: Create new booleans and class dirs out of tree
  selinux: Standardize string literal usage for selinuxfs directory names
  selinux: Refactor selinuxfs directory populating functions
  selinux: Create function for selinuxfs directory cleanup
  selinux: permit removing security.selinux xattr before policy load
  selinux: fix memdup.cocci warnings
  selinux: avoid dereferencing the policy prior to initialization
  selinux: fix allocation failure check on newpolicy->sidtab
  selinux: refactor changing booleans
  selinux: move policy commit after updating selinuxfs
  ...
2020-10-13 16:29:55 -07:00
Linus Torvalds
7cd4ecd917 Merge tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "Here are the driver updates for 5.10.

  A few SCSI updates in here too, in coordination with Martin as they
  depend on core block changes for the shared tag bitmap.

  This contains:

   - NVMe pull requests via Christoph:
      - fix keep alive timer modification (Amit Engel)
      - order the PCI ID list more sensibly (Andy Shevchenko)
      - cleanup the open by controller helper (Chaitanya Kulkarni)
      - use an xarray for the CSE log lookup (Chaitanya Kulkarni)
      - support ZNS in nvmet passthrough mode (Chaitanya Kulkarni)
      - fix nvme_ns_report_zones (Christoph Hellwig)
      - add a sanity check to nvmet-fc (James Smart)
      - fix interrupt allocation when too many polled queues are
        specified (Jeffle Xu)
      - small nvmet-tcp optimization (Mark Wunderlich)
      - fix a controller refcount leak on init failure (Chaitanya
        Kulkarni)
      - misc cleanups (Chaitanya Kulkarni)
      - major refactoring of the scanning code (Christoph Hellwig)

   - MD updates via Song:
      - Bug fixes in bitmap code, from Zhao Heming
      - Fix a work queue check, from Guoqing Jiang
      - Fix raid5 oops with reshape, from Song Liu
      - Clean up unused code, from Jason Yan
      - Discard improvements, from Xiao Ni
      - raid5/6 page offset support, from Yufen Yu

   - Shared tag bitmap for SCSI/hisi_sas/null_blk (John, Kashyap,
     Hannes)

   - null_blk open/active zone limit support (Niklas)

   - Set of bcache updates (Coly, Dongsheng, Qinglang)"

* tag 'drivers-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (78 commits)
  md/raid5: fix oops during stripe resizing
  md/bitmap: fix memory leak of temporary bitmap
  md: fix the checking of wrong work queue
  md/bitmap: md_bitmap_get_counter returns wrong blocks
  md/bitmap: md_bitmap_read_sb uses wrong bitmap blocks
  md/raid0: remove unused function is_io_in_chunk_boundary()
  nvme-core: remove extra condition for vwc
  nvme-core: remove extra variable
  nvme: remove nvme_identify_ns_list
  nvme: refactor nvme_validate_ns
  nvme: move nvme_validate_ns
  nvme: query namespace identifiers before adding the namespace
  nvme: revalidate zone bitmaps in nvme_update_ns_info
  nvme: remove nvme_update_formats
  nvme: update the known admin effects
  nvme: set the queue limits in nvme_update_ns_info
  nvme: remove the 0 lba_shift check in nvme_update_ns_info
  nvme: clean up the check for too large logic block sizes
  nvme: freeze the queue over ->lba_shift updates
  nvme: factor out a nvme_configure_metadata helper
  ...
2020-10-13 13:04:41 -07:00
Linus Torvalds
3ad11d7ac8 Merge tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Series of merge handling cleanups (Baolin, Christoph)

 - Series of blk-throttle fixes and cleanups (Baolin)

 - Series cleaning up BDI, seperating the block device from the
   backing_dev_info (Christoph)

 - Removal of bdget() as a generic API (Christoph)

 - Removal of blkdev_get() as a generic API (Christoph)

 - Cleanup of is-partition checks (Christoph)

 - Series reworking disk revalidation (Christoph)

 - Series cleaning up bio flags (Christoph)

 - bio crypt fixes (Eric)

 - IO stats inflight tweak (Gabriel)

 - blk-mq tags fixes (Hannes)

 - Buffer invalidation fixes (Jan)

 - Allow soft limits for zone append (Johannes)

 - Shared tag set improvements (John, Kashyap)

 - Allow IOPRIO_CLASS_RT for CAP_SYS_NICE (Khazhismel)

 - DM no-wait support (Mike, Konstantin)

 - Request allocation improvements (Ming)

 - Allow md/dm/bcache to use IO stat helpers (Song)

 - Series improving blk-iocost (Tejun)

 - Various cleanups (Geert, Damien, Danny, Julia, Tetsuo, Tian, Wang,
   Xianting, Yang, Yufen, yangerkun)

* tag 'block-5.10-2020-10-12' of git://git.kernel.dk/linux-block: (191 commits)
  block: fix uapi blkzoned.h comments
  blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue
  blk-mq: get rid of the dead flush handle code path
  block: get rid of unnecessary local variable
  block: fix comment and add lockdep assert
  blk-mq: use helper function to test hw stopped
  block: use helper function to test queue register
  block: remove redundant mq check
  block: invoke blk_mq_exit_sched no matter whether have .exit_sched
  percpu_ref: don't refer to ref->data if it isn't allocated
  block: ratelimit handle_bad_sector() message
  blk-throttle: Re-use the throtl_set_slice_end()
  blk-throttle: Open code __throtl_de/enqueue_tg()
  blk-throttle: Move service tree validation out of the throtl_rb_first()
  blk-throttle: Move the list operation after list validation
  blk-throttle: Fix IO hang for a corner case
  blk-throttle: Avoid tracking latency if low limit is invalid
  blk-throttle: Avoid getting the current time if tg->last_finish_time is 0
  blk-throttle: Remove a meaningless parameter for throtl_downgrade_state()
  block: Remove redundant 'return' statement
  ...
2020-10-13 12:12:44 -07:00
Linus Torvalds
11e3235b43 Merge tag 'for-5.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
 "Mostly core updates with a few user visible bits and fixes.

  Hilights:

   - fsync performance improvements
      - less contention of log mutex (throughput +4%, latency -14%,
        dbench with 32 clients)
      - skip unnecessary commits for link and rename (throughput +6%,
        latency -30%, rename latency -75%, dbench with 16 clients)
      - make fast fsync wait only for writeback (throughput +10..40%,
        runtime -1..-20%, dbench with 1 to 64 clients on various
        file/block sizes)

   - direct io is now implemented using the iomap infrastructure, that's
     the main part, we still have a workaround that requires an iomap
     API update, coming in 5.10

   - new sysfs exports:
      - information about the exclusive filesystem operation status
        (balance, device add/remove/replace, ...)
      - supported send stream version

  Core:

   - use ticket space reservations for data, fair policy using the same
     infrastructure as metadata

   - preparatory work to switch locking from our custom tree locks to
     standard rwsem, now the locking context is propagated to all
     callers, actual switch is expected to happen in the next dev cycle

   - seed device structures are now using list API

   - extent tracepoints print proper tree id

   - unified range checks for extent buffer helpers

   - send: avoid using temporary buffer for copying data

   - remove unnecessary RCU protection from space infos

   - remove unused readpage callback for metadata, enabling several
     cleanups

   - replace indirect function calls for end io hooks and remove
     extent_io_ops completely

  Fixes:

   - more lockdep warning fixes

   - fix qgroup reservation for delayed inode and an occasional
     reservation leak for preallocated files

   - fix device replace of a seed device

   - fix metadata reservation for fallocate that leads to transaction
     aborts

   - reschedule if necessary when logging directory items or when
     cloning lots of extents

   - tree-checker: fix false alert caused by legacy btrfs root item

   - send: fix rename/link conflicts for orphanized inodes

   - properly initialize device stats for seed devices

   - skip devices without magic signature when mounting

  Other:

   - error handling improvements, BUG_ONs replaced by proper handling,
     fuzz fixes

   - various function parameter cleanups

   - various W=1 cleanups

   - error/info messages improved

  Mishaps:

   - commit 62cf539120 ("btrfs: move btrfs_rm_dev_replace_free_srcdev
     outside of all locks") is a rebase leftover after the patch got
     merged to 5.9-rc8 as a466c85edc ("btrfs: move
     btrfs_rm_dev_replace_free_srcdev outside of all locks"), the
     remaining part is trivial and the patch is in the middle of the
     series so I'm keeping it there instead of rebasing"

* tag 'for-5.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (161 commits)
  btrfs: rename BTRFS_INODE_ORDERED_DATA_CLOSE flag
  btrfs: annotate device name rcu_string with __rcu
  btrfs: skip devices without magic signature when mounting
  btrfs: cleanup cow block on error
  btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
  fs: remove no longer used dio_end_io()
  btrfs: return error if we're unable to read device stats
  btrfs: init device stats for seed devices
  btrfs: remove struct extent_io_ops
  btrfs: call submit_bio_hook directly for metadata pages
  btrfs: stop calling submit_bio_hook for data inodes
  btrfs: don't opencode is_data_inode in end_bio_extent_readpage
  btrfs: call submit_bio_hook directly in submit_one_bio
  btrfs: remove extent_io_ops::readpage_end_io_hook
  btrfs: replace readpage_end_io_hook with direct calls
  btrfs: send, recompute reference path after orphanization of a directory
  btrfs: send, orphanize first all conflicting inodes when processing references
  btrfs: tree-checker: fix false alert caused by legacy btrfs root item
  btrfs: use unaligned helpers for stack and header set/get helpers
  btrfs: free-space-cache: use unaligned helpers to access data
  ...
2020-10-13 09:01:36 -07:00
Linus Torvalds
53acd35050 Merge tag 'locks-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fix from Jeff Layton:
 "Just a single patch to fix up some tracepoint output"

* tag 'locks-v5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  locks: Remove extra "0x" in tracepoint format specifier
2020-10-12 16:50:18 -07:00
Linus Torvalds
ee4a925107 Merge tag 'x86-paravirt-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt cleanup from Ingo Molnar:
 "Clean up the paravirt code after the removal of 32-bit Xen PV support"

* tag 'x86-paravirt-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/paravirt: Avoid needless paravirt step clearing page table entries
  x86/paravirt: Remove set_pte_at() pv-op
  x86/entry/32: Simplify CONFIG_XEN_PV build dependency
  x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT
  x86/paravirt: Clean up paravirt macros
  x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL
2020-10-12 15:15:24 -07:00
Linus Torvalds
edaa5ddf38 Merge tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - reorganize & clean up the SD* flags definitions and add a bunch of
   sanity checks. These new checks caught quite a few bugs or at least
   inconsistencies, resulting in another set of patches.

 - rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - add a new tracepoint to improve CPU capacity tracking

 - improve overloaded SMP system load-balancing behavior

 - tweak SMT balancing

 - energy-aware scheduling updates

 - NUMA balancing improvements

 - deadline scheduler fixes and improvements

 - CPU isolation fixes

 - misc cleanups, simplifications and smaller optimizations

* tag 'sched-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  sched/deadline: Unthrottle PI boosted threads while enqueuing
  sched/debug: Add new tracepoint to track cpu_capacity
  sched/fair: Tweak pick_next_entity()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Use dst group while checking imbalance for NUMA balancer
  sched/fair: Reduce busy load balance interval
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Remove the force parameter of update_tg_load_avg()
  sched/fair: Fix wrong cpu selecting from isolated domain
  sched: Remove unused inline function uclamp_bucket_base_value()
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Fix stale throttling on de-/boosted tasks
  sched/numa: Use runnable_avg to classify node
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  ...
2020-10-12 12:56:01 -07:00
Linus Torvalds
6734e20e39 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "There's quite a lot of code here, but much of it is due to the
  addition of a new PMU driver as well as some arm64-specific selftests
  which is an area where we've traditionally been lagging a bit.

  In terms of exciting features, this includes support for the Memory
  Tagging Extension which narrowly missed 5.9, hopefully allowing
  userspace to run with use-after-free detection in production on CPUs
  that support it. Work is ongoing to integrate the feature with KASAN
  for 5.11.

  Another change that I'm excited about (assuming they get the hardware
  right) is preparing the ASID allocator for sharing the CPU page-table
  with the SMMU. Those changes will also come in via Joerg with the
  IOMMU pull.

  We do stray outside of our usual directories in a few places, mostly
  due to core changes required by MTE. Although much of this has been
  Acked, there were a couple of places where we unfortunately didn't get
  any review feedback.

  Other than that, we ran into a handful of minor conflicts in -next,
  but nothing that should post any issues.

  Summary:

   - Userspace support for the Memory Tagging Extension introduced by
     Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.

   - Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
     switching.

   - Fix and subsequent rewrite of our Spectre mitigations, including
     the addition of support for PR_SPEC_DISABLE_NOEXEC.

   - Support for the Armv8.3 Pointer Authentication enhancements.

   - Support for ASID pinning, which is required when sharing
     page-tables with the SMMU.

   - MM updates, including treating flush_tlb_fix_spurious_fault() as a
     no-op.

   - Perf/PMU driver updates, including addition of the ARM CMN PMU
     driver and also support to handle CPU PMU IRQs as NMIs.

   - Allow prefetchable PCI BARs to be exposed to userspace using normal
     non-cacheable mappings.

   - Implementation of ARCH_STACKWALK for unwinding.

   - Improve reporting of unexpected kernel traps due to BPF JIT
     failure.

   - Improve robustness of user-visible HWCAP strings and their
     corresponding numerical constants.

   - Removal of TEXT_OFFSET.

   - Removal of some unused functions, parameters and prototypes.

   - Removal of MPIDR-based topology detection in favour of firmware
     description.

   - Cleanups to handling of SVE and FPSIMD register state in
     preparation for potential future optimisation of handling across
     syscalls.

   - Cleanups to the SDEI driver in preparation for support in KVM.

   - Miscellaneous cleanups and refactoring work"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
  Revert "arm64: initialize per-cpu offsets earlier"
  arm64: random: Remove no longer needed prototypes
  arm64: initialize per-cpu offsets earlier
  kselftest/arm64: Check mte tagged user address in kernel
  kselftest/arm64: Verify KSM page merge for MTE pages
  kselftest/arm64: Verify all different mmap MTE options
  kselftest/arm64: Check forked child mte memory accessibility
  kselftest/arm64: Verify mte tag inclusion via prctl
  kselftest/arm64: Add utilities and a test to validate mte memory
  perf: arm-cmn: Fix conversion specifiers for node type
  perf: arm-cmn: Fix unsigned comparison to less than zero
  arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
  arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
  arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
  arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
  KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
  arm64: Get rid of arm64_ssbd_state
  KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
  KVM: arm64: Get rid of kvm_arm_have_ssbd()
  KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
  ...
2020-10-12 10:00:51 -07:00
Ingo Molnar
b36c830f8c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull v5.10 RCU changes from Paul E. McKenney:

- Debugging for smp_call_function().

- Strict grace periods for KASAN.  The point of this series is to find
  RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
  Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
  further disabled by dfefault.  Finally, the help text includes
  a goodly list of scary caveats.

- New smp_call_function() torture test.

- Torture-test updates.

- Documentation updates.

- Miscellaneous fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-10-09 08:21:56 +02:00
Qu Wenruo
2c53a14dd3 btrfs: use own btree inode io_tree owner id
Btree inode is special compared to all other inode extent io_trees,
although it has a btrfs inode, it doesn't have the track_uptodate bit at
all.

This means a lot of things like extent locking doesn't even need to be
applied to btree io tree.

Since it's so special, adds a new owner value for it to make debuging a
little easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:13:22 +02:00
Nikolay Borisov
acbf1dd0fc btrfs: make ordered extent tracepoint take btrfs_inode
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:12:17 +02:00
Qu Wenruo
437490fed3 btrfs: tracepoints: output proper root owner for trace_find_free_extent()
The current trace event always output result like this:

 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=4(METADATA)
 find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)
 find_free_extent: root=2(EXTENT_TREE) len=4096 empty_size=0 flags=1(DATA)

T's saying we're allocating data extent for EXTENT tree, which is not
even possible.

It's because we always use EXTENT tree as the owner for
trace_find_free_extent() without using the @root from
btrfs_reserve_extent().

This patch will change the parameter to use proper @root for
trace_find_free_extent():

Now it looks much better:

 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=4096 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=8192 empty_size=0 flags=1(DATA)
 find_free_extent: root=5(FS_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=7(CSUM_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=2(EXTENT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)
 find_free_extent: root=1(ROOT_TREE) len=16384 empty_size=0 flags=36(METADATA|DUP)

Reported-by: Hans van Kranenburg <hans@knorrie.org>
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-10-07 12:06:49 +02:00
Mark Brown
fd6b519a30 Merge tag 'v5.9-rc5' into asoc-5.10
Linux 5.9-rc5
2020-10-06 16:19:24 +01:00
Cezary Rojewski
ca756120d4 ASoC: Intel: Remove haswell solution
Newly added catpt solution found in sound/soc/intel/catpt is a direct
replacement to sound/soc/intel/haswell. It covers all features supported
by it and more - by aligning to recommended flows and requirement list
based on Windows driver equivalent. No harm is done to userspace as
catpt - similarly to haswell - loads no extenal topology files while
sharing the exact same ADSP firmware binary.

Given the above, existing haswell code is redundant so remove it.

Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Liam Girdwood <liam.r.girdwood@intel.com>
Link: https://lore.kernel.org/r/20201006064907.16277-2-cezary.rojewski@intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
2020-10-06 15:12:18 +01:00
Vincent Donnefort
51cf18c90c sched/debug: Add new tracepoint to track cpu_capacity
rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1598605249-72651-1-git-send-email-vincent.donnefort@arm.com
2020-10-03 16:30:52 +02:00
Roman Bolshakov
7010645ba7 scsi: target: core: Add CONTROL field for trace events
trace-cmd report doesn't show events from target subsystem because
scsi_command_size() leaks through event format string:

  [target:target_sequencer_start] function scsi_command_size not defined
  [target:target_cmd_complete] function scsi_command_size not defined

Addition of scsi_command_size() to plugin_scsi.c in trace-cmd doesn't
help because an expression is used inside TP_printk(). trace-cmd event
parser doesn't understand minus sign inside [ ]:

  Error: expected ']' but read '-'

Rather than duplicating kernel code in plugin_scsi.c, provide a dedicated
field for CONTROL byte.

Link: https://lore.kernel.org/r/20200929125957.83069-1-r.bolshakov@yadro.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-02 18:36:19 -04:00
Coly Li
1132e56e78 bcache: add set_uuid in struct cache_set
This patch adds a separated set_uuid[16] in struct cache_set, to store
the uuid of the cache set. This is the preparation to remove the
embedded struct cache_sb from struct cache_set.

Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-02 14:25:30 -06:00
Ido Schimmel
5b88823bfe devlink: Add a tracepoint for trap reports
Add a tracepoint for trap reports so that drop monitor could register
its probe on it. Use trace_devlink_trap_report_enabled() to avoid
wasting cycles setting the trap metadata if the tracepoint is not
enabled.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-30 18:01:26 -07:00
Alexander Graf
1ae099540e KVM: x86: Allow deflecting unknown MSR accesses to user space
MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

Message-Id: <20200925143422.21718-3-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:04 -04:00
Tejun Heo
c5a6561b8d iocost: add iocg_forgive_debt tracepoint
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:35:02 -06:00
Jens Axboe
ac8f7a0264 Merge branch 'for-5.10/block' into for-5.10/drivers
* for-5.10/block: (140 commits)
  bdi: replace BDI_CAP_NO_{WRITEBACK,ACCT_DIRTY} with a single flag
  bdi: invert BDI_CAP_NO_ACCT_WB
  bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag
  mm: use SWP_SYNCHRONOUS_IO more intelligently
  bdi: remove BDI_CAP_SYNCHRONOUS_IO
  bdi: remove BDI_CAP_CGROUP_WRITEBACK
  block: lift setting the readahead size into the block layer
  md: update the optimal I/O size on reshape
  bdi: initialize ->ra_pages and ->io_pages in bdi_init
  aoe: set an optimal I/O size
  bcache: inherit the optimal I/O size
  drbd: remove dead code in device_to_statistics
  fs: remove the unused SB_I_MULTIROOT flag
  block: mark blkdev_get static
  PM: mm: cleanup swsusp_swap_check
  mm: split swap_type_of
  PM: rewrite is_hibernate_resume_dev to not require an inode
  mm: cleanup claim_swapfile
  ocfs2: cleanup o2hb_region_dev_store
  dasd: cleanup dasd_scan_partitions
  ...
2020-09-24 13:44:39 -06:00
Chuck Lever
721a1d388b SUNRPC: Remove dprintk call sites in RPC queuing functions
Remove redundant call sites or call sites that are already covered
by tracepoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
1466c22163 SUNRPC: Clean up RPC scheduler tracepoints
Remove several redundant dprintk call sites, and replace a couple of
potentially useful ones with tracepoints.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
c3adcc7dfb SUNRPC: Replace rpcbind dprintk call sites with tracepoints
In many cases, tracepoints already report these errors. In others,
the dprintks were mainly useful when this code was less mature.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
ac1ae53421 SUNRPC: Hoist trace_xprtrdma_op_setport into generic code
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
e465cc3fa8 SUNRPC: Remove rpcb_getport_async dprintk call sites
In many cases, tracepoints already report these errors. In others,
the dprintks were mainly useful when this code was less mature.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
42ebfc2cbf SUNRPC: Clean up call_bind_status() observability
Time to remove dprintk call sites in here.

Regarding the rpc_bind_status tracepoint: It's friendlier to
administrators if they don't have to look up the error code to
figure out what went wrong. Replace trace_rpc_bind_status with a
set of tracepoints that report more specifically what the problem
was, and what RPC program/version was being queried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
7c8099f6ad SUNRPC: Trace call_refresh events
Clean up: Replace dprintk call sites.

Note that rpc_call_rpcerror() already has a trace point, so perhaps
adding trace_rpc_refresh_status() isn't necessary. However, it does
report a particular category of error.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
914cdcc78a SUNRPC: Add trace_rpc_timeout_status()
For a long while we've wanted a tracepoint that fires when a major
timeout is reported in the system log. Such a tracepoint can be
attached to other actions that can take place when a timeout is
detected (eg, server or connection health assessment).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
db0a86c426 SUNRPC: Replace connect dprintk call sites with a tracepoint
This trace event can be used to audit transport connections from the
client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:09 -04:00
Chuck Lever
015747d296 SUNRPC: Replace dprintk() call site in xs_nospace()
"no socket space" is an exceptional and infrequent condition
that troubleshooters want to know about.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Chuck Lever
9ce07ae5eb SUNRPC: Replace dprintk() call site in xprt_prepare_transmit
Generate a trace event when an RPC request is queued without being
sent immediately.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Chuck Lever
09d2ba0cb1 SUNRPC: Update debugging instrumentation in xprt_do_reserve()
Replace a dprintk() with a tracepoint. The tracepoint marks the
point where an RPC request is assigned an XID.

Additional clean up: Remove trace_xprt_enq_xmit, which reports much
the same thing. That tracepoint was added for debugging commit
918f3c1fe8 ("SUNRPC: Improve latency for interactive tasks").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Chuck Lever
7806948753 SUNRPC: Remove debugging instrumentation from xprt_release
These instruments don't appear to add any substantial value.

We already have this at the termination of each RPC:

          iozone-2617  [002]   975.713126: rpc_stats_latency:    task:418@5 xid=0x260eab5d nfsv3 LOOKUP backlog=15 rtt=32 execute=58
          iozone-2617  [002]   975.713127: xprt_release_cong:    task:418@5 snd_task:4294967295 cong=256 cwnd=16384
          iozone-2617  [002]   975.713127: xprt_put_cong:        task:418@5 snd_task:4294967295 cong=0 cwnd=16384

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Chuck Lever
06e234c613 SUNRPC: Hoist trace_xprtrdma_op_allocate into generic code
Introduce a tracepoint in call_allocate that reports the exact
sizes in the RPC buffer allocation request and the status of the
result. This helps catch problems with XDR buffer provisioning,
and replaces transport-specific debugging instrumentation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Chuck Lever
e4378a0fdd SUNRPC: Remove trace_xprt_complete_rqst()
Request completion is already recorded by an "rpc_task_wakeup
queue=xprt_pending" trace record. A subsequent rpc_xdr_recvfrom
trace record shows the number of bytes received.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-09-21 10:21:08 -04:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
David Howells
96a9c425e2 rxrpc: Fix a missing NULL-pointer check in a trace
Fix the rxrpc_client tracepoint to not dereference conn to get the cid if
conn is NULL, as it does for other fields.

	RIP: 0010:trace_event_raw_event_rxrpc_client+0x7e/0xe0 [rxrpc]
	Call Trace:
	 rxrpc_activate_channels+0x62/0xb0 [rxrpc]
	 rxrpc_connect_call+0x481/0x650 [rxrpc]
	 ? wake_up_q+0xa0/0xa0
	 ? rxrpc_kernel_begin_call+0x12a/0x1b0 [rxrpc]
	 rxrpc_new_client_call+0x2a5/0x5e0 [rxrpc]

Fixes: 245500d853 ("rxrpc: Rewrite the client connection manager")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
2020-09-14 16:18:59 +01:00
Chao Yu
32c0fec1aa f2fs: trace: fix typo
Fixes a typo from 'compreesed' to 'compressed'.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-11 11:11:25 -07:00
Chao Yu
093749e296 f2fs: support age threshold based garbage collection
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.

This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:

1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
 0 means youngest, 100 means oldest, if we set age threshold to 80
 then select dirty segments which has age in range of [80, 100] as
 candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;

2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.

3. merge valid blocks from source victim into target victim with
SSR alloctor.

Test steps:
- create 160 dirty segments:
 * half of them have 128 valid blocks per segment
 * left of them have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease obviously:

- Before:
  - Valid: 86
  - Dirty: 1
  - Prefree: 11
  - Free: 6001 (6001)

GC calls: 162 (BG: 220)
  - data segments : 160 (160)
  - node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
  - data blocks : 40960 (40960)
  - node blocks : 494 (494)

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:

  - Valid: 87
  - Dirty: 0
  - Prefree: 4
  - Free: 6008 (6008)

GC calls: 75 (BG: 76)
  - data segments : 74 (74)
  - node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
  - data blocks : 12544 (12544)
  - node blocks : 269 (269)

IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-11 11:11:15 -07:00
David Howells
245500d853 rxrpc: Rewrite the client connection manager
Rewrite the rxrpc client connection manager so that it can support multiple
connections for a given security key to a peer.  The following changes are
made:

 (1) For each open socket, the code currently maintains an rbtree with the
     connections placed into it, keyed by communications parameters.  This
     is tricky to maintain as connections can be culled from the tree or
     replaced within it.  Connections can require replacement for a number
     of reasons, e.g. their IDs span too great a range for the IDR data
     type to represent efficiently, the call ID numbers on that conn would
     overflow or the conn got aborted.

     This is changed so that there's now a connection bundle object placed
     in the tree, keyed on the same parameters.  The bundle, however, does
     not need to be replaced.

 (2) An rxrpc_bundle object can now manage the available channels for a set
     of parallel connections.  The lock that manages this is moved there
     from the rxrpc_connection struct (channel_lock).

 (3) There'a a dummy bundle for all incoming connections to share so that
     they have a channel_lock too.  It might be better to give each
     incoming connection its own bundle.  This bundle is not needed to
     manage which channels incoming calls are made on because that's the
     solely at whim of the client.

 (4) The restrictions on how many client connections are around are
     removed.  Instead, a previous patch limits the number of client calls
     that can be allocated.  Ordinarily, client connections are reaped
     after 2 minutes on the idle queue, but when more than a certain number
     of connections are in existence, the reaper starts reaping them after
     2s of idleness instead to get the numbers back down.

     It could also be made such that new call allocations are forced to
     wait until the number of outstanding connections subsides.

Signed-off-by: David Howells <dhowells@redhat.com>
2020-09-08 21:11:43 +01:00
Steven Price
4beba9486a mm: Add PG_arch_2 page flag
For arm64 MTE support it is necessary to be able to mark pages that
contain user space visible tags that will need to be saved/restored e.g.
when swapped out.

To support this add a new arch specific flag (PG_arch_2). This flag is
only available on 64-bit architectures due to the limited number of
spare page flags on the 32-bit ones.

Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: use CONFIG_64BIT for guarding this new flag]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2020-09-04 12:46:06 +01:00