We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface. Unfortunately, that's not
the case. In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too. But this thread is
deadlocked on the rtnl lock. The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_OPER_UP has been cleared, in which case we don't need to do
anything anyway and we just return.
While investigating which flag should be used, FLAG_ADMIN_UP or
FLAG_OPER_UP, it was determined that FLAG_OPER_UP was the more
appropriate flag to use. However, there was a mix of these two flags in
use in the existing code. So while we check for that flag here as part
of this race fix, also cleanup the two places that had used the less
appropriate flag for their tests.
Signed-off-by: Doug Ledford <dledford@redhat.com>
The ipoib_mcast_flush_dev routine is called with the rtnl_lock held and
needs to keep it held. It also needs to call flush_workqueue() to flush
out any outstanding work. In the past, we've had to try and make sure
that we didn't flush out any outstanding join completions because they
also wanted to grab rtnl_lock() and that would deadlock. It turns out
that the only thing in the join completion handler that needs this lock
can be safely moved to our carrier_on_task, thereby reducing the
potential for the join completion code and the flush code to deadlock
against each other.
Signed-off-by: Doug Ledford <dledford@redhat.com>
In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Create a an ipoib_flush_ah and ipoib_stop_ah routines to use at
appropriate times to flush out all remaining ah entries before we shut
the device down.
Because neighbors and mcast entries can each have a reference on any
given ah, we must make sure to free all of those first before our ah
will actually have a 0 refcount and be able to be reaped.
This factoring is needed in preparation for having per-device work
queues. The original per-device workqueue code resulted in the following
error message:
<ibdev>: ib_dealloc_pd failed
That error was tracked down to this issue. With the changes to which
workqueues were flushed when, there were no flushes of the per device
workqueue after the last ah's were freed, resulting in an attempt to
dealloc the pd with outstanding resources still allocated. This code
puts the explicit flushes in the needed places to avoid that problem.
Signed-off-by: Doug Ledford <dledford@redhat.com>
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057ab5
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).
This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().
There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org> # 8494057ab5 ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic")
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.
This should not be allowed: it's not expected for a memory
region to have a size equal to 0.
This patch adds a check to explicitly refuse to register
a 0-sized region.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
libpci 3.3.0 introduced an additional member in the pci_filter struct
which needs to be initialized to -1 to get the same behavior as before
the API change. The libpci internal helpers got updated accordingly,
but as the cpupower pci helpers initialized the struct themselves the
behavior changed.
Use the libpci helper pci_filter_init() to fix this and guard against
similar breakages in the future.
This fixes probing of the AMD fam12h/14h cpuidle monitor on systems
with libpci >= 3.3.0.
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Acked-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Change the default mode to be HOST assigned instead of SM assigned. This is
the expected operational mode, because it doesn't depend on SM availability.
As PF generates random GUIDs as the initial admin values, this gives
out of the box experience.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Return the admin alias GUID value upon a GET request via HOST. We do this so
that the GUID value requested by the admin is returned even if the SM has not
yet approved this GUID (e.g. the SM is down).
Note that this does not create a problem, since the virtual port will remain
down until the SM does ACK the requested GUID value.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There might be cases that PF doesn't get a "reset" command upon slave down
(e.g. virsh destroy). In these cases, however, an FLR event is issued.
Therefore, when the PF receives an FLR event for a slave, it should also
generate a shutdown event on the PF for that slave, to let the PF upper
layers (mlx4_ib, eth) perform any required cleanup/actions associated
with slave shutdown.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Request GIDs from the SM on demand, i.e., when a VF actually needs them,
and release them when the GIDs are no longer in use.
In cloud environments, this is useful for GID migrations, in which a
GID is assigned to a VF on the destination HCA, while the VF on the
source HCA is shutdown (but the GID was not administratively released).
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Change the init flow to ask GUIDs only for active VFs. This is done for
both SM & HOST modes so that there is no need any more to maintain the
ownership record type.
In case SM mode is used, the initial value will be 0, ask the SM to assign,
for the HOST mode the initial value will be the HOST generated GUID.
This will enable out of the box experience for both probed and attached VFs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set the admin alias GUID per the administrator's request via the sysfs
mechanism into the core layer.
The "get" request returns the current value. However, if the administrator
requests the SM to assign a new value by requesting 0, the SM assigned
GUID is returned.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To have out of the box experience, the PF generates random GUIDs who
serve as the initial admin values.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Manages alias GUIDs per VF per port in the core layer.
This is a pre-step for managing alias GUIDs in a mode that the admin
GUID is returned via ib_query_gid() regardless of whether the SM
has approved it or not.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the SM rejects an alias GUID request the PF driver keeps trying to acquire
the specified GUID indefinitely, utilizing an exponential backoff scheme.
Retrying is managed per GUID entry. Each entry that wasn't applied holds its
next retry information. Retry requests to the SM consist of records of 8
consecutive GUIDS. Each record that contains GUIDs requiring retries holds its
next time-to-run based on the retry information of all its GUID entries. The
record having the lowest retry time will run first when that retry time
arrives.
Since the method (SET or DELETE) as sent to the SM applies to all the GUIDs in
the record, we must handle SET requests and DELETE requests in separate SM
messages (one for SETs and the other for DELETEs).
To avoid race conditions where a GUID entry request (set or delete) was
modified after the SM request was sent, we save the method and the requested
indices as part of the callback's context -- thus, only the requested indexes
are evaluated when the response is received.
When an GUID entry is approved we turn off its retry-required bit, this
prevents redundant SM retries from occurring on that record.
The port down event should be sent only when previously it was up. Likewise,
the port up event should be sent only if previously the port was down.
Synchronization was added around the flows that change entries and record state
to prevent race conditions.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
As pointed out by Stephen Rothwell, commit e52117638b ("ARM: dts:
omap3: Add DT entries for OMAP 3 ISP") conflicts with b8845074cf
("ARM: dts: omap3: add minimal l4 bus layout with control module support")
in non-obvious ways, causing a build failure when both patches
are present.
This merges the two branches that introduce the respective changes
into the next/late branch to resolve the way that Stephen suggested,
as confirmed by Tony.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.org/lkml/2015/4/6/436
Acked-by: Tony Lindgren <tony@atomide.com>
these should be used on objects already in top layer
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
library helpers called by filesystem drivers on their own inodes
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
... except where that code acts as a filesystem driver, rather than
working with dentries given to it.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
most of the ->d_inode uses there refer to the same inode IO would
go to, i.e. d_backing_inode()
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
socket inodes and sunrpc filesystems - inodes owned by that code
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
places where we are dealing with S_ISSOCK file creation/lookups.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
relayfs and tracefs are dealing with inodes of their own;
those two act as filesystem drivers
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix up some ->d_inode accesses in the chelsio driver.
(1) FILE_DATA() should just be replaced with file_inode().
(2) set_debugfs_file_size() should be removed and debugfs_create_file_size()
should be used to create the file.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cachefiles should perform fs modifications (eg. vfs_unlink()) on the top layer
only and should not attempt to alter the lower layer.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
AF_UNIX sockets should call mknod on the top layer only and should not attempt
to modify the lower layer in a layered filesystem such as overlayfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix inconsistent use of file_inode() vs file->f_path.dentry->d_inode.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make pathwalk use d_is_reg() rather than S_ISREG() to determine whether to
honour O_TRUNC. Since this occurs after complete_walk(), the dentry type
field cannot change and the inode pointer cannot change as we hold a ref on
the dentry, so this should be safe.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fix up debugfs to use d_is_dir(dentry) in place of
S_ISDIR(dentry->d_inode->i_mode).
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Where we have:
if (!dentry->d_inode || d_is_negative(dentry)) {
type constructions in pathwalk we should be able to eliminate the check of
d_inode and rely solely on the result of d_is_negative() or d_is_positive().
What we do have to take care to do is to read d_inode after calling a
d_is_xxx() typecheck function to get the barriering right.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Don't use d_inode as a variable name as it now masks a function name.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Impose ordering on accesses of d_inode and d_flags to avoid the need to do
this:
if (!dentry->d_inode || d_is_negative(dentry)) {
when this:
if (d_is_negative(dentry)) {
should suffice.
This check is especially problematic if a dentry can have its type field set
to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in
unionmount).
What we really need to do is stick a write barrier between setting d_inode and
setting d_flags and a read barrier between reading d_flags and reading
d_inode.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Supply two functions to test whether a filesystem's own dentries are positive
or negative (d_really_is_positive() and d_really_is_negative()).
The problem is that the DCACHE_ENTRY_TYPE field of dentry->d_flags may be
overridden by the union part of a layered filesystem and isn't thus
necessarily indicative of the type of dentry.
Normally, this would involve a negative dentry (ie. ->d_inode == NULL) having
->d_layer.lower pointed to a lower layer dentry, DCACHE_PINNING_LOWER set and
the DCACHE_ENTRY_TYPE field set to something other than DCACHE_MISS_TYPE - but
it could also involve, say, a DCACHE_SPECIAL_TYPE being overridden to
DCACHE_WHITEOUT_TYPE if a 0,0 chardev is detected in the top layer.
However, inside a filesystem, when that fs is looking at its own dentries, it
probably wants to know if they are really negative or not - and doesn't care
about the fallthrough bits used by the union.
To this end, a filesystem should normally use d_really_is_positive/negative()
when looking at its own dentries rather than d_is_positive/negative() and
should use d_inode() to get at the inode.
Anyone looking at someone else's dentries (this includes pathwalk) should use
d_is_xxx() and d_backing_inode().
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Current -next fails to link an ARM allmodconfig because drivers that use
the core recovery functions can be built as modules but those functions
are not exported:
ERROR: "i2c_generic_gpio_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_generic_scl_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_recover_bus" [drivers/i2c/busses/i2c-davinci.ko] undefined!
Add exports to fix this.
Fixes: 5f9296ba21 (i2c: Add bus recovery infrastructure)
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Pull kconfig updates from Michal Marek:
"Here is the kconfig stuff for v4.1-rc1:
- fixes for mergeconfig (used by make kvmconfig/tinyconfig)
- header cleanup
- make -s *config is silent now"
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kconfig: Do not print status messages in make -s mode
kconfig: Simplify Makefile
kbuild: add generic mergeconfig target, %.config
merge_config.sh: rename MAKE to RUNMAKE
merge_config.sh: improve indentation
kbuild: mergeconfig: remove redundant $(objtree)
kbuild: mergeconfig: move an error check to merge_config.sh
kbuild: mergeconfig: fix "jobserver unavailable" warning
kconfig: Remove unnecessary prototypes from headers
kconfig: Remove dead code
kconfig: Get rid of the P() macro in headers
kconfig: fix a misspelling in scripts/kconfig/merge_config.sh
Pull kbuild updates from Michal Marek:
"Here is the first round of kbuild changes for v4.1-rc1:
- kallsyms fix for ARM and cleanup
- make dep(end) removed (developers have no sense of nostalgia these
days...)
- include Makefiles by relative path
- stop useless rebuilds of asm-offsets.h and bounds.h"
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
Kbuild: kallsyms: drop special handling of pre-3.0 GCC symbols
Kbuild: kallsyms: ignore veneers emitted by the ARM linker
kbuild: ia64: use $(src)/Makefile.gate rather than particular path
kbuild: include $(src)/Makefile rather than $(obj)/Makefile
kbuild: use relative path more to include Makefile
kbuild: use relative path to include Makefile
kbuild: do not add $(bounds-file) and $(offsets-file) to targets
kbuild: remove warning about "make depend"
kbuild: Don't reset timestamps in include/generated if not needed
Pull security subsystem updates from James Morris:
"Highlights for this window:
- improved AVC hashing for SELinux by John Brooks and Stephen Smalley
- addition of an unconfined label to Smack
- Smack documentation update
- TPM driver updates"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (28 commits)
lsm: copy comm before calling audit_log to avoid race in string printing
tomoyo: Do not generate empty policy files
tomoyo: Use if_changed when generating builtin-policy.h
tomoyo: Use bin2c to generate builtin-policy.h
selinux: increase avtab max buckets
selinux: Use a better hash function for avtab
selinux: convert avtab hash table to flex_array
selinux: reconcile security_netlbl_secattr_to_sid() and mls_import_netlbl_cat()
selinux: remove unnecessary pointer reassignment
Smack: Updates for Smack documentation
tpm/st33zp24/spi: Add missing device table for spi phy.
tpm/st33zp24: Add proper wait for ordinal duration in case of irq mode
smack: Fix gcc warning from unused smack_syslog_lock mutex in smackfs.c
Smack: Allow an unconfined label in bringup mode
Smack: getting the Smack security context of keys
Smack: Assign smack_known_web as default smk_in label for kernel thread's socket
tpm/tpm_infineon: Use struct dev_pm_ops for power management
MAINTAINERS: Add Jason as designated reviewer for TPM
tpm: Update KConfig text to include TPM2.0 FIFO chips
tpm/st33zp24/dts/st33zp24-spi: Add dts documentation for st33zp24 spi phy
...
Non interleaved dualpoint v2 devices have separate pointstick button bits,
document this.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
This change allows atmel_mxt_ts to bind to ACPI-enumerated devices in
Google Pixel 2 (2015).
While newer version of ACPI standard allow use of device-tree-like
properties in device descriptions, the version of ACPI implemented in
Google BIOS does not support them, and we have to resort to DMI data to
specify exact characteristics of the devices (touchpad vs. touchscreen,
GPIO to button mapping, etc).
Pixel 1 continues to use i2c devices and platform data created by
chromeos-laptop driver, since ACPI does not enumerate them.
Reviewed-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Tested-by: Javier Martinez Canillas <javier.martinez@collabora.co.uk>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.1:
New interfaces:
- user-space interface for AEAD
- user-space interface for RNG (i.e., pseudo RNG)
New hashes:
- ARMv8 SHA1/256
- ARMv8 AES
- ARMv8 GHASH
- ARM assembler and NEON SHA256
- MIPS OCTEON SHA1/256/512
- MIPS img-hash SHA1/256 and MD5
- Power 8 VMX AES/CBC/CTR/GHASH
- PPC assembler AES, SHA1/256 and MD5
- Broadcom IPROC RNG driver
Cleanups/fixes:
- prevent internal helper algos from being exposed to user-space
- merge common code from assembly/C SHA implementations
- misc fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (169 commits)
crypto: arm - workaround for building with old binutils
crypto: arm/sha256 - avoid sha256 code on ARMv7-M
crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer
crypto: x86/sha256_ssse3 - move SHA-224/256 SSSE3 implementation to base layer
crypto: x86/sha1_ssse3 - move SHA-1 SSSE3 implementation to base layer
crypto: arm64/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm64/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha2-ce - move SHA-224/256 ARMv8 implementation to base layer
crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer
crypto: arm/sha1-ce - move SHA-1 ARMv8 implementation to base layer
crypto: arm/sha1_neon - move SHA-1 NEON implementation to base layer
crypto: arm/sha1 - move SHA-1 ARM asm implementation to base layer
crypto: sha512-generic - move to generic glue implementation
crypto: sha256-generic - move to generic glue implementation
crypto: sha1-generic - move to generic glue implementation
crypto: sha512 - implement base layer for SHA-512
crypto: sha256 - implement base layer for SHA-256
crypto: sha1 - implement base layer for SHA-1
crypto: api - remove instance when test failed
crypto: api - Move alg ref count init to crypto_check_alg
...
In flush_busy_ctxs() and blk_mq_hctx_has_pending(), regardless of how many
ctxs assigned to one hctx, they will all loop hctx->ctx_map.map_size
times. Here hctx->ctx_map.map_size is a const ALIGN(nr_cpu_ids, 8) / 8.
Especially, flush_busy_ctxs() is in hot code path. And it's unnecessary.
Change ->map_size to contain the actually mapped software queues, so we
only loop for as many iterations as we have to.
And remove cpumask setting and nr_ctx count in blk_mq_init_cpu_queues()
since they are all re-done in blk_mq_map_swqueue().
blk_mq_map_swqueue().
Signed-off-by: Chong Yuan <chong.yuan@memblaze.com>
Reviewed-by: Wenbo Wang <wenbo.wang@memblaze.com>
Updated by me for formatting and commenting.
Signed-off-by: Jens Axboe <axboe@fb.com>
exit_aio() currently serializes killing io contexts. Each context
killing ends up having to do percpu_ref_kill(), which in turns has
to wait for an RCU grace period. This can take a long time, depending
on the number of contexts. And there's no point in doing them serially,
when we could be waiting for all of them in one fell swoop.
This patches makes my fio thread offload test case exit 0.2s instead
of almost 6s.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>