In dccp_feat_init, when ccid_get_builtin_ccids failsto alloc
memory for rx.val, it should free tx.val before returning an
error.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch "dccp: fix a memleak that dccp_ipv6 doesn't put reqsk
properly" fixed reqsk refcnt leak for dccp_ipv6. The same issue
exists on dccp_ipv4.
This patch is to fix it for dccp_ipv4.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In dccp_v6_conn_request, after reqsk gets alloced and hashed into
ehash table, reqsk's refcnt is set 3. one is for req->rsk_timer,
one is for hlist, and the other one is for current using.
The problem is when dccp_v6_conn_request returns and finishes using
reqsk, it doesn't put reqsk. This will cause reqsk refcnt leaks and
reqsk obj never gets freed.
Jianlin found this issue when running dccp_memleak.c in a loop, the
system memory would run out.
dccp_memleak.c:
int s1 = socket(PF_INET6, 6, IPPROTO_IP);
bind(s1, &sa1, 0x20);
listen(s1, 0x9);
int s2 = socket(PF_INET6, 6, IPPROTO_IP);
connect(s2, &sa1, 0x20);
close(s1);
close(s2);
This patch is to put the reqsk before dccp_v6_conn_request returns,
just as what tcp_conn_request does.
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"qd.id" comes directly from the copy_from_user() on the line before so
we should verify that it's within bounds.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FCOE offloading failed with:
[qed_sp_fcoe_func_start:150(sp-0-3b:00.02)]Cannot satisfy CQ amount. CQs
requested 8, CQs available 6. Aborting function start
[qed_fcoe_start:821()]Failed to start fcoe
[__qedf_probe:3041]:6: Cannot start FCoE function.
The reason is a newly introduced check in the qed main part. This change
also provides the information about how many CQs are available, so we
simply limit the number of requested CQs..
Fixes: 3c5da94278 ("qed: Share additional information with qedf")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The CPU hotplug related code of this driver can be simplified by:
1) Consolidating the callbacks into a single state. The CPU thread can be
torn down on the CPU which goes offline. There is no point in delaying
that to the CPU dead state
2) Let the core code invoke the online/offline callbacks and remove the
extra for_each_online_cpu() loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The CPU hotplug related code of this driver can be simplified by:
1) Consolidating the callbacks into a single state. The CPU thread can be
torn down on the CPU which goes offline. There is no point in delaying
that to the CPU dead state
2) Let the core code invoke the online/offline callbacks and remove the
extra for_each_online_cpu() loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The BNX2I module init/exit code installs/removes the hotplug callbacks with
the cpu hotplug lock held. This worked with the old CPU locking
implementation which allowed recursive locking, but with the new percpu
rwsem based mechanism this is not longer allowed.
Use the _cpuslocked() variants to fix this.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The BNX2FC module init/exit code installs/removes the hotplug callbacks with
the cpu hotplug lock held. This worked with the old CPU locking
implementation which allowed recursive locking, but with the new percpu
rwsem based mechanism this is not longer allowed.
Use the _cpuslocked() variants to fix this.
Reported-by: kernel test robot <fengguang.wu@intel.com>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
bnx2fc_process_new_cqes() has protection against CPU hotplug, which relies
on the per cpu thread pointer. This protection is racy because it happens
only partially with the per cpu fp_work_lock held.
If the CPU is unplugged after the lock is dropped, the wakeup code can
dereference a NULL pointer or access freed and potentially reused memory.
Restructure the code so the thread check and wakeup happens with the
fp_work_lock held.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The rework of the exynos DRM clock handling introduced
warnings for configurations that have CONFIG_PM disabled:
drivers/gpu/drm/exynos/exynos_hdmi.c:736:13: error: 'hdmi_clk_disable_gates' defined but not used [-Werror=unused-function]
static void hdmi_clk_disable_gates(struct hdmi_context *hdata)
^~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/exynos/exynos_hdmi.c:717:12: error: 'hdmi_clk_enable_gates' defined but not used [-Werror=unused-function]
static int hdmi_clk_enable_gates(struct hdmi_context *hdata)
The problem is that the PM functions themselves are inside of
an #ifdef, but some functions they call are not.
This patch removes the #ifdef and instead marks the PM functions
as __maybe_unused, which is a more reliable way to get it right.
Link: https://patchwork.kernel.org/patch/8436281/
Fixes: 9be7e98984 ("drm/exynos/hdmi: clock code re-factoring")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
If the s5p-cec driver is a module and the drm exynos driver is built-in, then
the CEC core will be a module also, causing the CEC notifier to fail (will be
compiled as empty functions).
To prevent this select CEC_CORE if CEC_NOTIFIER is set to ensure the CEC core
is also built into the kernel.
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
The "Fixes" patch was incorrectly merged, as a result PHY is prematurely
powered off and for example Odroid-U3 cannot disable TV power domain
when HDMI cable is unplugged.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 625e63e2 ("drm/exynos/hdmi: fix pipeline disable order")
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
This patch moves drm_bridge_add call into probe.
It doesn't need to call drm_bridge_add call every time
bind callback is called.
Changelog v2
- moved drm_bridge_remove call into remove callback.
- corrected description.
Suggested-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Remove the error handling of bridge_node because the bridge_node is
optional.
For example, In case of Exynos SoC, a bridge device such as mDNIe and
MIC could be placed between Display Controller and MIPI DSI device but
the bridge device is optional.
Signed-off-by: Hoegeun Kwon <hoegeun.kwon@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.
File size before:
text data bss dec hex filename
12294 1192 0 13486 34ae drivers/gpu/drm/exynos/exynos_hdmi.o
File size after constify hdmi_match_types.
text data bss dec hex filename
13318 176 0 13494 34b6 drivers/gpu/drm/exynos/exynos_hdmi.o
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
num_ioctls is already assigned when declaring the exynos_drm_driver
structure. No need to duplicate it here.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
The buffer passed to bpf_obj_get_info_by_fd() should be initialized
to zeros. Kernel will enforce that to guarantee we can safely extend
info structures in the future.
Making the bpf_obj_get_info_by_fd() call in libbpf perform the zeroing
is problematic, however, since some members of the info structures
may need to be initialized by the callers (for instance pointers
to buffers to which kernel is to dump translated and jited images).
Remove the zeroing and fix up the in-tree callers before any kernel
has been released with this code.
As Daniel points out this seems to be the intended operation anyway,
since commit 95b9afd398 ("bpf: Test for bpf ID") is itself setting
the buffer pointers before calling bpf_obj_get_info_by_fd().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently netpoll_setup() assumes that netpoll.dev_name is a pointer
when checking if the device name is set:
if (np->dev_name) {
...
However the field is a character array, therefore the condition always
yields true. Check instead whether the first byte of the array has a
non-zero value.
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-07-25
This series contains updates to i40e and i40evf only.
Gustavo Silva fixes a variable assignment, where the incorrect variable
was being used to store the error parameter.
Carolyn provides a fix for a problem found in systems when entering S4
state, by ensuring that the misc vector's IRQ is disabled as well.
Jake removes the single-threaded restriction on the module workqueue,
which was causing issues with events such as CORER. Does some future
proofing, by changing how the driver displays the UDP tunnel type.
Paul adds a retry in releasing resources if the admin queue times out
during the first attempt to release the resources.
Jesse fixes up references to 32bit timspec, since there are a small set
of errors on 32 bit, so we need to be using the right calls for dealing
with timespec64 variables. Cleaned up code indentation and corrected
an "if" conditional check, as well as making the code flow more clear.
Cast or changed the types to remove warnings for comparing signed and
unsigned types. Adds missing includes in i40evf, which were being used
but were not being directly included.
Daniel Borkmann fixes i40e to fill the XDP prog_id with the id just like
other XDP enabled drivers, so that on dump we can retrieve the attached
program based on the id and dump BPF insns, opcodes, etc back to user
space.
Tushar Dave adds le32_to_cpu while evaluating the hardware descriptor
fields, since they are in little-endian format. Also removed
unnecessary "__packed" to a couple of i40evf structures.
Stefan Assmann fixes an issue when an administratively set MAC was set
and should now be switched back to 00:00:00:00:00:00, the pf_set_mac
flag is not being toggled back to false.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
_CPC is a optinal object for processor device so it's
fine for processor devices in DSDT without CPPC data,
but when booting the system with CPPC enabled in the
kernel but without its support in the firmware, I got
lots of warnings on a 64 core system:
[ 6.346016] acpi ACPI0007:00: CPPC data invalid or not present
[ 6.346028] acpi ACPI0007:01: CPPC data invalid or not present
[ 6.346039] acpi ACPI0007:02: CPPC data invalid or not present
[ 6.346050] acpi ACPI0007:03: CPPC data invalid or not present
[ 6.346063] acpi ACPI0007:04: CPPC data invalid or not present
...
[ 6.346737] acpi ACPI0007:3f: CPPC data invalid or not present
This isn't much useful and a little bit noise, so
switch the dev_warn() to dev_dbg().
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-07-25
This series contains updates to ixgbe only.
Tony provides all of the changes in the series, starting with adding a
check to ensure that adding a MAC filter was successful, before setting the
MACVLAN. In order to receive notifications of link configurations of the
external PHY and support the configuration of the internal iXFI link on
X552 devices, Tony enables LASI interrupts. Update the iXFI driver code
flow, since the MAC register NW_MNG_IF_SEL fields have been redefined for
X553 devices, so add MAC checks for iXFI flows. Added additional checks
for flow control autonegotiation, since it is not support for X553 fiber
and XFI devices.
v2: removed unnecessary parens noticed by David Miller in patch 6 of the
series.
v3: dropped patch 6 of the original series, while we work out a more
generic solution for malicious driver detection (MDD) support.
v4: updated patch 1 of the series with the comments from Joe Perches which
were:
- switched logic to return on error
- return 0 on success
- declare retval as an integer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Three misc amd fixes.
* 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: fix AVFS voltage offset for Vega10
drm/amdgpu/gfx9: simplify and fix GRBM index selection
drm/amdgpu: Fix blocking in RCU critical section(v2)
Backmerge drm-next with -rc2 in it to pull in a couple stm patches that
were previously incorrectly applied to -misc-next. By picking them up in
the correct manner, git will hopefully fix any errant trees that are out
in the wild.
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Hi all,
After merging the drm-misc tree, today's linux-next build (x86_64
allmodconfig) failed like this:
drivers/staging/vboxvideo/vbox_drv.c:235:2: error: unknown field 'set_busid' specified in initializer
.set_busid = drm_pci_set_busid,
^
drivers/staging/vboxvideo/vbox_drv.c:235:15: error: 'drm_pci_set_busid' undeclared here (not in a function)
.set_busid = drm_pci_set_busid,
^
drivers/staging/vboxvideo/vbox_drv.c: In function 'vbox_init':
drivers/staging/vboxvideo/vbox_drv.c:273:9: error: implicit declaration of function 'drm_pci_init' [-Werror=implicit-function-declaration]
return drm_pci_init(&driver, &vbox_pci_driver);
^
drivers/staging/vboxvideo/vbox_drv.c: In function 'vbox_exit':
drivers/staging/vboxvideo/vbox_drv.c:278:2: error: implicit declaration of function 'drm_pci_exit' [-Werror=implicit-function-declaration]
drm_pci_exit(&driver, &vbox_pci_driver);
^
Caused by commits
5c484cee7e ("drm: Remove drm_driver->set_busid hook")
10631d724d ("drm/pci: Deprecate drm_pci_init/exit completely")
interacting with commit
dd55d44f40 ("staging: vboxvideo: Add vboxvideo to drivers/staging")
from the staging.current tree.
I have applied the following merge fix patch - please check that it
is correct.
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 19 Jul 2017 11:41:01 +1000
Subject: [PATCH] drm: fixes for staging due to API changes in the drm core
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Use [ ! -z "$VAR" ] instead of [ "$VAR" ] to check
whether the given string variable is not zero-length
since it obviously shows what it means.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <srostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Output logs only to console if "-" is given to --logdir
option. In this case, ftracetest doesn't record any log
on the disk, and all logs immediately shown (including
all command logs.) Since there is no "tee" in the middle
of command and console, it outputs the log really soon.
This option is useful only when the console is logged.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <srostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Add 3-level verbosity for showing traced command log
on console immediately. Since some test cases can cause
kernel pacic if there is a probrem (like regression etc.),
we can not know which command caused the problem without
traced command log. This verbosity (-vvv) solves that
because it shows the log on console immediately. User
can get continuous command/error log.
Note that this is a kind of kernel debug mode, if you
don't see any kernel related issue, you don't need this
verbosity.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <srostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Add --fail-unsupported option to fail the test result if
ftracetest gets UNSUPPORTED result. UNSUPPORTED usually
happens when the kernel is old (e.g. stable tree) or some
kernel feature is disabled.
However, if newer kernel has any bug or regression, it
can make test results in UNSUPPORTED too. This option
can detect such kernel regression.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <srostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Do not return failure exit code (1) for unsupported testcases,
since it is expected for stable kernels.
Previously, ftracetest is expected to run only on current
release for avoiding regressions. However, nowadays we run
it on stable kernels. This means some test cases must return
unsupported result. In such case, we should NOT exit
ftracetest with error status for unsupported results so that
kselftest (upper tests wrapper) shows it passed correctly.
Note that we continue to treat unresolved results as failure,
if test writers would like to notice user that the test result
should be reviewed, they can use exit_unresolved.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Steven Rostedt (VMware) <srostedt@goodmis.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
The other patches contain a lot of information, so adding this
information in a separate patch. It adds my copyright and a brief
explanation of how the bitmap allocator works. There is a minor typo as
well in the prior explanation so that is fixed.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The simple, and expensive, way to find a free area is to iterate over
the entire bitmap until an area is found that fits the allocation size
and alignment. This patch makes use of an iterate that find an area to
check by using the block level contig hints. It will only return an area
that can fit the size and alignment request. If the request can fit
inside a block, it returns the first_free bit to start checking from to
see if it can be fulfilled prior to the contig hint. The pcpu_alloc_area
check has a bound of a block size added in case it is wrong.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The largest free region will either be a block level contig hint or an
aggregate over the left_free and right_free areas of blocks. This is a
much smaller set of free areas that need to be checked than a full
traverse.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.
The free path can take advantage of current contig hints to prevent
scanning within the start and end block. If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.
A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Metadata is kept per block to keep track of where the contig hints are.
Scanning can be avoided when the contig hints are not broken. In that
case, left and right contigs have to be managed manually.
This patch changes the allocation path hint updating to only scan when
contig hints are broken.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch makes the contig hint starting offset optimization from the
previous patch as honest as it can be. For both chunk and block starting
offsets, make sure it keeps the starting offset with the best alignment.
The block skip optimization is added in a later patch when the
pcpu_find_block_fit iterator is swapped in.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch adds chunk->contig_bits_start to keep track of the contig
hint's offset and the check to skip the chunk if it does not fit. If
the chunk's contig hint starting offset cannot satisfy an allocation,
the allocator assumes there is enough memory pressure in this chunk to
either use a different chunk or create a new one. This accepts a less
tight packing for a smoother latency curve.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch adds first_bit to keep track of the first free bit in the
bitmap. This hint helps prevent scanning of fully allocated blocks.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch introduces the bitmap metadata blocks and adds the skeleton
of the code that will be used to maintain these blocks. Each chunk's
bitmap is made up of full metadata blocks. These blocks maintain basic
metadata to help prevent scanning unnecssarily to update hints. Full
scanning methods are used for the skeleton and will be replaced in the
coming patches. A number of helper functions are added as well to do
conversion of pages to blocks and manage offsets. Comments will be
updated as the final version of each function is added.
There exists a relationship between PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE,
the region size, and unit_size. Every chunk's region (including offsets)
is page aligned at the beginning to preserve alignment. The end is
aligned to LCM(PAGE_SIZE, PCPU_BITMAP_BLOCK_SIZE) to ensure that the end
can fit with the populated page map which is by page and every metadata
block is fully accounted for. The unit_size is already page aligned, but
must also be aligned with PCPU_BITMAP_BLOCK_SIZE to ensure full metadata
blocks.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The percpu memory allocator is experiencing scalability issues when
allocating and freeing large numbers of counters as in BPF.
Additionally, there is a corner case where iteration is triggered over
all chunks if the contig_hint is the right size, but wrong alignment.
This patch replaces the area map allocator with a basic bitmap allocator
implementation. Each subsequent patch will introduce new features and
replace full scanning functions with faster non-scanning options when
possible.
Implementation:
This patchset removes the area map allocator in favor of a bitmap
allocator backed by metadata blocks. The primary goal is to provide
consistency in performance and memory footprint with a focus on small
allocations (< 64 bytes). The bitmap removes the heavy memmove from the
freeing critical path and provides a consistent memory footprint. The
metadata blocks provide a bound on the amount of scanning required by
maintaining a set of hints.
In an effort to make freeing fast, the metadata is updated on the free
path if the new free area makes a page free, a block free, or spans
across blocks. This causes the chunk's contig hint to potentially be
smaller than what it could allocate by up to the smaller of a page or a
block. If the chunk's contig hint is contained within a block, a check
occurs and the hint is kept accurate. Metadata is always kept accurate
on allocation, so there will not be a situation where a chunk has a
later contig hint than available.
Evaluation:
I have primarily done testing against a simple workload of allocation of
1 million objects (2^20) of varying size. Deallocation was done by in
order, alternating, and in reverse. These numbers were collected after
rebasing ontop of a80099a152. I present the worst-case numbers here:
Area Map Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 310 | 4770
16B | 557 | 1325
64B | 436 | 273
256B | 776 | 131
1024B | 3280 | 122
Bitmap Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 490 | 70
16B | 515 | 75
64B | 610 | 80
256B | 950 | 100
1024B | 3520 | 200
This data demonstrates the inability for the area map allocator to
handle less than ideal situations. In the best case of reverse
deallocation, the area map allocator was able to perform within range
of the bitmap allocator. In the worst case situation, freeing took
nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
dramatically improves the consistency of the free path. The small
allocations performed nearly identical regardless of the freeing
pattern.
While it does add to the allocation latency, the allocation scenario
here is optimal for the area map allocator. The area map allocator runs
into trouble when it is allocating in chunks where the latter half is
full. It is difficult to replicate this, so I present a variant where
the pages are second half filled. Freeing was done sequentially. Below
are the numbers for this scenario:
Area Map Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 4118 | 4892
16B | 1651 | 1163
64B | 598 | 285
256B | 771 | 158
1024B | 3034 | 160
Bitmap Allocator:
Object Size | Alloc Time (ms) | Free Time (ms)
----------------------------------------------
4B | 481 | 67
16B | 506 | 69
64B | 636 | 75
256B | 892 | 90
1024B | 3262 | 147
The data shows a parabolic curve of performance for the area map
allocator. This is due to the memmove operation being the dominant cost
with the lower object sizes as more objects are packed in a chunk and at
higher object sizes, the traversal of the chunk slots is the dominating
cost. The bitmap allocator suffers this problem as well. The above data
shows the inability to scale for the allocation path with the area map
allocator and that the bitmap allocator demonstrates consistent
performance in general.
The second problem of additional scanning can result in the area map
allocator completing in 52 minutes when trying to allocate 1 million
4-byte objects with 8-byte alignment. The same workload takes
approximately 16 seconds to complete for the bitmap allocator.
V2:
Fixed a bug in pcpu_alloc_first_chunk end_offset was setting the bitmap
using bytes instead of bits.
Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.
Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
While not a crash test, this does provide two tight atomic_t and
refcount_t loops for performance comparisons:
cd /sys/kernel/debug/provoke-crash
perf stat -B -- cat <(echo ATOMIC_TIMING) > DIRECT
perf stat -B -- cat <(echo REFCOUNT_TIMING) > DIRECT
Looking a CPU cycles is the best way to example the fast-path (rather
than instruction counts, since conditional jumps will be executed but
will be negligible due to branch-prediction).
Signed-off-by: Kees Cook <keescook@chromium.org>
The existing REFCOUNT_* LKDTM tests were designed only for testing a narrow
portion of CONFIG_REFCOUNT_FULL. This moves the tests to their own file and
expands their testing to poke each boundary condition.
Since the protections (CONFIG_REFCOUNT_FULL and x86-fast) use different
saturation values and reach-zero behavior, those have to be build-time
set so the tests can actually validate things are happening at the
right places.
Notably, the x86-fast protection will fail REFCOUNT_INC_ZERO and
REFCOUNT_ADD_ZERO since those conditions are not checked (only overflow
is critical to protecting refcount_t). CONFIG_REFCOUNT_FULL will warn for
each REFCOUNT_*_NEGATIVE test since it provides zero-pinning behaviors
(which allows it to pass REFCOUNT_INC_ZERO and REFCOUNT_ADD_ZERO).
Signed-off-by: Kees Cook <keescook@chromium.org>
for_each_compatible_node performs an of_node_get on each iteration, so a
return from the loop requires an of_node_put.
The semantic patch that fixes this problem is as follows
(http://coccinelle.lip6.fr):
// <smpl>
@@
local idexpression n;
expression e,e1,e2;
statement S;
iterator i1;
iterator name for_each_compatible_node;
@@
for_each_compatible_node(n,e1,e2) {
...
(
of_node_put(n);
|
e = n
|
return n;
|
i1(...,n,...) S
|
+ of_node_put(n);
? return ...;
)
...
}
// </smpl>
Additionally, call of_node_put on the previous value of np, obtained from
of_find_compatible_node, that is no longer accessible at the point of the
for_each_compatible_node.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bd8b244174 ("NFS: Store the raw NFS access mask in the inode's
access cache") changed how the access results are stored after an
access() call. An NFS v4 OPEN might have access bits returned with the
opendata, so we should use the NFS4_ACCESS values when determining the
return value in nfs4_opendata_access().
Fixes: bd8b244174 ("NFS: Store the raw NFS access mask in the inode's
access cache")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tested-by: Takashi Iwai <tiwai@suse.de>
Make iowait_boost and iowait_boost_max as unsigned int since its unit
is kHz and this is consistent with struct cpufreq_policy. Also change
the local variables in sugov_iowait_boost() to match this.
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>