When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
the following dump occurs.
BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Oops: 0000 [#1] SMP
[...]
Workqueue: events kfree_rcu_work
RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
[...]
Call Trace:
kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
kfree_bulk include/linux/slab.h:413 [inline]
kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
process_one_work+0x207/0x530 kernel/workqueue.c:2276
worker_thread+0x320/0x610 kernel/workqueue.c:2422
kthread+0x13d/0x160 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
When kmalloc_node() a large memory, page is allocated, not slab, so when
freeing memory via kfree_rcu(), this large memory should not be used by
memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
slab.
Using page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a7146 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter reports:
The patch 2d146aa3aa: "mm: memcontrol: switch to rstat" from Apr
29, 2021, leads to the following static checker warning:
kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
warn: sleeping in atomic context
mm/memcontrol.c
3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
3573 {
3574 unsigned long val;
3575
3576 if (mem_cgroup_is_root(memcg)) {
3577 cgroup_rstat_flush(memcg->css.cgroup);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is from static analysis and potentially a false positive. The
problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
can sleep.
3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
3579 memcg_page_state(memcg, NR_ANON_MAPPED);
3580 if (swap)
3581 val += memcg_page_state(memcg, MEMCG_SWAP);
3582 } else {
3583 if (!swap)
3584 val = page_counter_read(&memcg->memory);
3585 else
3586 val = page_counter_read(&memcg->memsw);
3587 }
3588 return val;
3589 }
__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.
Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
Fixes: 2d146aa3aa ("mm: memcontrol: switch to rstat")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Chris Down <chris@chrisdown.name>
Reviewed-by: Rik van Riel <riel@surriel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For punch holes in EOF blocks, fallocate used buffer write to zero the
EOF blocks in last cluster. But since ->writepage will ignore EOF
pages, those zeros will not be flushed.
This "looks" ok as commit 6bba4471f0 ("ocfs2: fix data corruption by
fallocate") will zero the EOF blocks when extend the file size, but it
isn't. The problem happened on those EOF pages, before writeback, those
pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
set, when writeback run by write_cache_pages(), DIRTY flag on the page
was cleared, but DIRTY flag on the buffer_head not.
When next write happened to those EOF pages, since buffer_head already
had DIRTY flag set, it would not mark page DIRTY again. That made
writeback ignore them forever. That will cause data corruption. Even
directio write can't work because it will fail when trying to drop pages
caches before direct io, as it found the buffer_head for those pages
still had DIRTY flag set, then it will fall back to buffer io mode.
To make a summary of the issue, as writeback ingores EOF pages, once any
EOF page is generated, any write to it will only go to the page cache,
it will never be flushed to disk even file size extends and that page is
not EOF page any more. The fix is to avoid zero EOF blocks with buffer
write.
The following code snippet from qemu-img could trigger the corruption.
656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
...
660 fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
660 fallocate(11, 0, 2275868672, 327680) = 0
658 pwrite64(11, "
Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
STRING_SELFTEST is presented in the "Library routines" menu. Move it in
Kernel hacking > Kernel Testing and Coverage > Runtime Testing together
with other similar tests found in lib/
--- Runtime Testing
<*> Test functions located in the hexdump module at runtime
<*> Test string functions (NEW)
<*> Test functions located in the string_helpers module at runtime
<*> Test strscpy*() family of functions at runtime
<*> Test kstrto*() family of functions at runtime
<*> Test printf() family of functions at runtime
<*> Test scanf() family of functions at runtime
Link: https://lkml.kernel.org/r/20210719185158.190371-1-mcroce@linux.microsoft.com
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Cc: Peter Rosin <peda@axentia.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The only real platform with DISPLAY_VER == 10 is GLK. We don't need to
handle CNL explicitly in intel_ddi.c.
A lot of special code for CNL can be removed. There were some
__cnl.*() functions that were created to share the implementation
between ICL and CNL. Those are now embedded in the only caller, in ICL.
Remove code and rename functions/macros accordingly to use ICL prefix
for those that are still needed.
Verified with:
make EXTRA_CFLAGS=-Wunused drivers/gpu/drm/i915/display/intel_dpll_mgr.o
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210729233934.2059489-1-lucas.demarchi@intel.com
The transition to the managed PCM buffers allowed the dynamically
buffer allocation, while the driver code still assumes the fixed
preallocation buffer and sets up the DMA stuff at the open call.
This needs to be moved to hw_params after the buffer allocation and
setup. Also, the reference to the buffer address has to be corrected
to runtime->dma_addr.
Fixes: b3c0ae75f5 ("ASoC: kirkwood: Use managed DMA buffer allocation")
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-6-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
Along with the transition to the managed PCM buffers, the driver now
accepts the dynamically allocated buffer, while it still kept the
reference to the old preallocated buffer address. This patch corrects
to the right reference via runtime->dma_addr.
(Although this might have been already buggy before the cleanup with
the managed buffer, let's put Fixes tag to point that; it's a corner
case, after all.)
Fixes: d55894bc27 ("ASoC: uniphier: Use managed buffer allocation")
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-5-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The driver needs to refer to substream->runtime->dma_addr
instead for the buffer address.
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-4-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
PCM buffers might be allocated dynamically when the buffer
preallocation failed or a larger buffer is requested, and it's not
guaranteed that substream->dma_buffer points to the actually used
buffer. The address should be retrieved from runtime->dma_addr,
instead of substream->dma_buffer (and shouldn't use virt_to_phys).
Also, remove the line overriding runtime->dma_area superfluously,
which was already set up at the PCM buffer allocation.
Cc: Cezary Rojewski <cezary.rojewski@intel.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Link: https://lore.kernel.org/r/20210728112353.6675-3-tiwai@suse.de
Signed-off-by: Mark Brown <broonie@kernel.org>
ecap0 can be configured to use pad ECAP0_IN_APWM_OUT (D18) which has a
signal connected to Pin 1 of J12 on EVM. Add support for adding this
pinmux so that pwm can be observed on pin 1 of Header J12
Also mark all un-used epwm and ecap pwm nodes as disabled.
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
Signed-off-by: Nishanth Menon <nm@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Link: https://lore.kernel.org/r/20210721113625.17299-4-lokeshvutla@ti.com
No need for multiple spaces in variable declaration (the code does not
use them in other places). No functional change.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Several functions receive pointers to u8, sk_buff or other structs but
do not modify the contents so make them const. This allows doing the
same for local variables and in total makes the code a little bit safer.
This makes const also data passed as "unsigned long opt" argument to
nci_request() function. Usual flow for such functions is:
1. Receive "u8 *" and store it (the pointer) in a structure
allocated on stack (e.g. struct nci_set_config_param),
2. Call nci_request() or __nci_request() passing a callback function an
the pointer to the structure via an "unsigned long opt",
3. nci_request() calls the callback which dereferences "unsigned long
opt" in a read-only way.
This converts all above paths to use proper pointer to const data, so
entire flow is safer.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Several functions receive pointers to u8, char or sk_buff but do not
modify the contents so make them const. This allows doing the same for
local variables and in total makes the code a little bit safer.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>