generic/530 on a machine with enough ram and a non-preemptible
kernel can run the AGI processing phase of log recovery enitrely out
of cache. This means it never blocks on locks, never waits for IO
and runs entirely through the unlinked lists until it either
completes or blocks and hangs because it has run out of log space.
It runs out of log space because the background CIL push is
scheduled but never runs. queue_work() queues the CIL work on the
current CPU that is busy, and the workqueue code will not run it on
any other CPU. Hence if the unlinked list processing never yields
the CPU voluntarily, the push work is delayed indefinitely. This
results in the CIL aggregating changes until all the log space is
consumed.
When the log recoveyr processing evenutally blocks, the CIL flushes
but because the last iclog isn't submitted for IO because it isn't
full, the CIL flush never completes and nothing ever moves the log
head forwards, or indeed inserts anything into the tail of the log,
and hence nothing is able to get the log moving again and recovery
hangs.
There are several problems here, but the two obvious ones from
the trace are that:
a) log recovery does not yield the CPU for over 4 seconds,
b) binding CIL pushes to a single CPU is a really bad idea.
This patch addresses just these two aspects of the problem, and are
suitable for backporting to work around any issues in older kernels.
The more fundamental problem of preventing the CIL from consuming
more than 50% of the log without committing will take more invasive
and complex work, so will be done as followup work.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The code in xlog_wait uses the spinlock to make adding the task to
the wait queue, and setting the task state to UNINTERRUPTIBLE atomic
with respect to the waker.
Doing the wakeup after releasing the spinlock opens up the following
race condition:
Task 1 task 2
add task to wait queue
wake up task
set task state to UNINTERRUPTIBLE
This issue was found through code inspection as a result of kworkers
being observed stuck in UNINTERRUPTIBLE state with an empty
wait queue. It is rare and largely unreproducable.
Simply moving the spin_unlock to after the wake_up_all results
in the waker not being able to see a task on the waitqueue before
it has set its state to UNINTERRUPTIBLE.
This bug dates back to the conversion of this code to generic
waitqueue infrastructure from a counting semaphore back in 2008
which didn't place the wakeups consistently w.r.t. to the relevant
spin locks.
[dchinner: Also fix a similar issue in the shutdown path on
xc_commit_wait. Update commit log with more details of the issue.]
Fixes: d748c62367 ("[XFS] Convert l_flushsema to a sv_t")
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
In the situation where the log is full and the CIL has not recently
flushed, the AIL push threshold is throttled back to the where the
last write of the head of the log was completed. This is stored in
log->l_last_sync_lsn. Hence if the CIL holds > 25% of the log space
pinned by flushes and/or aggregation in progress, we can get the
situation where the head of the log lags a long way behind the
reservation grant head.
When this happens, the AIL push target is trimmed back from where
the reservation grant head wants to push the log tail to, back to
where the head of the log currently is. This means the push target
doesn't reach far enough into the log to actually move the tail
before the transaction reservation goes to sleep.
When the CIL push completes, it moves the log head forward such that
the AIL push target can now be moved, but that has no mechanism for
puhsing the log tail. Further, if the next tail movement of the log
is not large enough wake the waiter (i.e. still not enough space for
it to have a reservation granted), we don't wake anything up, and
hence we do not update the AIL push target to take into account the
head of the log moving and allowing the push target to be moved
forwards.
To avoid this particular condition, if we fail to wake the first
waiter on the grant head because we don't have enough space,
push on the AIL again. This will pick up any movement of the log
head and allow the push target to move forward due to completion of
CIL pushing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
If the CONFIG_BUG is enabled, BUG is executed and then system is crashed.
However, the bailout for mount is no longer proceeding.
Using WARN_ON_ONCE rather than BUG can prevent this situation.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Using the helper blk_queue_required_elevator_features(), set the
elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request
queue of SCSI ZBC disks.
This feature requirement can always be satisfied as the mq-deadline
elevator is always selected for in-kernel compilation when
CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using the helper blk_queue_required_elevator_features(), set the
elevator feature ELEVATOR_F_ZBD_SEQ_WRITE as required for the request
queue of null_blk devices created with zoned mode enabled.
This feature requirement can always be satisfied as the mq-deadline
elevator is always selected for in-kernel compilation when
CONFIG_BLK_DEV_ZONED (zoned block device support) is enabled.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
the only information known about the device is the number of hardware
queues as the block device scan by the device driver is not completed
yet for most drivers. The device type and elevator required features
are not set yet, preventing to correctly select the default elevator
most suitable for the device.
This currently affects all multi-queue zoned block devices which default
to the "none" elevator instead of the required "mq-deadline" elevator.
These drives currently include host-managed SMR disks connected to a
smartpqi HBA and null_blk block devices with zoned mode enabled.
Upcoming NVMe Zoned Namespace devices will also be affected.
Fix this by adding the boolean elevator_init argument to
blk_mq_init_allocated_queue() to control the execution of
elevator_init_mq(). Two cases exist:
1) elevator_init = false is used for calls to
blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
case, a call to elevator_init_mq() is added to __device_add_disk(),
resulting in the delayed initialization of the queue elevator
after the device driver finished probing the device information. This
effectively allows elevator_init_mq() access to more information
about the device.
2) elevator_init = true preserves the current behavior of initializing
the elevator directly from blk_mq_init_allocated_queue(). This case
is used for the special request based DM devices where the device
gendisk is created before the queue initialization and device
information (e.g. queue limits) is already known when the queue
initialization is executed.
Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
For block devices that do not specify required features, preserve the
current default elevator selection (mq-deadline for single queue
devices, none for multi-queue devices). However, for devices specifying
required features (e.g. zoned block devices ELEVATOR_F_ZBD_SEQ_WRITE
feature), select the first available elevator providing the required
features.
In all cases, default to "none" if no elevator is available or if the
initialization of the default elevator fails.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce the definition of elevator features through the
elevator_features flags in the elevator_type structure. Each flag can
represent a feature supported by an elevator. The first feature defined
by this patch is support for zoned block device sequential write
constraint with the flag ELEVATOR_F_ZBD_SEQ_WRITE, which is implemented
by the mq-deadline elevator using zone write locking.
Other possible features are IO priorities, write hints, latency targets
or single-LUN dual-actuator disks (for which the elevator could maintain
one LBA ordered list per actuator).
The required_elevator_features field is also added to the request_queue
structure to allow a device driver to specify elevator feature flags
that an elevator must support for the correct operation of the device
(e.g. device drivers for zoned block devices can have the
ELEVATOR_F_ZBD_SEQ_WRITE flag as a required feature).
The helper function blk_queue_required_elevator_features() is
defined for setting this new field.
With these two new fields in place, the elevator functions
elevator_match() and elevator_find() are modified to allow a user to set
only an elevator with a set of features that satisfies the device
required features. Elevators not matching the device requirements are
not shown in the device sysfs queue/scheduler file to prevent their use.
The "none" elevator can always be selected as before.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the default elevator chosen is mq-deadline, elevator_init_mq() may
return an error if mq-deadline initialization fails, leading to
blk_mq_init_allocated_queue() returning an error, which in turn will
cause the block device initialization to fail and the device not being
exposed.
Instead of taking such extreme measure, handle mq-deadline
initialization failures in the same manner as when mq-deadline is not
available (no module to load), that is, default to the "none" scheduler.
With this change, elevator_init_mq() return type can be changed to void.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of checking a queue tag_set BLK_MQ_F_NO_SCHED flag before
calling elevator_init_mq() to make sure that the queue supports IO
scheduling, use the elevator.c function elv_support_iosched() in
elevator_init_mq(). This does not introduce any functional change but
ensure that elevator_init_mq() does the right thing based on the queue
settings.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Don't populate the array seq on the stack but instead make it
static const. Makes the object code smaller by 30 bytes.
Before:
text data bss dec hex filename
22284 3184 0 25468 637c drivers/input/joystick/sidewinder.o
After:
text data bss dec hex filename
22158 3280 0 25438 635e drivers/input/joystick/sidewinder.o
(gcc version 9.2.1, amd64)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
We need to reset input device's timestamp on input_sync(), otherwise
drivers not using input_set_timestamp() will end up with a stale
timestamp after their clients consume first input event.
Fixes: 3b51c44bd6 ("Input: allow drivers specify timestamp for input events")
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Namespaces created with PFN_MODE_PMEM mode stores struct page in the reserve
block area. We need to make sure we account for the right struct page
size while doing this. Instead of directly depending on sizeof(struct page)
which can change based on different kernel config option, use the max struct
page size (64) while calculating the reserve block area. This makes sure pmem
device can be used across kernels built with different configs.
If the above assumption of max struct page size change, we need to update the
reserve block allocation space for new namespaces created.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190905154603.10349-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In order to support marking namespaces with unsupported feature/versions
disabled, nvdimm core should advance the namespace seed on these
probe failures. Otherwise, these failed namespaces will be considered a
seed namespace and will be wrongly used while creating new namespaces.
Add -EOPNOTSUPP as return from pmem probe callback to indicate a namespace
initialization failures due to pfn superblock feature/version mismatch.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190905154603.10349-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The nd_region_probe_success() helper collides seed management with
nvdimm->busy tracking. Given the 'busy' increment is handled internal to the
nd_region driver 'probe' path move the decrement to the 'remove' path.
With that cleanup the routine can be renamed to the more descriptive
nd_region_advance_seeds().
The change is prompted by an incoming need to optionally advance the
seeds on other events besides 'probe' success.
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20190905154603.10349-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
When 'pci=resource_alignment=' is specified on the command line, there is
no trailing new line. Then, when it's read through the corresponding sysfs
attribute, there will be no newline and a cat command will not show
correctly in a shell. If the parameter is set through sysfs a new line will
be stored and it will 'cat' correctly.
To solve this, append a new line character in the show function if one does
not already exist.
Link: https://lore.kernel.org/r/20190822161013.5481-4-logang@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add 3 counters per priority to ethtool using PPCNT:
1) rx_prio[p]_buf_discard - the number of packets discarded by device
due to lack of per host receive buffers
2) rx_prio[p]_cong_discard - the number of packets discarded by device
due to per host congestion
3) rx_prio[p]_marked - the number of packets ECN marked by device due
to per host congestion
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Map capability bit indicating that HCA supports port buffer's congestion
counters. Also map registers with the corresponding counters.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix to return negative error code -ENOMEM from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 4ec9e7b026 ("net/mlx5: DR, Expose steering domain functionality")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The memory return by kzalloc() has already be set to zero, so
remove useless memset(0).
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Don't clear MLX5E_SQ_STATE_ENABLED on error in mlx5e_open_txqsq and
mlx5e_open_icosq, because it's not set there, and is 0 by default.
Fixes: acc6c5953a ("net/mlx5e: Split open/close channels to stages")
Fixes: 9d18b5144a ("net/mlx5e: Split open/close ICOSQ into stages")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
PTR_ERR_OR_ZERO contains if(IS_ERR(...)) + PTR_ERR. It is better
to use it directly. hence just replace it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The error return from a call to mlx5_flow_namespace_set_peer is not
being assigned to variable err and hence the error check following
the call is currently not working. Fix this by assigning ret as
intended.
Addresses-Coverity: ("Logically dead code")
Fixes: 8463daf17e ("net/mlx5: Add support to use SMFS in switchdev mode")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is a spelling mistake in a NL_SET_ERR_MSG_MOD error message.
Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This function was removed in the cited commit below.
Fixes: 13e509a4c1 ("net/mlx5e: Remove leftover code from the PF netdev being uplink rep")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
mlx5 is dependent on IPv6 tristate since we use ipv6's nd_tbl directly,
alternatively we can use ipv6_stub->nd_tbl and remove the dependency.
Reported-by: Walter Harms <wharms@bfs.de>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When MLX5_CORE=y and PCI_HYPERV_INTERFACE=m, below errors are found:
drivers/net/ethernet/mellanox/mlx5/core/en_main.o: In function `mlx5e_nic_enable':
en_main.c:(.text+0xb649): undefined reference to `mlx5e_hv_vhca_stats_create'
drivers/net/ethernet/mellanox/mlx5/core/en_main.o: In function `mlx5e_nic_disable':
en_main.c:(.text+0xb8c4): undefined reference to `mlx5e_hv_vhca_stats_destroy'
Fix this by making MLX5_CORE imply PCI_HYPERV_INTERFACE.
Fixes: cef35af34d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cited patch have an issue in WARN_ON_ONCE check, with wrong address ranges
are compared. Fix that by changing pointer types from u64* to void*. This
will also make code simpler to read.
In addition mlx5e_hv_vhca_fill_ring_stats can get void pointer, so remove
the unnecessary casting when calling it.
Found by static checker:
drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c:41 mlx5e_hv_vhca_fill_stats()
warn: potential pointer math issue ('buf' is a u64 pointer)
Fixes: cef35af34d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If a request_key authentication token key gets revoked, there's a window in
which request_key_auth_describe() can see it with a NULL payload - but it
makes no check for this and something like the following oops may occur:
BUG: Kernel NULL pointer dereference at 0x00000038
Faulting instruction address: 0xc0000000004ddf30
Oops: Kernel access of bad area, sig: 11 [#1]
...
NIP [...] request_key_auth_describe+0x90/0xd0
LR [...] request_key_auth_describe+0x54/0xd0
Call Trace:
[...] request_key_auth_describe+0x54/0xd0 (unreliable)
[...] proc_keys_show+0x308/0x4c0
[...] seq_read+0x3d0/0x540
[...] proc_reg_read+0x90/0x110
[...] __vfs_read+0x3c/0x70
[...] vfs_read+0xb4/0x1b0
[...] ksys_read+0x7c/0x130
[...] system_call+0x5c/0x70
Fix this by checking for a NULL pointer when describing such a key.
Also make the read routine check for a NULL pointer to be on the safe side.
[DH: Modified to not take already-held rcu lock and modified to also check
in the read routine]
Fixes: 04c567d931 ("[PATCH] Keys: Fix race between two instantiators of a key")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCIe spec doesn't mention "green LEDs" or "amber LEDs". Replace those
terms with "Power Indicator" and "Attention Indicator" so the comments
match the spec language. Comment changes only.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add pciehp_set_indicators() to set power and attention indicators with a
single register write.
This is a minor optimization because we frequently set both indicators and
this can do it with a single command. It also reduces the number of
interfaces related to the indicators and makes them more discoverable
because callers use the PCI_EXP_SLTCTL_ATTN_IND_* and
PCI_EXP_SLTCTL_PWR_IND_* definitions directly.
[bhelgaas: extend commit log, s/PCI_EXP_SLTCTL_.*_IND_NONE/INDICATOR_NOOP/
so they don't look like things defined by the spec, add function doc, mask
commands to make it obvious we only send valid commands
(pcie_do_write_cmd() does mask it, but requires more effort to verify)]
Link: https://lore.kernel.org/r/20190903111021.1559-2-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
I have fixed various bugs, and these drivers are (I hope) pretty
stable now. Remove all dev_dbg() for code clean-up.
If I end up with debugging the drivers again, I will locally revert
this commit. I no longer need the debug code in upstream.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Replace the chain of platform_get_resource() and devm_ioremap_resource()
with devm_platform_ioremap_resource().
This allows to remove the local variable for (struct resource *), and
have one function call less.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Error was detected by PVS-Studio:
V522 Dereferencing of the null pointer 'led_cdev->trigger' might take place.
Fixes: 2282e125a4 ("leds: triggers: let struct led_trigger::activate() return an error code")
Signed-off-by: Oleh Kravchenko <oleg@kaa.org.ua>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
The behaviour of the EEPROM in the case where we only send an 8bit
address to a 16bit address EEPROM is not defined. Added comment about
that the slave-eeprom might behave differently from how an actual device
does (only one model measured).
Reported-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Björn Ardö <bjorn.ardo@axis.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
We can currently get "Unable to handle kernel paging request at
virtual address" for invalid clocks with dts node but no driver:
(__clk_get_hw) from [<c0138ebc>] (ti_sysc_find_one_clockdomain+0x18/0x34)
(ti_sysc_find_one_clockdomain) from [<c0138f0c>] (ti_sysc_clkdm_init+0x34/0xdc)
(ti_sysc_clkdm_init) from [<c0584660>] (sysc_probe+0xa50/0x10e8)
(sysc_probe) from [<c065c6ac>] (platform_drv_probe+0x58/0xa8)
Let's add IS_ERR checks to ti_sysc_clkdm_init() as And let's start treating
clk_get() with -ENOENT as a proper error. If the clock name is specified
in device tree we must succeed with clk_get() to continue. For modules with
no clock names specified in device tree we will just ignore the clocks.
Fixes: 2b2f7def05 ("bus: ti-sysc: Add support for missing clockdomain handling")
Acked-by: Roger Quadros <rogerq@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The following crash was observed:
Unable to handle kernel NULL pointer dereference at 0000000000000158
Internal error: Oops: 96000004 [#1] SMP
pc : resend_irqs+0x68/0xb0
lr : resend_irqs+0x64/0xb0
...
Call trace:
resend_irqs+0x68/0xb0
tasklet_action_common.isra.6+0x84/0x138
tasklet_action+0x2c/0x38
__do_softirq+0x120/0x324
run_ksoftirqd+0x44/0x60
smpboot_thread_fn+0x1ac/0x1e8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
The reason for this is that the interrupt resend mechanism happens in soft
interrupt context, which is a asynchronous mechanism versus other
operations on interrupts. free_irq() does not take resend handling into
account. Thus, the irq descriptor might be already freed before the resend
tasklet is executed. resend_irqs() does not check the return value of the
interrupt descriptor lookup and derefences the return value
unconditionally.
1):
__setup_irq
irq_startup
check_irq_resend // activate softirq to handle resend irq
2):
irq_domain_free_irqs
irq_free_descs
free_desc
call_rcu(&desc->rcu, delayed_free_desc)
3):
__do_softirq
tasklet_action
resend_irqs
desc = irq_to_desc(irq)
desc->handle_irq(desc) // desc is NULL --> Ooops
Fix this by adding a NULL pointer check in resend_irqs() before derefencing
the irq descriptor.
Fixes: a4633adcdb ("[PATCH] genirq: add genirq sw IRQ-retrigger")
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1630ae13-5c8e-901e-de09-e740b6a426a7@huawei.com