Johannes Berg says:
====================
One batch of changes, containing:
* hwsim improvements from Jouni and myself, to be able to
test more scenarios easily
* some more HE (802.11ax) support
* some initial S1G (sub 1 GHz) work for fractional MHz channels
* some (action) frame registration updates to help DPP support
* along with other various improvements/fixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The function _sprd_pll_recalc_rate() is declared to return unsigned
long, but it returns a negative value when the allocation fails. Return
the parent_rate instead, which makes more sense since the framework
also returns the parent_rate when the .recalc_rate() callback is not
set.
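
As an illustration only, the failure path can be sketched in userspace C.
All names here (demo_pll_recalc_rate, alloc_fn, failing_alloc) are
hypothetical and not taken from the driver; the allocator hook exists
only so that the failure case can be exercised.

```c
#include <stdlib.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t size);

/*
 * Stand-in for a .recalc_rate() style callback. The return type is
 * unsigned long, so a negative errno cannot be reported through it.
 */
unsigned long demo_pll_recalc_rate(unsigned long parent_rate,
				   unsigned int mult, alloc_fn alloc)
{
	unsigned int *cfg = alloc(sizeof(*cfg));
	unsigned long rate;

	if (!cfg)
		return parent_rate;	/* not (unsigned long)-ENOMEM */

	*cfg = mult;
	rate = parent_rate * *cfg;
	free(cfg);
	return rate;
}

/* Allocator that always fails, used to simulate an allocation failure. */
void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

Calling demo_pll_recalc_rate() with failing_alloc yields the unchanged
parent rate rather than a huge value produced by casting a negative
errno to unsigned long.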
Fixes: 3e37b00558 ("clk: sprd: add adjustable pll support")
Signed-off-by: Chunyan Zhang <chunyan.zhang@unisoc.com>
Link: https://lkml.kernel.org/r/20200519030036.1785-2-zhang.lyra@gmail.com
Reviewed-by: Baolin Wang <baolin.wang7@gmail.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Clock Generation Unit (CGU) is a new clock controller IP of a forthcoming
Intel network processor SoC named Lightning Mountain (LGM). It provides
programming interfaces to control & configure all CPU & peripheral clocks.
Add a common clock framework based clock controller driver for CGU.
Signed-off-by: Rahul Tanwar <rahul.tanwar@linux.intel.com>
Link: https://lkml.kernel.org/r/42a4f71847714df482bacffdcd84341a4052800b.1587102634.git.rahul.tanwar@linux.intel.com
[sboyd@kernel.org: Kill init function to alloc and cleanup newline]
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Define f2fs_listxattr to NULL when CONFIG_F2FS_FS_XATTR is not
enabled, then we can remove many ugly ifdef macros in the code.
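
A minimal sketch of the pattern, assuming a reduced operations table.
DEMO_CONFIG_XATTR, demo_inode_ops and demo_listxattr are hypothetical
stand-ins, not the f2fs definitions.

```c
#include <stddef.h>

/* Hypothetical miniature of an inode operations table. */
struct demo_inode_ops {
	long (*listxattr)(const char *name, char *buf, size_t size);
};

#ifdef DEMO_CONFIG_XATTR
/* Feature enabled: a real implementation is provided elsewhere. */
long demo_listxattr(const char *name, char *buf, size_t size);
#else
/* Feature compiled out: the name still exists, but expands to NULL. */
#define demo_listxattr NULL
#endif

/* No #ifdef is needed at the point of use. */
const struct demo_inode_ops demo_file_ops = {
	.listxattr = demo_listxattr,
};
```

With the option disabled, the initializer compiles unchanged and the
member is simply NULL.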
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The commit 8c47b6ff29 ("KVM: PPC: Book3S HV: Check caller of H_SVM_*
Hcalls") added checks of secure bit of SRR1 to filter out the Hcall
reserved to the Ultravisor.
However, the Hcall H_SVM_INIT_ABORT is made by the Ultravisor passing the
context of the VM calling UV_ESM. This allows the Hypervisor to return to
the guest without going through the Ultravisor. Thus the Secure bit of SRR1
is not set in that particular case.
In the case a regular VM is calling H_SVM_INIT_ABORT, this hcall will be
filtered out in kvmppc_h_svm_init_abort() because kvm->arch.secure_guest is
not set in that case.
Fixes: 8c47b6ff29 ("KVM: PPC: Book3S HV: Check caller of H_SVM_* Hcalls")
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
It is unsafe to traverse kvm->arch.spapr_tce_tables and
stt->iommu_tables without the RCU read lock held. Also, add
cond_resched_rcu() in places with the RCU read lock held that could take
a while to finish.
arch/powerpc/kvm/book3s_64_vio.c:76 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by qemu-kvm/4265.
stack backtrace:
CPU: 96 PID: 4265 Comm: qemu-kvm Not tainted 5.7.0-rc4-next-20200508+ #2
Call Trace:
[c000201a8690f720] [c000000000715948] dump_stack+0xfc/0x174 (unreliable)
[c000201a8690f770] [c0000000001d9470] lockdep_rcu_suspicious+0x140/0x164
[c000201a8690f7f0] [c008000010b9fb48] kvm_spapr_tce_release_iommu_group+0x1f0/0x220 [kvm]
[c000201a8690f870] [c008000010b8462c] kvm_spapr_tce_release_vfio_group+0x54/0xb0 [kvm]
[c000201a8690f8a0] [c008000010b84710] kvm_vfio_destroy+0x88/0x140 [kvm]
[c000201a8690f8f0] [c008000010b7d488] kvm_put_kvm+0x370/0x600 [kvm]
[c000201a8690f990] [c008000010b7e3c0] kvm_vm_release+0x38/0x60 [kvm]
[c000201a8690f9c0] [c0000000005223f4] __fput+0x124/0x330
[c000201a8690fa20] [c000000000151cd8] task_work_run+0xb8/0x130
[c000201a8690fa70] [c0000000001197e8] do_exit+0x4e8/0xfa0
[c000201a8690fb70] [c00000000011a374] do_group_exit+0x64/0xd0
[c000201a8690fbb0] [c000000000132c90] get_signal+0x1f0/0x1200
[c000201a8690fcc0] [c000000000020690] do_notify_resume+0x130/0x3c0
[c000201a8690fda0] [c000000000038d64] syscall_exit_prepare+0x1a4/0x280
[c000201a8690fe20] [c00000000000c8f8] system_call_common+0xf8/0x278
====
arch/powerpc/kvm/book3s_64_vio.c:368 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by qemu-kvm/4264:
#0: c000201ae2d000d8 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0xdc/0x950 [kvm]
#1: c000200c9ed0c468 (&kvm->srcu){....}-{0:0}, at: kvmppc_h_put_tce+0x88/0x340 [kvm]
====
arch/powerpc/kvm/book3s_64_vio.c:108 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by qemu-kvm/4257:
#0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
====
arch/powerpc/kvm/book3s_64_vio.c:146 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by qemu-kvm/4257:
#0: c000200b1b363a40 (&kv->lock){+.+.}-{3:3}, at: kvm_vfio_set_attr+0x598/0x6c0 [kvm]
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
In the current kvm version, 'kvm_run' has been included in the 'kvm_vcpu'
structure. For historical reasons, many kvm-related function parameters
retain the 'kvm_run' and 'kvm_vcpu' parameters at the same time. This
patch does a unified cleanup of these remaining redundant parameters.
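
The shape of the cleanup can be sketched as follows. The structures are
heavily reduced, hypothetical stand-ins (demo_kvm_run, demo_kvm_vcpu),
and the handler names are invented for this example.

```c
/* Hypothetical, heavily reduced stand-ins for the KVM structures. */
struct demo_kvm_run {
	int exit_reason;
};

struct demo_kvm_vcpu {
	struct demo_kvm_run *run;
};

/* Before: the caller passes both, although vcpu already points at run. */
int demo_handle_exit_old(struct demo_kvm_run *run, struct demo_kvm_vcpu *vcpu)
{
	(void)vcpu;
	return run->exit_reason;
}

/* After: one parameter; the run area is reached through the vcpu. */
int demo_handle_exit(struct demo_kvm_vcpu *vcpu)
{
	struct demo_kvm_run *run = vcpu->run;

	return run->exit_reason;
}
```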
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The 'kvm_run' field already exists in the 'vcpu' structure. The
'kvm_run' field in 'vcpu_arch' refers to the same structure and
should be deleted.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
The newly introduced ibm,secure-memory nodes supersede the
ibm,uv-firmware's property secure-memory-ranges.
Firmware will no longer expose the secure-memory-ranges property, so
read the new nodes first and fall back to the older property if they are
not found.
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
kfree() already checks for NULL, so the additional check is
unnecessary. Remove it.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
iSCSI suffers from a deadlock in case a management command submitted via
the netlink socket sleeps on an allocation while holding the rx_queue_mutex
if that allocation causes a memory reclaim that writebacks to a failed
iSCSI device. The recovery procedure can never make progress to recover
the failed disk or abort outstanding IO operations to complete the reclaim
(since rx_queue_mutex is locked), thus locking the system.
Nevertheless, just marking all allocations under rx_queue_mutex as GFP_NOIO
(or locking the userspace process with something like PF_MEMALLOC_NOIO) is
not enough, since the iSCSI command code relies on other subsystems that
try to grab locked mutexes, whose threads are GFP_IO, leading to the same
deadlock. One instance where this situation can be observed is in the
backtraces below, stitched from multiple bug reports, involving the kobj
uevent sent when a session is created.
The root of the problem is not the fact that iSCSI does GFP_IO allocations,
that is acceptable. The actual problem is that rx_queue_mutex has a very
large granularity, covering every unrelated netlink command execution at
the same time as the error recovery path.
The proposed fix leverages the recently added mechanism to stop failed
connections from the kernel, by enabling it to execute even though a
management command from the netlink socket is being run (rx_queue_mutex is
held), provided that the command is known to be safe. It splits the
rx_queue_mutex in two mutexes, one protecting from concurrent command
execution from the netlink socket, and one protecting stop_conn from racing
with other connection management operations that might conflict with it.
It is not very pretty, but it is the simplest way to resolve the deadlock.
I considered making it a lock per connection, but some external mutex would
still be needed to deal with iscsi_if_destroy_conn.
The patch was tested by forcing a memory shrinker (unrelated, but used
bufio/dm-verity) to reclaim iSCSI pages every time
ISCSI_UEVENT_CREATE_SESSION happens, which is reasonable to simulate
reclaims that might happen with GFP_KERNEL on that path. Then, a faulty
hung target causes a connection to fail during intensive IO, at the same
time a new session is added by iscsid.
The following stacktraces are stitched from several bug reports, showing a
case where the deadlock can happen.
iSCSI-write
holding: rx_queue_mutex
waiting: uevent_sock_mutex
kobject_uevent_env+0x1bd/0x419
kobject_uevent+0xb/0xd
device_add+0x48a/0x678
scsi_add_host_with_dma+0xc5/0x22d
iscsi_host_add+0x53/0x55
iscsi_sw_tcp_session_create+0xa6/0x129
iscsi_if_rx+0x100/0x1247
netlink_unicast+0x213/0x4f0
netlink_sendmsg+0x230/0x3c0
iscsi_fail iscsi_conn_failure
waiting: rx_queue_mutex
schedule_preempt_disabled+0x325/0x734
__mutex_lock_slowpath+0x18b/0x230
mutex_lock+0x22/0x40
iscsi_conn_failure+0x42/0x149
worker_thread+0x24a/0xbc0
EventManager_
holding: uevent_sock_mutex
waiting: dm_bufio_client->lock
dm_bufio_lock+0xe/0x10
shrink+0x34/0xf7
shrink_slab+0x177/0x5d0
do_try_to_free_pages+0x129/0x470
try_to_free_mem_cgroup_pages+0x14f/0x210
memcg_kmem_newpage_charge+0xa6d/0x13b0
__alloc_pages_nodemask+0x4a3/0x1a70
fallback_alloc+0x1b2/0x36c
__kmalloc_node_track_caller+0xb9/0x10d0
__alloc_skb+0x83/0x2f0
kobject_uevent_env+0x26b/0x419
dm_kobject_uevent+0x70/0x79
dev_suspend+0x1a9/0x1e7
ctl_ioctl+0x3e9/0x411
dm_ctl_ioctl+0x13/0x17
do_vfs_ioctl+0xb3/0x460
SyS_ioctl+0x5e/0x90
MemcgReclaimerD
holding: dm_bufio_client->lock
waiting: stuck io to finish (needs iscsi_fail thread to progress)
schedule at ffffffffbd603618
io_schedule at ffffffffbd603ba4
do_io_schedule at ffffffffbdaf0d94
__wait_on_bit at ffffffffbd6008a6
out_of_line_wait_on_bit at ffffffffbd600960
wait_on_bit.constprop.10 at ffffffffbdaf0f17
__make_buffer_clean at ffffffffbdaf18ba
__cleanup_old_buffer at ffffffffbdaf192f
shrink at ffffffffbdaf19fd
do_shrink_slab at ffffffffbd6ec000
shrink_slab at ffffffffbd6ec24a
do_try_to_free_pages at ffffffffbd6eda09
try_to_free_mem_cgroup_pages at ffffffffbd6ede7e
mem_cgroup_resize_limit at ffffffffbd7024c0
mem_cgroup_write at ffffffffbd703149
cgroup_file_write at ffffffffbd6d9c6e
sys_write at ffffffffbd6662ea
system_call_fastpath at ffffffffbdbc34a2
Link: https://lore.kernel.org/r/20200520022959.1912856-1-krisman@collabora.com
Reported-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently UFS host driver promises VCC supply if UFS device needs to do
WriteBooster flush during runtime suspend.
However the UFS specification mentions:
"While the flushing operation is in progress, the device is in Active power
mode."
Therefore UFS host driver needs to promise more: Keep UFS device as "Active
power mode", otherwise UFS device shall not do any flush if device enters
Sleep or PowerDown power mode. Similarly, the same promises shall be
applied if device needs urgent BKOP during runtime suspend.
Fix this by not changing device power mode if WriteBooster flush or urgent
BKOP is required in ufshcd_suspend().
Now, if device finishes its job but is not resumed for a very long time,
system will have unnecessary power drain because VCC is still supplied. A
method to re-check the threshold of keeping VCC supply is required to fix
the power drain. However, the threshold re-check needs to re-activate the
link first because the decision depends on the latest device status.
Also introduce a delayed work to force runtime resume after a certain delay
during runtime suspend. This makes threshold re-check happen naturally in
the entry of the next runtime-suspend. The device can continue its
WriteBooster flush or urgent BKOP jobs soon after resumed if device has no
upcoming requests and link enters hibern8 state either by Auto-Hibern8 or
hibern8 during clk-gating scheme. This solution not only prevents power
drain but also makes as much use of time as possible for device's
background jobs.
Link: https://lore.kernel.org/r/20200522083212.4008-5-stanley.chu@mediatek.com
Reviewed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Older firmware version sets BIT(13) in clkflag to mark a
divider as fractional divider. Updated firmware version sets BIT(4)
in type flags to mark a divider as fractional divider since
BIT(13) is defined as CLK_DUTY_CYCLE_PARENT in the common clk
framework flags.
To support both old and new firmware version, consider BIT(13) from
clkflag and BIT(4) from type_flag to check if divider is fractional
or not.
To maintain compatibility BIT(13) of clkflag in firmware will not be
used in future for any purpose and will be marked as unused.
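
The combined check can be sketched like this. The bit positions come
from the description above; the macro and function names are
hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)		(1u << (n))
/* Bit positions as described above; the macro names are hypothetical. */
#define DEMO_CLKFLAG_FRAC_OLD	DEMO_BIT(13)	/* older firmware, clkflag */
#define DEMO_TYPEFLAG_FRAC_NEW	DEMO_BIT(4)	/* updated firmware, type_flag */

/* A divider is fractional if either firmware generation marks it so. */
bool demo_divider_is_frac(uint32_t clkflag, uint32_t type_flag)
{
	return (clkflag & DEMO_CLKFLAG_FRAC_OLD) ||
	       (type_flag & DEMO_TYPEFLAG_FRAC_NEW);
}
```

Note that the two bits are looked up in different words, so BIT(4) set
in clkflag or BIT(13) set in type_flag does not count.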
Signed-off-by: Tejas Patel <tejas.patel@xilinx.com>
Signed-off-by: Rajan Vaja <rajan.vaja@xilinx.com>
Signed-off-by: Jolly Shah <jolly.shah@xilinx.com>
Link: https://lkml.kernel.org/r/1584048699-24186-3-git-send-email-jolly.shah@xilinx.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
zynqmp_get_divider2_val() calculates the divider value of a DIV2 type
clock, considering the best possible combination of DIV1 and DIV2.
To find the best possible values of DIV1 and DIV2, DIV1's parent rate
should be considered and not DIV2's parent rate, since the latter is the
rate of the div1 clock. Consider the topology below:
out_clk->div2_clk->div1_clk->fixed_parent
where out_clk = (fixed_parent/div1_clk) / div2_clk, so the parent clock
of div1_clk (i.e. fixed_parent) should be divided by div1_clk and div2_clk.
Existing code divides the parent rate of div2_clk instead of
div1_clk's parent rate, which is wrong.
Fix this by using div1's parent clock rate.
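
To make the arithmetic concrete, here is a brute-force sketch of
searching for the best DIV1/DIV2 pair. It is not the driver's algorithm;
DEMO_MAX_DIV, demo_divs and demo_best_divs are hypothetical. The point
is only that the search starts from the rate feeding div1_clk.

```c
/* Hypothetical limit; real divider widths are hardware specific. */
#define DEMO_MAX_DIV 63UL

struct demo_divs {
	unsigned long div1;
	unsigned long div2;
	unsigned long rate;	/* resulting out_clk rate */
};

unsigned long demo_abs_diff(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/*
 * Exhaustive search for the pair whose output is closest to target.
 * fixed_parent must be the rate of div1_clk's parent, not the rate of
 * div2_clk's parent (which is already divided by DIV1).
 */
struct demo_divs demo_best_divs(unsigned long fixed_parent,
				unsigned long target)
{
	struct demo_divs best = { 1, 1, fixed_parent };
	unsigned long d1, d2;

	for (d1 = 1; d1 <= DEMO_MAX_DIV; d1++) {
		for (d2 = 1; d2 <= DEMO_MAX_DIV; d2++) {
			unsigned long rate = fixed_parent / d1 / d2;

			if (demo_abs_diff(rate, target) <
			    demo_abs_diff(best.rate, target)) {
				best.div1 = d1;
				best.div2 = d2;
				best.rate = rate;
			}
		}
	}
	return best;
}
```

For example, a 1 GHz fixed_parent and a 1 MHz target need a total
division of 1000, which neither divider can reach alone with the assumed
limit of 63, but a combination of the two can.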
Fixes: 4ebd92d2e2 ("clk: zynqmp: Fix divider calculation")
Signed-off-by: Tejas Patel <tejas.patel@xilinx.com>
Signed-off-by: Jolly Shah <jolly.shah@xilinx.com>
Link: https://lkml.kernel.org/r/1583185843-20707-3-git-send-email-jolly.shah@xilinx.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
ufs_qcom_dump_dbg_regs() uses usleep_range, a sleeping function, but can be
called from atomic context in the following flow:
ufshcd_intr -> ufshcd_sl_intr -> ufshcd_check_errors ->
ufshcd_print_host_regs -> ufshcd_vops_dbg_register_dump ->
ufs_qcom_dump_dbg_regs
This causes a boot crash on the Lenovo Miix 630 when the interrupt is
handled on the idle thread.
Fix the issue by switching to udelay().
Link: https://lore.kernel.org/r/20200525204125.46171-1-jeffrey.l.hugo@gmail.com
Fixes: 9c46b86762 ("scsi: ufs-qcom: dump additional testbus registers")
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Jeffrey Hugo <jeffrey.l.hugo@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For non RDPQ mode, the driver allocates a single contiguous block of memory
pool for all reply descriptor post queues and passes down a single address
in the ReplyDescriptorPostQueueAddress field of the IOC Init Request
Message to the firmware. So reply_post queue will have only one entry which
holds the address of this single contiguous block of memory pool.
While allocating the reply descriptor post queue pool, driver should loop
only once in non-RDPQ mode. But the driver is looping for
ioc->reply_queue_count number of times even though reply_post queue's queue
depth is only one in non-RDPQ mode. This leads to 'BUG: KASAN:
use-after-free in base_alloc_rdpq_dma_pool'.
The fix is to loop only once while allocating memory for the reply
descriptor post queue in non-RDPQ mode.
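
A reduced sketch of the loop bound, assuming a hypothetical demo_ioc
structure; the field and function names are illustrative and the
allocation uses malloc() instead of a DMA pool.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced adapter state. */
struct demo_ioc {
	bool rdpq_enabled;
	unsigned int reply_queue_count;
	void **reply_post;		/* one entry per allocated pool */
	unsigned int reply_post_depth;	/* number of valid entries */
};

/* One entry per queue in RDPQ mode, otherwise a single entry. */
unsigned int demo_reply_post_depth(const struct demo_ioc *ioc)
{
	return ioc->rdpq_enabled ? ioc->reply_queue_count : 1;
}

/* Allocate the pools; returns 0 on success and -1 on failure. */
int demo_alloc_reply_post(struct demo_ioc *ioc, size_t pool_size)
{
	unsigned int i, depth = demo_reply_post_depth(ioc);

	ioc->reply_post = calloc(depth, sizeof(*ioc->reply_post));
	if (!ioc->reply_post)
		return -1;
	ioc->reply_post_depth = depth;

	/* Bounded by the array depth, not by reply_queue_count. */
	for (i = 0; i < depth; i++) {
		ioc->reply_post[i] = malloc(pool_size);
		if (!ioc->reply_post[i]) {
			while (i--)
				free(ioc->reply_post[i]);
			free(ioc->reply_post);
			ioc->reply_post = NULL;
			ioc->reply_post_depth = 0;
			return -1;
		}
	}
	return 0;
}
```

Looping reply_queue_count times in non-RDPQ mode would index past the
single-entry array, which is the class of bug described above.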
Fixes: 8012209eb2 ("scsi: mpt3sas: Handle RDPQ DMA allocation in same 4G region")
Link: https://lore.kernel.org/r/20200522103558.5710-1-suganath-prabu.subramani@broadcom.com
Reported-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are four different callback functions that are used for the
clk_register callback that all have different second parameter types.
bcm2835_register_pll -> struct bcm2835_pll_data
bcm2835_register_pll_divider -> struct bcm2835_pll_divider_data
bcm2835_register_clock -> struct bcm2835_clock_data
bcm2835_register_gate -> struct bcm2835_gate_data
These callbacks are cast to bcm2835_clk_register so that there is no
error about incompatible pointer types. Unfortunately, this violates
control flow integrity (CFI), which verifies that the callback function's
type matches the prototype exactly before jumping.
[ 0.857913] CFI failure (target: 0xffffff9334a81820):
[ 0.857977] WARNING: CPU: 3 PID: 35 at kernel/cfi.c:29 __cfi_check_fail+0x50/0x58
[ 0.857985] Modules linked in:
[ 0.858007] CPU: 3 PID: 35 Comm: kworker/3:1 Not tainted 4.19.123-v8-01301-gdbb48f16956e4-dirty #1
[ 0.858015] Hardware name: Raspberry Pi 3 Model B Rev 1.2 (DT)
[ 0.858031] Workqueue: events 0xffffff9334a925c8
[ 0.858046] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 0.858058] pc : __cfi_check_fail+0x50/0x58
[ 0.858070] lr : __cfi_check_fail+0x50/0x58
[ 0.858078] sp : ffffff800814ba90
[ 0.858086] x29: ffffff800814ba90 x28: 000fffffffdfff3d
[ 0.858101] x27: 00000000002000c2 x26: ffffff93355fdb18
[ 0.858116] x25: 0000000000000000 x24: ffffff9334a81820
[ 0.858131] x23: ffffff93357f3580 x22: ffffff9334af1000
[ 0.858146] x21: a79b57e88f8ebc81 x20: ffffff93357f3580
[ 0.858161] x19: ffffff9334a81820 x18: fffffff679769070
[ 0.858175] x17: 0000000000000000 x16: 0000000000000000
[ 0.858190] x15: 0000000000000004 x14: 000000000000003c
[ 0.858205] x13: 0000000000003044 x12: 0000000000000000
[ 0.858220] x11: b57e91cd641bae00 x10: b57e91cd641bae00
[ 0.858235] x9 : b57e91cd641bae00 x8 : b57e91cd641bae00
[ 0.858250] x7 : 0000000000000000 x6 : ffffff933591d4e5
[ 0.858264] x5 : 0000000000000000 x4 : 0000000000000000
[ 0.858279] x3 : ffffff800814b718 x2 : ffffff9334a84818
[ 0.858293] x1 : ffffff9334bba66c x0 : 0000000000000029
[ 0.858308] Call trace:
[ 0.858321] __cfi_check_fail+0x50/0x58
[ 0.858337] __cfi_check+0x3ab3c/0x4467c
[ 0.858351] bcm2835_clk_probe+0x210/0x2dc
[ 0.858369] platform_drv_probe+0xb0/0xfc
[ 0.858380] really_probe+0x4a0/0x5a8
[ 0.858391] driver_probe_device+0x68/0x104
[ 0.858403] __device_attach_driver+0x100/0x148
[ 0.858418] bus_for_each_drv+0xb0/0x12c
[ 0.858431] __device_attach.llvm.17225159516306086099+0xc0/0x168
[ 0.858443] bus_probe_device+0x44/0xfc
[ 0.858455] deferred_probe_work_func+0xa0/0xe0
[ 0.858472] process_one_work+0x210/0x538
[ 0.858485] worker_thread+0x2e8/0x478
[ 0.858500] kthread+0x154/0x164
[ 0.858515] ret_from_fork+0x10/0x18
To fix this, change the second parameter of all functions to void * and
use a local variable with the correct type so that everything works
properly. With this, the only use of bcm2835_clk_register is in struct
bcm2835_clk_desc so we can just remove it and use the type directly.
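
The pattern can be sketched in plain C as follows. The demo_* names and
the reduced data structures are hypothetical, and const void * is a
choice made for this sketch. Every callback has exactly the prototype
stored in the descriptor, so no function pointer cast is needed.

```c
/* Hypothetical, reduced per-type data. */
struct demo_pll_data {
	int id;
};

struct demo_gate_data {
	int bit;
};

/* The descriptor spells out the callback type directly, no typedef. */
struct demo_clk_desc {
	int (*clk_register)(void *cprman, const void *data);
	const void *data;
};

int demo_register_pll(void *cprman, const void *data)
{
	const struct demo_pll_data *pll = data;	/* typed local variable */

	(void)cprman;
	return pll->id;
}

int demo_register_gate(void *cprman, const void *data)
{
	const struct demo_gate_data *gate = data;

	(void)cprman;
	return gate->bit + 100;
}
```

Because the callee's type matches the pointer's type exactly, an
indirect call through clk_register passes a type-based CFI check.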
Fixes: 56eb3a2ed9 ("clk: bcm2835: remove use of BCM2835_CLOCK_COUNT in driver")
Link: https://github.com/ClangBuiltLinux/linux/issues/1028
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200516080806.1459784-2-natechancellor@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
bcm2835_register_gate is used as a callback for the clk_register member
of bcm2835_clk_desc, which expects a struct clk_hw * return type but
bcm2835_register_gate returns a struct clk *.
This discrepancy is hidden by the fact that bcm2835_register_gate is
cast to the typedef bcm2835_clk_register by the _REGISTER macro. This
turns out to be a control flow integrity violation, which is how this
was noticed.
Change the return type of bcm2835_register_gate to be struct clk_hw *
and use clk_hw_register_gate to do so. This should be a non-functional
change as clk_register_gate calls clk_hw_register_gate anyways but this
is needed to avoid issues with further changes.
Fixes: b19f009d45 ("clk: bcm2835: Migrate to clk_hw based registration and OF APIs")
Link: https://github.com/ClangBuiltLinux/linux/issues/1028
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Link: https://lkml.kernel.org/r/20200516080806.1459784-1-natechancellor@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
For the rx filter 'HWTSTAMP_FILTER_PTP_V2_EVENT', timestamps should
cover PTP v2/802.AS1, any layer, any kind of event packet, but the HW
only takes a timestamp snapshot for the following PTP messages: Sync,
Pdelay_req, Pdelay_resp.
This causes the following issue when testing the E2E case:
ptp4l[2479.534]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.423]: port 1: received DELAY_REQ without timestamp
ptp4l[2481.758]: port 1: received DELAY_REQ without timestamp
ptp4l[2483.524]: port 1: received DELAY_REQ without timestamp
ptp4l[2484.233]: port 1: received DELAY_REQ without timestamp
ptp4l[2485.750]: port 1: received DELAY_REQ without timestamp
ptp4l[2486.888]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.265]: port 1: received DELAY_REQ without timestamp
ptp4l[2487.316]: port 1: received DELAY_REQ without timestamp
Timestamp snapshot dependency on register bits in the receive path:
SNAPTYPSEL  TSMSTRENA  TSEVNTENA  PTP_Messages
01          x          0          SYNC, Follow_Up, Delay_Req,
                                  Delay_Resp, Pdelay_Req, Pdelay_Resp,
                                  Pdelay_Resp_Follow_Up
01          0          1          SYNC, Pdelay_Req, Pdelay_Resp
For dwmac v5.10a, enable all events by setting register
DWC_EQOS_TIME_STAMPING[SNAPTYPSEL] to 2’b01 and clearing bit [TSEVNTENA]
to 0, which supports all required events.
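
The register update can be sketched as a pure bit manipulation. The bit
positions used here are assumptions made for illustration only and must
be checked against the databook; the DEMO_* names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define DEMO_TCR_TSEVNTENA		(1u << 14)
#define DEMO_TCR_TSMSTRENA		(1u << 15)
#define DEMO_TCR_SNAPTYPSEL_SHIFT	16
#define DEMO_TCR_SNAPTYPSEL_MASK	(3u << DEMO_TCR_SNAPTYPSEL_SHIFT)

/*
 * Select SNAPTYPSEL = 01 and clear TSEVNTENA so that all event
 * messages are timestamped. TSMSTRENA is a don't-care in that row of
 * the table and is left untouched.
 */
uint32_t demo_tcr_all_events(uint32_t tcr)
{
	tcr &= ~(DEMO_TCR_SNAPTYPSEL_MASK | DEMO_TCR_TSEVNTENA);
	tcr |= 1u << DEMO_TCR_SNAPTYPSEL_SHIFT;
	return tcr;
}
```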
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
nexthops: Fix 2 fundamental flaws with nexthop groups
Nik's torture tests have exposed 2 fundamental mistakes with the initial
nexthop code for groups. First, the nexthops entries and num_nh in the
nh_grp struct should not be modified once the struct is set under rcu.
Doing so has major effects on the datapath seeing valid nexthop entries.
Second, the helpers in the header file were convenient for not repeating
code, but they cause datapath walks to potentially see 2 different group
structs after an rcu replace, disrupting a walk of the path objects.
This second problem applies solely to IPv4 as I re-used too much of the
existing code in walking legs of a multipath route.
Patch 1 is a refactoring change to simplify the overhead of reviewing and
understanding the change in patch 2 which fixes the update of nexthop
groups when a component leg is removed.
Patches 3-5 address the second problem. Patch 3 inlines the multipath
check such that the mpath lookup and subsequent calls all use the same
nh_grp struct. Patches 4 and 5 fix datapath uses of fib_info_num_path
with iterative calls to fib_info_nhc.
fib_info_num_path can be used in control plane path in a 'for loop' with
subsequent fib_info_nhc calls to get each leg since the nh_grp struct is
only changed while holding the rtnl; the combination can not be used in
the data plane with external nexthops as it involves repeated dereferences
of nh_grp struct which can change between calls.
Similarly, nexthop_is_multipath can be used for branching decisions in
the datapath since the nexthop type can not be changed (a group can not
be converted to standalone and vice versa).
Patch set developed in coordination with Nikolay Aleksandrov. He did a
lot of work creating a good reproducer, discussing options to fix it
and testing iterations.
I have adapted Nik's commands into additional tests in the nexthops
selftest script which I will send against -next.
v2
- fixed whitespace errors
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the last patch, need to fix fib_info_nh_uses_dev for
external nexthops to avoid referencing multiple nh_grp structs.
Move the device check in fib_info_nh_uses_dev to a helper and
create a nexthop version that is called if the fib_info uses an
external nexthop.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FIB lookups can return an entry that references an external nexthop.
While walking the nexthop struct we do not want to make multiple calls
into the nexthop code which can result in 2 different structs getting
accessed - one returning the number of paths and the rest of the loop
seeing a different nh_grp struct. If the nexthop group shrank, the
result is an attempt to access a fib_nh_common that does not exist for
the new nh_grp struct but did for the old one.
To fix that, move the device evaluation code to a helper that can be
used for inline fib_nh path as well as external nexthops.
Update the existing check for fi->nh in fib_table_lookup to call a
new helper, nexthop_get_nhc_lookup, which walks the external nexthop
with a single rcu dereference.
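
The idea of walking one snapshot can be sketched in userspace with C11
atomics standing in for RCU. The demo_* types and demo_find_leg() are
hypothetical, and reclamation of old groups (the grace period) is not
modelled here.

```c
#include <stdatomic.h>

/* Hypothetical, reduced group: a count plus per-leg output interfaces. */
struct demo_nh_grp {
	unsigned int num_nh;
	int oif[4];
};

struct demo_nexthop {
	/* Writers replace the whole group by publishing a new pointer. */
	_Atomic(struct demo_nh_grp *) grp;
};

/*
 * Return the index of the first leg using oif, or -1 if none does.
 * The group pointer is loaded exactly once, so the count and the
 * entries always come from the same struct, even if a writer
 * publishes a smaller group while the walk is in progress.
 */
int demo_find_leg(struct demo_nexthop *nh, int oif)
{
	struct demo_nh_grp *grp =
		atomic_load_explicit(&nh->grp, memory_order_acquire);
	unsigned int i;

	for (i = 0; i < grp->num_nh; i++)
		if (grp->oif[i] == oif)
			return (int)i;
	return -1;
}
```

The broken variant would load the pointer once to read num_nh and again
for each entry, which is how a walk can index past the end of a group
that shrank in between.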
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I got too fancy consolidating checks on multipath type. The result
is that path lookups can access 2 different nh_grp structs as exposed
by Nik's torture tests. Expand nexthop_is_multipath within nexthop.h to
avoid multiple nh_grp dereferences and make decisions based on the
consistent struct.
Only 2 places left using nexthop_is_multipath are within IPv6, both
only check that the nexthop is a multipath for a branching decision
which is acceptable.
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We must avoid modifying published nexthop groups while they might be
in use, otherwise we might see NULL ptr dereferences. In order to do
that we allocate 2 nexthop group structures upon nexthop creation
and swap between them when we have to delete an entry. The reason is
that we can't fail nexthop group removal, so we can't handle an
allocation failure there. Thus we move the extra allocation to creation,
where we can safely fail and return ENOMEM.
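
A userspace sketch of the preallocate-and-swap idea, with hypothetical
demo_* names. Plain pointer assignment stands in for the RCU publish,
and waiting for readers before the old struct is reused is not
modelled.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_LEGS 8

struct demo_grp {
	unsigned int num;
	int ids[DEMO_MAX_LEGS];
};

struct demo_group_owner {
	struct demo_grp *active;	/* what readers see */
	struct demo_grp *spare;		/* preallocated at creation */
};

/* Creation may fail, so both structures are allocated here. */
int demo_owner_create(struct demo_group_owner *o, const int *ids,
		      unsigned int num)
{
	if (num > DEMO_MAX_LEGS)
		return -1;

	o->active = calloc(1, sizeof(*o->active));
	o->spare = calloc(1, sizeof(*o->spare));
	if (!o->active || !o->spare) {
		free(o->active);
		free(o->spare);
		o->active = o->spare = NULL;
		return -1;	/* the kernel would return -ENOMEM */
	}

	o->active->num = num;
	memcpy(o->active->ids, ids, num * sizeof(*ids));
	return 0;
}

/* Removal cannot fail: fill the spare, then swap the two pointers. */
void demo_owner_remove(struct demo_group_owner *o, int id)
{
	struct demo_grp *old = o->active, *repl = o->spare;
	unsigned int i;

	repl->num = 0;
	for (i = 0; i < old->num; i++)
		if (old->ids[i] != id)
			repl->ids[repl->num++] = old->ids[i];

	o->active = repl;	/* the published group is never edited */
	o->spare = old;		/* reusable once readers are done with it */
}
```

The old group keeps its contents after the swap, so a reader that
picked it up earlier still sees a consistent count and entry list.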
Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>