To fix below kernel-doc warnings this patch does the following
->Replaced Return\Returns with 'Return:' keyword.
->Added 'Return' description For __init of_fpga_region_init()' API.
->Added description for 'child_regions_with_firmware()' API.
warning: No description found for return value of
'of_fpga_region_find'.
warning: No description found for return value of
'of_fpga_region_get_bridges'.
warning: missing initial short description on line:
* child_regions_with_firmware
warning: No description found for return value of
'child_regions_with_firmware'.
warning: No description found for return value of
'of_fpga_region_notify_pre_apply'.
warning: No description found for return value of
'of_fpga_region_notify'.
warning: No description found for return value of
'of_fpga_region_init'.
Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-6-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
The pages in the file mapping maybe reclaimed and reused by other
subsystems and the page->private maybe used as flags field or
something else, if later that pages are used by page caches again
the page->private maybe not cleared as expected.
Here will check the PG_private bit instead of the folio->private.
Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Currently when we create a file, we spin up an xattr buffer to send
along with the create request. If we end up doing an async create
however, then we currently pass down a zero-length xattr buffer.
Fix the code to send down the xattr buffer in req->r_pagelist. If the
xattrs span more than a page, however give up and don't try to do an
async create.
Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: John Fortin <fortinj66@gmail.com>
Reported-by: Sri Ramanujam <sri@ramanujam.io>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Gerhard Engleder says:
====================
ptp: Support hardware clocks with additional free running cycle counter
ptp vclocks require a clock with free running time for the timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized while
vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
One year ago I thought for two time domains within a TSN network also
two physical clocks are required. This would lead to new kernel
interfaces for asking for the second clock, ... . But actually for a
time triggered system like TSN there can be only one time domain that
controls the system itself. All other time domains belong to other
layers, but not to the time triggered system itself. So other time
domains can be based on a free running counter if similar mechanisms
like 2 step synchroisation are used.
Synchronisation was tested with two time domains between two directly
connected hosts. Each host run two ptp4l instances, the first used the
physical clock and the second used the virtual clock. I used my FPGA
based network controller as network device. ptp4l was used in
combination with the virtual clock support patches from Miroslav
Lichvar.
v4:
- if_index of 0 is invalid (Jonathan Lemon)
- set if_index to 0 in the SOF_TIMESTAMPING_RAW_HARDWARE block (Jonathan
Lemon)
- add helper function for netdev_get_tstamp() call (Jonathan Lemon)
- update SKBTX_ANY_TSTAMP (Paolo Abeni)
- use separate bits for new tx_flags (Richard Cochran)
v3:
- optimize ptp_convert_timestamp (Richard Cochran)
- call dev_get_by_napi_id() only if needed (Richard Cochran)
- use non-negated logical test (Richard Cochran)
- add comment for skipped output (Richard Cochran)
- add comment for SKBTX_HW_TSTAMP_USE_CYCLES masking (Richard Cochran)
v2:
- rename ptp_clock cycles to has_cycles (Richard Cochran)
- call it free running cycle counter (Richard Cochran)
- update struct skb_shared_hwtstamps kdoc (Richard Cochran)
- optimize timestamp address/cookie processing path (Richard Cochran,
Vinicius Costa Gomes)
v1:
- complete rework based on suggestions (Richard Cochran)
====================
Link: https://lore.kernel.org/r/20220506200142.3329-1-gerhard@engleder-embedded.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The TSN endpoint Ethernet MAC supports a free running counter
additionally to its clock. This free running counter can be read and
hardware timestamps are supported. As the name implies, this counter
cannot be set and its frequency cannot be adjusted.
Add free running cycle counter support based on this free running
counter to physical clock. This also requires hardware time stamps
based on that free running counter.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp_convert_timestamp() is called in the RX path of network messages.
The current implementation takes ~5000ns on 1.2GHz A53. This is too much
for the hot path of packet processing.
Introduce hash table for fast vclock lookup in ptp_convert_timestamp().
The execution time of ptp_convert_timestamp() is reduced to ~700ns on
1.2GHz A53.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
If a physical clock supports a free running cycle counter, then
timestamps shall be based on this time too. For TX it is known in
advance before the transmission if a timestamp based on the free running
cycle counter is needed. For RX it is impossible to know which timestamp
is needed before the packet is received and assigned to a socket.
Support late timestamp determination by a network device. Therefore, an
address/cookie is stored within the new netdev_data field of struct
skb_shared_hwtstamps. This address/cookie is provided to a new network
device function called ndo_get_tstamp(), which returns a timestamp based
on the normal/adjustable time or based on the free running cycle
counter. If function is not supported, then timestamp handling is not
changed.
This mechanism is intended for RX, but TX use is also possible.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The free running cycle counter of physical clocks called cycles shall be
used for hardware timestamps to enable synchronisation.
Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to
provide a TX timestamp based on cycles if cycles are supported.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.
Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag
For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag
The following files are exposed:
- state: Returns "active" or "disabled". If "active" it means hardware
lag is active.
- members: Returns the BDFs of all the members of lag object.
- type: Returns the type of the lag currently configured. Valid only
if hardware lag is active.
* "roce" - Members are bare metal PFs.
* "switchdev" - Members are in switchdev mode.
* "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method, valid
only if hardware lag is active.
* "queue_affinity" - Egress port is selected by
the QP/SQ affinity.
* "hash" - Egress port is selected by hash done on
each packet. Controlled by: xmit_hash_policy of the
bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
hardware lag is active.
* "shared_fdb" - "on" or "off", if "on" single FDB is used.
- mapping: Returns the mapping which is used to select egress port.
Valid only if hardware lag is active.
If @port_sel_mode is "hash" returns the active egress ports.
The hash result will select only active ports.
if @port_sel_mode is "queue_affinity" returns the mapping
between the configured port affinity of the QP/SQ and actual
egress port. For example:
* 1:1 - Mapping means if the configured affinity is port 1
traffic will egress via port 1.
* 1:2 - Mapping means if the configured affinity is port 1
traffic will egress via port 2. This can happen
if port 1 is down or in active/backup mode and port 1
is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When in hardware lag and the NIC has more than 2 ports when one port
goes down need to distribute the traffic between the remaining
active ports.
For better spread in such cases instead of using 1-to-1 mapping and only
4 slots in the hash, use many.
Each port will have many slots that point to it. When a port goes down
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back restore the default
mapping.
We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port.
The native mapping is such that:
port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1]
which means if a packet is hased into one of this slots it will hit the
wire via port 1.
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2]
which means if a packet is hased into one of this slots it will hit the
wire via port2.
and this mapping is the same of the rest of the ports.
On a failover, lets say port 2 goes down (port 1, 3, 4 are still up).
the new mapping for port 2 will be:
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4]
which means the mapping was changed from the native mapping to a mapping
that consists of only the active ports.
With this if a port goes down the traffic will be split between the
active ports randomly
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Refactor the entire lag code to use ldev->ports instead of hard-coded
defines (like MLX5_MAX_PORTS) for its operations.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (before this patch) offers key
advantages like:
- Align ports selection method with the method used by the bond device
- Better packets distribution where a single queue can transmit from
multiple ports (with queue affinity a queue is bound to a single port
regardless of the packet being sent).
- In case of failover we traffic is split between multiple ports and not
a single one like in queue affinity.
Going forward it was decided that queue affinity will be deprecated
as using hash provides a better user experience which means on 4 ports
HCAs hash will always be used.
Future work will add hash support for 2 ports HCAs as well.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Have specific checks
to disallow creating lag when the code doesn't support it.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Store the number of lag ports inside the lag object. Lag object is a single
shared object managing the lag state of multiple mlx5 devices on the same
physical HCA.
Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
When search for a peer lag device we can filter based on that
device's capabilities.
Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Use a lag specific lock instead of depending on external locks to
synchronise the lag creation/destruction.
With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.
Cleanup any dead code that is left over and don't export functions that
aren't used outside the E-Switch core code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There is no need to expose E-Switch function for something that can be
checked with already present API inside lag code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Devcom API is intended to be used between 2 devices only add this
implied assumption into the code and check when it's no true.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, health recovery will reload driver to recover it from fatal
errors. During the driver's load process, it would wait for FW to set the
pre-init bit for up to 120 seconds, beyond this threshold it would abort
the load process. In some cases, such as a FW upgrade on the DPU, this
timeout period is insufficient, and the user has no way to recover the
host device.
To solve this issue, introduce a new FW pre-init timeout for health
recovery, which is set to 2 hours.
The timeout for devlink reload and probe will use the original one because
they are user triggered flows, and therefore should not have a
significantly long timeout, during which the user command would hang.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.
To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
Signed-off-by: Gavin Li <gavinl@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The SMB2 Write packet contains data that is to be written
to a file or to a pipe. Depending on the client, there may
be padding between the header and the data field.
Currently, the length is validated only in the case padding
is present.
Since the DataOffset field always points to the beginning
of the data, there is no need to have a special case for
padding. By removing this, the length is validated in both
cases.
Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
The issue happens in a specific path in smb_check_perm_dacl(). When
"id" and "uid" have the same value, the function simply jumps out of
the loop without decrementing the reference count of the object
"posix_acls", which is increased by get_acl() earlier. This may
result in memory leaks.
Fix it by decreasing the reference count of "posix_acls" before
jumping to label "check_access_bits".
Fixes: 777cad1604 ("ksmbd: remove select FS_POSIX_ACL in Kconfig")
Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
When I refactored this Makefile, I accidentally changed the CONFIG
option.
Fixes: b52455a73d ("crypto: vmx - Align the short log with Makefile cleanups")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a wrapper that converts back from the folio to the page. This
entire file needs to be converted to use folios, but that's a
task for a different set of patches.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
I suspect this isn't actually needed and that releasepage will have
done the job, but convert it for now and we can delete it later.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
All but two of the callers already have a folio; pass a folio into
try_to_free_buffers(). This removes the last user of cancel_dirty_page()
so remove that wrapper function too.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Also convert it to return a bool since it's called from release_folio().
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Saves 671 bytes from an allmodconfig build (!)
Function old new delta
release_buffer_page 1617 946 -671
Total: Before=67656, After=66985, chg -0.99%
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
All users are now converted to release_folio
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use folios throughout the release_folio path.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use folios throughout the release_folio path.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Use folios throughout the release_folio path.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>