The MFD_OMAP_USB_HOST uses Common Clock Framework thus it cannot be
built on platforms without it (e.g. compile test on MIPS with LANTIQ):
mips-linux-ld: drivers/mfd/omap-usb-host.o: in function `usbhs_omap_probe':
omap-usb-host.c:(.text+0x940): undefined reference to `clk_set_parent'
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Change the detection order to priorize DMI table entries over available
ACPI entries.
This makes it more easy for product developers to patch product specific
handling into the driver.
Furthermore it allows to simplify the implementation a bit and
especially to remove the need to force synchronous probing.
Signed-off-by: Michael Brunner <michael.brunner@kontron.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Correct power-off programming sequence in order to fix shutting down
devices which are using TPS65910 PMIC.
In accordance to the TPS65910 datasheet, the PMIC's state-machine
transitions into the OFF state only when DEV_OFF bit of DEVCTRL_REG is
set. The ON / SLEEP states also should be cleared, otherwise PMIC won't
get into a proper state on shutdown. Devices like Nexus 7 tablet and Ouya
game console are shutting down properly now.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Zack Pearsall <zpearsall@yahoo.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
drivers/mfd/altera-sysmgr.c:155:36-39: WARNING: Suspicious code. resource_size is maybe missing with res
Generated by: scripts/coccinelle/api/resource_size.cocci
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
SPI bus number must be assigned dynamically for each device, otherwise it
will crash when multiple devices are plugged to system.
Reported-and-tested-by: syzbot+c60ddb60b685777d9d59@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Antti Palosaari <crope@iki.fi>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
When a packet is fragmented by batman-adv, the original batman-adv header
is not modified. Only a new fragmentation is inserted between the original
one and the ethernet header. The code must therefore make sure that it has
a writable region of this size in the skbuff head.
But it is not useful to always reallocate the skbuff by this size even when
there would be more than enough headroom still in the skb. The reallocation
is just to costly during in this codepath.
Fixes: ee75ed8887 ("batman-adv: Fragment and send skbs larger than mtu")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
The batadv net_device is trying to propagate the needed_headroom and
needed_tailroom from the lower devices. This is needed to avoid cost
intensive reallocations using pskb_expand_head during the transmission.
But the fragmentation code split the skb's without adding extra room at the
end/beginning of the various fragments. This reduced the performance of
transmissions over complex scenarios (batadv on vxlan on wireguard) because
the lower devices had to perform the reallocations at least once.
Fixes: ee75ed8887 ("batman-adv: Fragment and send skbs larger than mtu")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
If a batman-adv packets has to be fragmented, then the original batman-adv
packet header is not stripped away. Instead, only a new header is added in
front of the packet after it was split.
This size must be considered to avoid cost intensive reallocations during
the transmission through the various device layers.
Fixes: 7bca68c784 ("batman-adv: Add lower layer needed_(head|tail)room to own ones")
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
In msi2500_stop_streaming(), there is an if statement on line 870 to
check whether dev->udev is NULL:
if (dev->udev)
When dev->udev is NULL, it is used on line 877:
msi2500_ctrl_msg(dev, CMD_STOP_STREAMING, 0)
usb_control_msg(dev->udev, usb_sndctrlpipe(dev->udev, 0), ...)
Thus, a possible null-pointer dereference may occur.
To fix this bug, dev->udev is checked before calling msi2500_ctrl_msg().
This bug is found by a static analysis tool STCheck written by us.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a goto from the middle of the loop, there is no put, thus
causing a memory leak. Hence add a new label that puts the last used
node, and edit the goto statements in the middle of the loop to first go
to the new label.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
Acked-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
As reported on:
https://lore.kernel.org/linux-media/20190627222020.45909-1-willemdebruijn.kernel@gmail.com/
if gp8psk_usb_in_op() returns an error, the status var is not
initialized. Yet, this var is used later on, in order to
identify:
- if the device was already started;
- if firmware has loaded;
- if the LNBf was powered on.
Using status = 0 seems to ensure that everything will be
properly powered up.
So, instead of the proposed solution, let's just set
status = 0.
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Rework the setting of DMA cache parameters, program more appropriate
values and explicitly set sharability domain.
Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by
an equivalent 'dma_set_mask_and_coherent()' which is much less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break statements instead of
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Acked-by: Gilad Ben-Yossef <gilad@benyossef.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
WireGuard and IPsec both typically operate on input blocks that are
~1420 bytes in size, given the default Ethernet MTU of 1500 bytes and
the overhead of the VPN metadata.
Many aead and sckipher implementations are optimized for power-of-2
block sizes, and whether they perform well when operating on 1420
byte blocks cannot be easily extrapolated from the performance on
power-of-2 block size. So let's add 1420 bytes explicitly, and round
it up to the next blocksize multiple of the algo in question if it
does not support 1420 byte blocks.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When working on crypto algorithms, being able to run tcrypt quickly
without booting an entire Linux installation can be very useful. For
instance, QEMU/kvm can be used to boot a kernel from the command line,
and having tcrypt.ko builtin would allow tcrypt to be executed to run
benchmarks, or to run tests for algorithms that need to be instantiated
from templates, without the need to make it past the point where the
rootfs is mounted.
So let's relax the requirement that tcrypt can only be built as a module
when CONFIG_EXPERT is enabled.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit c4741b2305 ("crypto: run initcalls for generic implementations
earlier") converted tcrypt.ko's module_init() to subsys_initcall(), but
this was unintentional: tcrypt.ko currently cannot be built into the core
kernel, and so the subsys_initcall() gets converted into module_init()
under the hood. Given that tcrypt.ko does not implement a generic version
of a crypto algorithm that has to be available early during boot, there
is no point in running the tcrypt init code earlier than implied by
module_init().
However, for crypto development purposes, we will lift the restriction
that tcrypt.ko must be built as a module, and when builtin, it makes sense
for tcrypt.ko (which does its work inside the module init function) to run
as late as possible. So let's switch to late_initcall() instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move HiSilicon TRNG V2 driver into 'drivers/crypto/hisilicon/trng'
with some updating on 'MAINTAINERS'.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds support for pseudo random number generator(PRNG)
in Crypto subsystem.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Driver of HiSilicon true random number generator(TRNG)
is removed from 'drivers/char/hw_random'.
Both 'Kunpeng 920' and 'Kunpeng 930' chips have TRNG,
however, PRNG is only supported by 'Kunpeng 930'.
So, this driver is moved to 'drivers/crypto/hisilicon/trng/'
in the next to enable the two's TRNG better.
Signed-off-by: Weili Qian <qianweili@huawei.com>
Reviewed-by: Zaibo Xu <xuzaibo@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Condition !A || A && B is equivalent to !A || B.
Generated by: scripts/coccinelle/misc/excluded_middle.cocci
Fixes: b76f0ea013 ("coccinelle: misc: add excluded_middle.cocci script")
CC: Denis Efremov <efremov@linux.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Partial hash was being copied into the final result buffer without the
entire message block processed. Depending on how the end user processes
this result buffer, errors vary from result buffer corruption to result
buffer poisoing. Fix this issue by ensuring that only the final hash value
is copied into the result buffer.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Wiring the SIMD code into the generic driver has the unfortunate side
effect that the tcrypt testing code cannot distinguish them, and will
therefore not use the latter to fuzz test the former, as it does for
other algorithms.
So let's refactor the code a bit so we can register two implementations:
aegis128-generic and aegis128-simd.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of calculating the tag and returning it to the caller on
decryption, use a SIMD compare and min across vector to perform
the comparison. This is slightly more efficient, and removes the
need on the caller's part to wipe the tag from memory if the
decryption failed.
While at it, switch to unsigned int when passing cryptlen and
assoclen - we don't support input sizes where it matters anyway.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Avoid copying the tail block via a stack buffer if the total size
exceeds a single AEGIS block. In this case, we can use overlapping
loads and stores and NEON permutation instructions instead, which
leads to a modest performance improvement on some cores (< 5%),
and is slightly cleaner. Note that we still need to use a stack
buffer if the entire input is smaller than 16 bytes, given that
we cannot use 16 byte NEON loads and stores safely in this case.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The AEGIS spec mentions explicitly that the security guarantees hold
only if the resulting plaintext and tag of a failed decryption are
withheld. So ensure that we abide by this.
While at it, drop the unused struct aead_request *req parameter from
crypto_aegis128_process_crypt().
Reviewed-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
With commit df6c2ec872 ("ARM: OMAP2+: Drop legacy remaining legacy
platform data for am4") we moved am4 to boot with simple-pm-bus using
genpd and devicetree based data. But I forgot to test am4 only build
that still has few references to the old platform data left, and cause
undefined reference errors with omap_hwmod_set_postsetup_state and
omap_hwmod_for_each. We can just drop the related calls for am4 now,
and also drop the references to unused struct wkup_m3_platform_data.
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
"Daniel T. says:
====================
Numerous refactoring that rewrites BPF programs written with bpf_load
to use the libbpf loader was finally completed, resulting in BPF
programs using bpf_load within the kernel being completely no longer
present.
This patchset refactors remaining bpf programs with libbpf and
completely removes bpf_load, an outdated bpf loader that is difficult
to keep up with the latest kernel BPF and causes confusion.
Changes in v2:
- drop 'move tracing helpers to trace_helper' patch
- add link pinning to prevent cleaning up on process exit
- add static at global variable and remove unused variable
- change to destroy link even after link__pin()
- fix return error code on exit
- merge commit with changing Makefile
Changes in v3:
- cleanup bpf_link, bpf_object and cgroup fd both on success and error
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Numerous refactoring that rewrites BPF programs written with bpf_load
to use the libbpf loader was finally completed, resulting in BPF
programs using bpf_load within the kernel being completely no longer
present.
This commit removes bpf_load, an outdated bpf loader that is difficult
to keep up with the latest kernel BPF and causes confusion.
Also, this commit removes the unused trace_helper and bpf_load from
samples/bpf target objects from Makefile.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-8-danieltimlee@gmail.com
Currently, lwt_len_hist's map lwt_len_hist_map is uses pinning, and the
map isn't cleared on test end. This leds to reuse of that map for
each test, which prevents the results of the test from being accurate.
This commit fixes the problem by removing of pinned map from bpffs.
Also, this commit add the executable permission to shell script
files.
Fixes: f74599f7c5 ("bpf: Add tests and samples for LWT-BPF")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-7-danieltimlee@gmail.com
This commit refactors the existing program with libbpf bpf loader.
Since the kprobe, tracepoint and raw_tracepoint bpf program can be
attached with single bpf_program__attach() interface, so the
corresponding function of libbpf is used here.
Rather than specifying the number of cpus inside the code, this commit
uses the number of available cpus with _SC_NPROCESSORS_ONLN.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-6-danieltimlee@gmail.com
This commit refactors the existing ibumad program with libbpf bpf
loader. Attach/detach of Tracepoint bpf programs has been managed
with the generic bpf_program__attach() and bpf_link__destroy() from
the libbpf.
Also, instead of using the previous BPF MAP definition, this commit
refactors ibumad MAP definition with the new BTF-defined MAP format.
To verify that this bpf program works without an infiniband device,
try loading ib_umad kernel module and test the program as follows:
# modprobe ib_umad
# ./ibumad
Moreover, TRACE_HELPERS has been removed from the Makefile since it is
not used on this program.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-5-danieltimlee@gmail.com
This commit refactors the existing kprobe program with libbpf bpf
loader. To attach bpf program, this uses generic bpf_program__attach()
approach rather than using bpf_load's load_bpf_file().
To attach bpf to perf_event, instead of using previous ioctl method,
this commit uses bpf_program__attach_perf_event since it manages the
enable of perf_event and attach of BPF programs to it, which is much
more intuitive way to achieve.
Also, explicit close(fd) has been removed since event will be closed
inside bpf_link__destroy() automatically.
Furthermore, to prevent conflict of same named uprobe events, O_TRUNC
flag has been used to clear 'uprobe_events' interface.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-4-danieltimlee@gmail.com
This commit refactors the existing cgroup program with libbpf bpf
loader. The original test_cgrp2_sock2 has keeped the bpf program
attached to the cgroup hierarchy even after the exit of user program.
To implement the same functionality with libbpf, this commit uses the
BPF_LINK_PINNING to pin the link attachment even after it is closed.
Since this uses LINK instead of ATTACH, detach of bpf program from
cgroup with 'test_cgrp2_sock' is not used anymore.
The code to mount the bpf was added to the .sh file in case the bpff
was not mounted on /sys/fs/bpf. Additionally, to fix the problem that
shell script cannot find the binary object from the current path,
relative path './' has been added in front of binary.
Fixes: 554ae6e792 ("samples/bpf: add userspace example for prohibiting sockets")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-3-danieltimlee@gmail.com
This commit refactors the existing cgroup programs with libbpf
bpf loader. Since bpf_program__attach doesn't support cgroup program
attachment, this explicitly attaches cgroup bpf program with
bpf_program__attach_cgroup(bpf_prog, cg1).
Also, to change attach_type of bpf program, this uses libbpf's
bpf_program__set_expected_attach_type helper to switch EGRESS to
INGRESS. To keep bpf program attached to the cgroup hierarchy even
after the exit, this commit uses the BPF_LINK_PINNING to pin the link
attachment even after it is closed.
Besides, this program was broken due to the typo of BPF MAP definition.
But this commit solves the problem by fixing this from 'queue_stats' map
struct hvm_queue_stats -> hbm_queue_stats.
Fixes: 36b5d47113 ("selftests/bpf: samples/bpf: Split off legacy stuff from bpf_helpers.h")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201124090310.24374-2-danieltimlee@gmail.com
When eswitch manager is running on ECPF, host PF should be treated
as non eswitch manager port, similar to other VF vports.
Fail to do so, results in firmware treating PF's vport as ECPF
vport for eswitch ACL tables.
Non zero check to figure out if a given vport is other vport or not
is not sufficient becase PF vport number = 0 on ECPF.
Hence, create esw acl tables with an attribute of other vport.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently ECPF enables external host PF too early in the initialization
sequence for Ethernet links when ECPF is eswitch manager.
Due to this, when external host PF driver is loaded, host PF's HCA CAP has
inner_ip_version supported by NIC RX flow table.
This capability is later updated by firmware after ECPF driver enables
ENCAP/DECAP as eswitch manager.
This results into a timing race condition, where CREATE_TIR command
fails with a below syndrome on host PF.
mlx5_cmd_check:775:(pid 510): CREATE_TIR(0x900) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0x562b00)
Hence, enable the external host PF after necessary eswitch and per vport
initialization is completed.
Continue to enable host PF when eswitch manager capability is off for a
ECPF.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
To match the hardware spec, rename peer_pf to host_pf.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Subsequent patch implements helper API which has mlx5_core_dev
as const pointer, make its caller API too const *.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Export
mlx5_create_flow_table()
mlx5_create_flow_group()
mlx5_destroy_flow_group().
These symbols are required by the VDPA implementation to create rules
that consume VDPA specific traffic.
We do not deal with putting the prototypes in a header file since they
already exist in include/linux/mlx5/fs.h.
Signed-off-by: Eli Cohen <elic@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Expose other function ifc bits to enable setting HCA caps on behalf of
other function.
In addition, expose vhca_resource_manager bit to control whether the
other function functionality is supported by firmware.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Expose FW indication that it supports stateless offloads for IP over IP
tunneled packets per direction. In some HW like ConnectX-4 IP-in-IP
support is not symmetric, it supports steering on the inner header but
it doesn't TX-Checksum and TSO. Add IP-in-IP capability per direction to
cover this case as well.
Note: only if both indications are turned on, the global
tunnel_stateless_ip_over_ip is on too.
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>