Pull RAS updates from Borislav Petkov:
"The latest meager RAS updates:
- Enable processing of action-optional MCEs which have the Overflow
bit set (Tony Luck)
- -Wmissing-prototypes warning fix and a build fix (Valdis
Klētnieks)"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
RAS: Build debugfs.o only when enabled in Kconfig
RAS: Fix prototype warnings
x86/mce: Don't check for the overflow bit on action optional machine checks
Merge tag 'edac_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:
"The new thing this time around is that we have three maintainers now
and a new, old repo. New because it is new for the EDAC tree which is
hosted there from now on and old because it is Tony's and my old
RAS repo which we still use occasionally when the stuff isn't in tip.
Summary:
- EDAC tree has three maintainers and one new designated reviewer
now, so that the work can scale better.
- New driver for Mellanox' BlueField SoC DDR controller (Shravan
Kumar Ramani)
- AMD Rome support in amd64_edac (Yazen Ghannam and Isaac Vaughn)
- Misc fixes, cleanups and code improvements"
* tag 'edac_for_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Add PCI device IDs for family 17h, model 70h
MAINTAINERS: Add Robert as a EDAC reviewer
EDAC/mc_sysfs: Make debug messages consistent
EDAC/mc_sysfs: Remove pointless gotos
EDAC: Prefer 'unsigned int' to bare use of 'unsigned'
EDAC/amd64: Support asymmetric dual-rank DIMMs
EDAC/amd64: Cache secondary Chip Select registers
EDAC/amd64: Decode syndrome before translating address
EDAC/amd64: Find Chip Select memory size using Address Mask
EDAC/amd64: Initialize DIMM info for systems with more than two channels
EDAC/amd64: Recognize DRAM device type ECC capability
EDAC/amd64: Support more than two controllers for chip selects handling
EDAC/mc: Cleanup _edac_mc_free() code
EDAC, pnd2: Fix ioremap() size in dnv_rd_reg()
EDAC, mellanox: Add ECC support for BlueField DDR4
EDAC/altera: Use the proper type for the IRQ status bits
EDAC/mc: Fix grain_bits calculation
edac: altera: Move Stratix10 SDRAM ECC to peripheral
MAINTAINERS: update EDAC entry to reflect current tree and maintainers
Merge tag 'tpmdd-next-20190902' of git://git.infradead.org/users/jjs/linux-tpmdd
Pull tpm updates from Jarkko Sakkinen:
"A new driver for fTPM living inside ARM TEE was added this round.
In addition to that, there are three bug fixes and one cleanup"
* tag 'tpmdd-next-20190902' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm/tpm_ftpm_tee: Document fTPM TEE driver
tpm/tpm_ftpm_tee: A driver for firmware TPM running inside TEE
tpm: Remove a deprecated comments about implicit sysfs locking
tpm_tis_core: Set TPM_CHIP_FLAG_IRQ before probing for interrupts
tpm_tis_core: Turn on the TPM before probing IRQ's
MAINTAINERS: fix style in KEYS-TRUSTED entry
Ido Schimmel says:
====================
mlxsw: spectrum_buffers: Add the ability to query the CPU port's shared buffer
Shalom says:
While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.
Patch #1 prevents changing the CPU port's threshold and binding.
Patch #2 registers the CPU port with devlink.
Patch #3 adds the ability to query the CPU port's shared buffer quotas and
occupancy.
v3:
Patch #2:
* Remove unnecessary wrapping
v2:
Patch #1:
* s/0/MLXSW_PORT_CPU_PORT/
* Assign "mlxsw_sp->ports[MLXSW_PORT_CPU_PORT]" at the end of
mlxsw_sp_cpu_port_create() to avoid NULL assignment on error path
* Add common functions for mlxsw_core_port_init/fini()
Patch #2:
* Move "changing CPU port's threshold and binding" check to a separate
patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While debugging packet loss towards the CPU, it is useful to be able to
query the CPU port's shared buffer quotas and occupancy.
Since the CPU port has no ingress buffers, all the shared buffers ingress
information will be cleared.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register CPU port with devlink.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next patch is going to register the CPU port with devlink, but only so
that the CPU port's shared buffer configuration and occupancy could be
queried.
Prevent changing CPU port's shared buffer threshold and binding
configuration.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arthur Kiyanovski says:
====================
net: ena: implement adaptive interrupt moderation using dim
In this patchset we replace our adaptive interrupt moderation
implementation with the dim library implementation.
The dim library showed great improvement in throughput, latency
and CPU usage in different scenarios on ARM CPUs.
This patchset also includes a few bug fixes to the parts of the
old implementation of adaptive interrupt moderation that were left.
Changes from V1 patchset:
Removed stray empty lines from patches 01/11, 09/11.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ena_dev->intr_moder_rx/tx_interval save the intervals received from the
user after dividing them by ena_dev->intr_delay_resolution. Therefore
when intr_delay_resolution changes, the code needs to first multiply
intr_moder_rx/tx_interval by the previous intr_delay_resolution to get
the value originally given by the user, and only then divide it by the
new intr_delay_resolution.
Current code does not first multiply intr_moder_rx/tx_interval by the old
intr_delay_resolution. This commit fixes it.
Also initialize ena_dev->intr_delay_resolution to be 1.
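The arithmetic described above can be sketched in a few lines of Python. This is only a model of the calculation, not driver code: the function names are hypothetical, and integer division is assumed to stand in for the driver's division.

```python
def rescale_interval(stored, old_resolution, new_resolution):
    """Re-express a stored interval after the delay resolution changes."""
    # First recover the value the user originally asked for...
    user_value = stored * old_resolution
    # ...and only then divide by the new resolution.
    return user_value // new_resolution


def rescale_interval_buggy(stored, new_resolution):
    """Pre-fix behaviour as described: the multiplication is skipped."""
    return stored // new_resolution
```

With a stored value of 32 at resolution 2 (a user value of 64) and a new resolution of 4, the first function yields 16 while the second yields 8, which is the discrepancy the fix removes.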
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nonadaptive interrupt moderation intervals are assigned the value set
by the user in ethtool -C divided by ena_dev->intr_delay_resolution.
Therefore when the user tries to get the nonadaptive interrupt moderation
intervals with ethtool -c the code needs to multiply the saved value
by ena_dev->intr_delay_resolution.
The current code erroneously divides instead of multiplying in ethtool -c.
This patch fixes this.
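A minimal Python model of the set/get round trip, assuming integer division on the set path. The class and method names are hypothetical; only the two field names come from the text.

```python
class CoalesceModel:
    """Toy model of how a nonadaptive interval is stored and reported."""

    def __init__(self, intr_delay_resolution=1):
        self.intr_delay_resolution = intr_delay_resolution
        self.intr_moder_tx_interval = 0

    def set_tx_interval(self, user_value):
        # Models "ethtool -C": the user value is divided before it is saved.
        self.intr_moder_tx_interval = user_value // self.intr_delay_resolution

    def get_tx_interval(self):
        # Models "ethtool -c": multiply to report the user's units again.
        return self.intr_moder_tx_interval * self.intr_delay_resolution
```

Dividing in get_tx_interval() instead would report 4 for a user value of 64 at resolution 4, rather than 64.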
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current implementation always updates the interrupt register with
the smoothed_interval of the rx_ring. However this should be
done only in case of adaptive interrupt moderation. If non-adaptive
interrupt moderation is used, the non-adaptive interrupt moderation
interval should be used. This commit fixes that.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove previous implementation of adaptive rx interrupt moderation
from ena_com files.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Delete 4 unused fields from struct ena_adapter and their only user,
ena_restore_ethtool_params().
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Out of the fields {per_napi_bytes, per_napi_packets} in struct ena_ring,
only rx_ring->per_napi_packets is used to determine if napi did work
for dim.
This commit removes all other uses of these fields.
2. Remove ena_ring->moder_tbl_idx, which is not used by dim.
3. Remove all calls to ena_com_destroy_interrupt_moderation(), since all it
did was to destroy the interrupt moderation table, which is removed as
part of removing old interrupt moderation code.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove code duplication in:
ena_com_update_nonadaptive_moderation_interval_tx()
ena_com_update_nonadaptive_moderation_interval_rx()
functions.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add driver_supported_features to host_info, which is a new API used to
communicate to the device which features are supported by the driver.
Add the interrupt_moderation bit to host_info->driver_supported_features
and enable it to signal the device that this driver supports interrupt
moderation properly.
Reserved bits are for features to be implemented in the future.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Remove old adaptive interrupt moderation code from set/get_coalesce()
2. Add ena_update_rx_rings_intr_moderation() function for updating
nonadaptive interrupt moderation intervals similarly to
ena_update_tx_rings_intr_moderation().
3. Remove checks of multiple unsupported received interrupt coalescing
parameters. This makes code cleaner and cancels the need to update
it every time a new coalescing parameter is invented.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the dim library for the rx adaptive interrupt moderation implementation.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add intr_moder_rx_interval to struct ena_com_dev and use it as the
location where the interrupt moderation rx interval is saved, instead
of the interrupt moderation table.
This is done as a first step before removing the old interrupt moderation
code.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexandru Ardelean says:
====================
ethtool: implement Energy Detect Powerdown support via phy-tunable
This changeset proposes a new control for PHY tunable to control Energy
Detect Power Down.
The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
EDPD works by putting the RX block into a lower power mode, except for
the link-pulse detection circuits. The TX block is also put into low
power mode, but the PHY wakes up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.
Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` and `micrel` PHYs.
This series updates only the `adin` PHY driver to support this new feature,
as this chip has been tested. A change for `micrel` can be proposed after a
discussion of the PHY-tunable API is resolved.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver becomes the first user of the kernel's `ETHTOOL_PHY_EDPD`
phy-tunable feature.
EDPD is also enabled by default on PHY config_init, but can be disabled via
the phy-tunable control.
When enabling EDPD, it's also a good idea (for the ADIN PHYs) to enable TX
periodic pulses, so that in case the other PHY is also in EDPD mode, there
is no lock-up situation where both sides are waiting for the other to
transmit.
TX pulses can be disabled through the phy-tunable control by specifying a
`tx-interval` of 0 via ethtool.
The ADIN PHY supports only fixed 1 second intervals; they cannot be
configured. That is why the acceptable values are 1,
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS and ETHTOOL_PHY_EDPD_NO_TX (which disables
TX pulses).
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
this feature is common across other PHYs (like EEE), and defining
`ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
EDPD works by putting the RX block into a lower power mode, except for
the link-pulse detection circuits. The TX block is also put into low
power mode, but the PHY wakes up periodically to send link pulses, to avoid
lock-ups in case the other side is also in EDPD mode.
Currently, there are 2 PHY drivers that look like they could use this new
PHY tunable feature: the `adin` and `micrel` PHYs.
The ADIN's datasheet mentions that TX pulses are sent at intervals of 1
second by default, and that they can be disabled. For the Micrel KSZ9031
PHY, the datasheet does not mention whether they can be disabled, but
mentions that they can be modified.
This change is structured similarly to the PHY tunable downshift
control:
* an `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
TX interval; some PHYs could specify a certain value that makes sense
* `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
* `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
As the name `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` indicates, the interval unit
is 1 millisecond, which should cover a reasonable range of intervals:
- from 1 millisecond, which does not sound like much of a power-saver
- to ~65 seconds which is quite a lot to wait for a link to come up when
plugging a cable
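As a rough illustration of the value space described above, the following Python sketch decodes a tunable value. It assumes the tunable is an unsigned 16-bit quantity (which the "~65 seconds" upper bound suggests), and the numeric values given to the three constants are assumptions made for this example, not taken from the text.

```python
# Assumed numeric values, for illustration only.
ETHTOOL_PHY_EDPD_DFLT_TX_MSECS = 0xFFFF
ETHTOOL_PHY_EDPD_NO_TX = 0xFFFE
ETHTOOL_PHY_EDPD_DISABLE = 0


def describe_edpd(value):
    """Return a human-readable meaning for an EDPD tunable value."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    if value == ETHTOOL_PHY_EDPD_DISABLE:
        return "EDPD disabled"
    if value == ETHTOOL_PHY_EDPD_NO_TX:
        return "EDPD enabled, TX pulses disabled"
    if value == ETHTOOL_PHY_EDPD_DFLT_TX_MSECS:
        return "EDPD enabled, PHY default TX interval"
    # Any other value is a TX interval in milliseconds.
    return f"EDPD enabled, TX pulse every {value} ms ({value / 1000:g} s)"
```

Under these assumptions the largest explicit interval is a little over 65 seconds, matching the range given in the text.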
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When skb_shinfo(skb) is not able to cache an extra fragment (that is,
skb_shinfo(skb)->nr_frags >= MAX_SKB_FRAGS), xennet_fill_frags() assumes
the sk_buff_head list is already empty. As a result, cons is increased by
only 1 and control returns to the error handling path in xennet_poll().
However, if the sk_buff_head list is not empty, queue->rx.rsp_cons may be
set incorrectly. That is, queue->rx.rsp_cons would point to the rx ring
buffer entries whose queue->rx_skbs[i] and queue->grant_rx_ref[i] are
already cleared to NULL. This leads to NULL pointer access in the next
iteration to process rx ring buffer entries.
Below is how xennet_poll() does error handling. All remaining entries in
tmpq are accounted to queue->rx.rsp_cons without assuming how many
outstanding skbs remain in the list.
    static int xennet_poll(struct napi_struct *napi, int budget)
    ... ...
            if (unlikely(xennet_set_skb_gso(skb, gso))) {
                    __skb_queue_head(&tmpq, skb);
                    queue->rx.rsp_cons += skb_queue_len(&tmpq);
                    goto err;
            }
It is better to always handle the error in the same way.
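The difference between the two ways of advancing the consumer index can be seen in a toy Python model. This is a deliberate simplification, not the netfront code: the ring is a plain list without wrap-around, and all function names are hypothetical.

```python
def consume_packet(rx_skbs, cons, nr_slots):
    """Take nr_slots ring entries starting at cons, clearing each slot."""
    tmpq = []
    for i in range(cons, cons + nr_slots):
        tmpq.append(rx_skbs[i])
        rx_skbs[i] = None  # a slot is cleared once its skb has been taken
    return tmpq


def next_cons_assuming_empty_list(cons, tmpq):
    # Advance by 1 only, as if nothing else were left in the list.
    return cons + 1


def next_cons_counting_list(cons, tmpq):
    # Account for every entry that was actually pulled off the ring.
    return cons + len(tmpq)
```

If three slots were consumed, advancing by 1 leaves the index pointing at a slot that has already been cleared, while advancing by the list length points at the next unprocessed entry.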
Fixes: ad4f15dc2c ("xen/netfront: don't bug in case of too many frags")
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_kfree_skb() function also performs input parameter validation.
Thus the test around the shown calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a race condition that can occur when calling ena_down().
The ena_clean_tx_irq() function, which is part of the napi handler,
might wake up the tx queue when the queue is supposed to be down
(during recovery or when changing the size of the queues, for
example). This causes the ena_start_xmit() function to trigger
and possibly try to access the destroyed queues.
The race is illustrated below:
Flow A:                         Flow B (napi handler):
ena_down()
    netif_carrier_off()
    netif_tx_disable()
                                ena_clean_tx_irq()
                                    netif_tx_wake_queue()
    ena_napi_disable_all()
    ena_destroy_all_io_queues()
After these flows the tx queue is active and ena_start_xmit() accesses
the destroyed queue which leads to a kernel panic.
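The interleaving can be replayed with a small Python state model. It tracks only two booleans and is not a model of the driver's actual locking. The function names replay() and xmit_is_unsafe() are hypothetical, and the action strings are the kernel function names quoted in the text.

```python
def replay(events):
    """Replay (flow, action) steps in order and return the final state."""
    state = {"tx_queue_active": True, "io_queues_exist": True}
    for _flow, action in events:
        if action == "netif_tx_disable":
            state["tx_queue_active"] = False
        elif action == "netif_tx_wake_queue":
            state["tx_queue_active"] = True
        elif action == "ena_destroy_all_io_queues":
            state["io_queues_exist"] = False
    return state


def xmit_is_unsafe(state):
    """A transmit would touch freed queues if the tx queue is still active."""
    return state["tx_queue_active"] and not state["io_queues_exist"]


RACY_ORDER = [
    ("A", "netif_carrier_off"),
    ("A", "netif_tx_disable"),
    ("B", "netif_tx_wake_queue"),  # issued from ena_clean_tx_irq()
    ("A", "ena_napi_disable_all"),
    ("A", "ena_destroy_all_io_queues"),
]
```

Replaying RACY_ORDER ends with an active tx queue and no I/O queues. Dropping the Flow B step leaves the queue stopped.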
Fixes: 1738cd3ed3 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)")
Signed-off-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
drop_monitor: Better sanitize notified packets
When working in 'packet' mode, drop monitor generates a notification
with a potentially truncated payload of the dropped packet. The payload
is copied from the MAC header, but I forgot to check that the MAC header
was set, so do it now.
Patch #1 sets the offsets to the various protocol layers in netdevsim,
so that it will continue to work after the MAC header check is added to
drop monitor in patch #2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When working in 'packet' mode, drop monitor generates a notification
with a potentially truncated payload of the dropped packet. The payload
is copied from the MAC header, but I forgot to check that the MAC header
was set, so do it now.
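The kind of check being added can be sketched in Python as follows. The sentinel value for "offset never set" and the choice to skip the packet entirely are assumptions of this sketch, and the function name is hypothetical. The text only says that the check is now performed.

```python
MAC_HEADER_UNSET = 0xFFFF  # assumed sentinel meaning "offset was never set"


def payload_for_notification(data, mac_header, trunc_len):
    """Return the (possibly truncated) payload, or None if it cannot be built."""
    if mac_header == MAC_HEADER_UNSET:
        # Without a valid MAC header offset there is no defined start
        # to copy from, so this sketch produces no payload at all.
        return None
    return data[mac_header:mac_header + trunc_len]
```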
Fixes: ca30707dee ("drop_monitor: Add packet alert mode")
Fixes: 5e58109b1e ("drop_monitor: Add support for packet alert mode for hardware drops")
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver periodically generates "trapped" UDP packets that it then
passes on to devlink. Set the offsets to the various protocol layers.
This is a prerequisite to the next patch, where drop monitor is taught
to check that the offset to the MAC header was set.
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean says:
====================
tc-taprio offload for SJA1105 DSA
This is the third attempt to submit the tc-taprio offload model for
inclusion in the networking tree. The sja1105 switch driver will provide
the first implementation of the offload. Only the bare minimum is added:
- The offload model and a DSA pass-through
- The hardware implementation
- The interaction with the netdev queues in the tagger code
- Documentation
What has been removed from previous attempts is support for
PTP-as-clocksource in sja1105, as well as configuring the traffic class
for management traffic. These will be added as soon as the offload
model is settled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While not an exhaustive usage tutorial, this describes the details
needed to build more complex scenarios.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This qdisc offload is the closest thing to what the SJA1105 supports in
hardware for time-based egress shaping. The switch core really is built
around SAE AS6802/TTEthernet (a TTTech standard) but can be made to
operate similarly to IEEE 802.1Qbv with some constraints:
- The gate control list is a global list for all ports. There are 8
execution threads that iterate through this global list in parallel.
I don't know why 8, there are only 4 front-panel ports.
- Care must be taken by the user to make sure that two execution threads
never get to execute a GCL entry simultaneously. I created an O(n^4)
checker for this hardware limitation, prior to accepting a taprio
offload configuration as valid.
- The spec says that if a GCL entry's interval is shorter than the frame
length, you shouldn't send it (and end up in head-of-line blocking).
Well, this switch does anyway.
- The switch has no concept of ADMIN and OPER configurations. Because
it's so simple, the TAS settings are loaded through the static config
tables interface, so there isn't even place for any discussion about
'graceful switchover between ADMIN and OPER'. You just reset the
switch and upload a new OPER config.
- The switch accepts multiple time sources for the gate events. Right
now I am using the standalone clock source as opposed to PTP. So the
base time parameter doesn't really do much. Support for the PTP clock
source will be added in a future series.
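To make the "no two threads trigger at the same instant" constraint concrete, here is a brute-force Python sketch. It is not the driver's O(n^4) checker: it assumes each schedule is a (base offset, list of entry durations) pair in a common time unit, that each schedule repeats with a cycle equal to the sum of its durations, and that only steady-state behaviour matters. All names are hypothetical.

```python
from math import gcd


def entry_offsets(durations):
    """Return (start offsets within one cycle, cycle length)."""
    starts, elapsed = [], 0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts, elapsed


def schedules_conflict(sched_a, sched_b):
    """True if any entry of A and any entry of B start at the same time."""
    (base_a, durations_a), (base_b, durations_b) = sched_a, sched_b
    starts_a, cycle_a = entry_offsets(durations_a)
    starts_b, cycle_b = entry_offsets(durations_b)
    # Both schedules repeat together every least common multiple of cycles.
    hyper = cycle_a * cycle_b // gcd(cycle_a, cycle_b)
    fire_a = {(base_a + s + k * cycle_a) % hyper
              for s in starts_a for k in range(hyper // cycle_a)}
    return any((base_b + s + k * cycle_b) % hyper in fire_a
               for s in starts_b for k in range(hyper // cycle_b))
```

A validity check over all ports would call schedules_conflict() for every pair of schedules and reject the configuration if any pair conflicts.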
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation patch for the tc-taprio offload (and potentially
for other future offloads such as tc-mqprio).
Instead of looking directly at skb->priority during xmit, let's get the
netdev queue and the queue-to-traffic-class mapping, and put the
resulting traffic class into the dsa_8021q PCP field. The switch is
configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
(see vlan_pmap in sja1105_main.c), so the effect is that we can inject
into a front-panel's egress traffic class through VLAN tagging from
Linux, completely transparently.
Unfortunately the switch doesn't look at the VLAN PCP in the case of
management traffic to/from the CPU (link-local frames at
01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) so we can't alter the
transmission queue of this type of traffic on a frame-by-frame basis. It
is only selected through the "hostprio" setting which ATM is hardcoded in
the driver to 7.
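The queue-to-PCP path described above can be sketched in Python. The (offset, count) representation of the queue-to-traffic-class mapping and both function names are assumptions of this sketch. The bit layout of the tag control field (3-bit PCP, 1-bit DEI, 12-bit VLAN ID) is the standard 802.1Q one.

```python
def queue_to_tc(queue, tc_to_txq):
    """Find the traffic class whose (offset, count) range contains queue."""
    for tc, (offset, count) in enumerate(tc_to_txq):
        if offset <= queue < offset + count:
            return tc
    raise ValueError(f"queue {queue} is not mapped to any traffic class")


def vlan_tci(pcp, vid, dei=0):
    """Build a 16-bit 802.1Q tag control field: PCP(3) | DEI(1) | VID(12)."""
    if not 0 <= pcp <= 7 or not 0 <= vid <= 0xFFF or dei not in (0, 1):
        raise ValueError("field out of range")
    return (pcp << 13) | (dei << 12) | vid
```

With a one-queue-per-class mapping, a frame sent on queue 5 would be tagged with PCP 5, which the switch then maps 1-to-1 onto its egress queue.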
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support tc-taprio offload, the TTEthernet egress scheduling
core registers must be made visible through the static interface.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA currently handles shared block filters (for the classifier-action
qdisc) in the core due to what I believe are simply pragmatic reasons -
hiding the complexity from drivers and offering a simple API for port
mirroring.
Extend the dsa_slave_setup_tc function by passing all other qdisc
offloads to the driver layer, where the driver may choose what it
implements and how. DSA is simply a pass-through in this case.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.
The gate mask acts on traffic classes (groups of queues of same
priority), as specified in IEEE 802.1Q-2018, and following the existing
taprio and mqprio semantics.
It is up to the driver to perform conversion between tc and individual
netdev queues if for some reason it needs to make that distinction.
Full offload is requested from the network interface by specifying
"flags 2" in the tc qdisc creation command, which in turn corresponds to
the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.
The important detail here is the clockid which is implicitly /dev/ptpN
for full offload, and hence not configurable.
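A small Python sketch of the two bit fields mentioned above. It assumes that bit N of a gate mask corresponds to traffic class N. The constant name and both function names are hypothetical, and the value 2 is taken from the "flags 2" wording in the text.

```python
FULL_OFFLOAD_FLAG = 2  # the value passed as "flags 2" in the text


def full_offload_requested(flags):
    """True if the full-offload bit is set in the taprio flags."""
    return bool(flags & FULL_OFFLOAD_FLAG)


def open_traffic_classes(gate_mask, num_tc=8):
    """List the traffic classes whose gate is open for a schedule entry."""
    return [tc for tc in range(num_tc) if gate_mask & (1 << tc)]
```

A driver that needs per-queue state would then expand each open traffic class into the netdev queues that belong to it.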
A reference counting API is added to support the use case where Ethernet
drivers need to keep the taprio offload structure locally (i.e. they are
a multi-port switch driver, and configuring a port depends on the
settings of other ports as well). The refcount_t variable is kept in a
private structure (__tc_taprio_qopt_offload) and not exposed to drivers.
In the future, the private structure might also be expanded with a
backpointer to taprio_sched *q, to implement the notification system
described in the patch (of when admin became oper, or an error occurred,
etc, so the offload can be monitored with 'tc qdisc show').
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
make cifs more verbose about buffer size errors
and add some comments
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
SMB3 change notify is important to allow applications to wait
on directory change events of different types (e.g. adding
and deleting files from other systems). Add worker functions
for this.
Acked-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Introduce a new CONFIG_CIFS_ROOT option to handle root file systems
over an SMB share.
In order to mount the root file system during the init process, make
cifs.ko perform non-blocking socket operations while mounting and
accessing it.
Cc: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac>
Signed-off-by: Steve French <stfrench@microsoft.com>
When mounting with modefromsid, we end up writing 4 ACEs in a security
descriptor that only has room for 3, thus triggering an out-of-bounds
write. Fix this by changing the min size of a security descriptor.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Commit a091c5f67c99 ("smb3: allow parallelizing decryption of reads")
had a potential null dereference.
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
An earlier patch "CIFS: fix deadlock in cached root handling"
did not completely address the deadlock in open_shroot. This
patch addresses the deadlock.
In testing the recent patch:
smb3: improve handling of share deleted (and share recreated)
we were able to reproduce the open_shroot deadlock to one
of the target servers in unmount in a delete share scenario.
Fixes: 7e5a70ad88 ("CIFS: fix deadlock in cached root handling")
This is version 2 of this patch. An earlier version of this
patch "smb3: fix unmount hang in open_shroot" had a problem
found by Dan.
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens. Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
When a share is deleted, returning EIO is confusing and no useful
information is logged. Improve the handling of this case by
at least logging a better error for this (and also mapping the error
differently to EREMCHG). See e.g. the new messages that would be logged:
[55243.639530] server share \\192.168.1.219\scratch deleted
[55243.642568] CIFS VFS: \\192.168.1.219\scratch BAD_NETWORK_NAME: \\192.168.1.219\scratch
In addition, the case where a share is deleted and then recreated
with the same name now works. This is sometimes done, for example,
because the admin had to move a share to a different, bigger local
drive when a share is running low on space.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Displayed in /proc/fs/cifs/Stats once for each
socket we are connected to.
This allows us to find out the maximum number of requests
that have been in flight at any one time. Note that
/proc/fs/cifs/Stats can be reset if you want to look for
the maximum over a small period of time.
Sample output (immediately after mount):
Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0
0 session 0 share reconnects
Total vfs operations: 5 maximum at one time: 2
Max requests in flight: 2
1) \\localhost\scratch
SMBs: 18
Bytes read: 0 Bytes written: 0
...
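The bookkeeping behind a "max requests in flight" statistic can be modelled in a few lines of Python. The class and its method names are hypothetical, and resetting the maximum to zero is an assumption about what a stats reset does.

```python
class InFlightCounter:
    """Track current and maximum number of outstanding requests."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    def request_sent(self):
        self.in_flight += 1
        # Remember the high-water mark seen so far.
        self.max_in_flight = max(self.max_in_flight, self.in_flight)

    def response_received(self):
        self.in_flight -= 1

    def reset_stats(self):
        # Assumed behaviour: start measuring a new period from zero.
        self.max_in_flight = 0
```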
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
There is no point in offloading read decryption if there are no other
requests on the wire.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
and convert smb2_query_path_info() to use it.
This will eliminate the need for a SMB2_Create when we already have an
open handle that can be used. This will also prevent an oplock break
in case the other handle holds a lease.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Disable offload of the decryption of encrypted read responses
by default (equivalent to setting the new mount option "esize=0").
Allow setting the minimum encrypted read response size that we
will choose to offload to a worker thread. It is now configurable
via a new mount option "esize=".
Depending on which encryption mechanism (GCM vs. CCM) and
the number of reads that will be issued in parallel and the
performance of the network and CPU on the client, it may make
sense to enable this since it can provide substantial benefit when
multiple large reads are in flight at the same time.
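The offload decision can be summarised by a small Python predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the "more than one request in flight" condition is borrowed from the neighbouring change about not offloading when nothing else is on the wire.

```python
def should_offload_decrypt(read_len, esize, requests_in_flight):
    """Decide whether to hand decryption of a read to a worker thread."""
    if esize == 0:
        return False  # offload is disabled, which is the default
    if requests_in_flight <= 1:
        return False  # assumed: nothing else to overlap the work with
    # Only responses at least as large as the threshold are offloaded.
    return read_len >= esize
```

A larger esize keeps small reads on the receiving thread and offloads only the reads where decryption cost is likely to dominate.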
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Decrypting large reads on encrypted shares can be slow (e.g. adding
multiple milliseconds per-read on non-GCM capable servers or
when mounting with dialects prior to SMB3.1.1). Allow parallelizing
of read decryption by launching worker threads.
Testing to Samba on localhost showed 25% improvement.
Testing to a remote server showed very large improvement when
more than one 'cp' command was run at the same time.
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Where we have a tcon available we can log \\server\share as part
of the message. Only do this for the VFS log level.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Code cleanup in the 5.1 kernel changed the array
passed into signing verification on large reads leading
to warning messages being logged when copying files to local
systems from remote.
SMB signature verification returned error = -5
This changeset fixes verification of SMB3 signatures of large
reads.
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>