By default, coccicheck utilizes all available threads to implement
parallelisation. However, when all available threads are used,
a decrease in performance is noted. The elapsed time is minimum
when at most one thread per core is used.
For example, on benchmarking the semantic patch kfree.cocci for
usb/serial using hyperfine, the outputs obtained for J=5 and J=2
are 1.32 and 1.90 times faster than those for J=10 and J=9
respectively for two separate runs. For the larger drivers/staging
directory, minimium elapsed time is obtained for J=3 which is 1.86
times faster than that for J=12. The optimal J value does not
exceed 6 in any of the test runs. The benchmarks are run on a machine
with 6 cores, with 2 threads per core, i.e, 12 hyperthreads in all.
To improve performance, modify coccicheck to use at most only
one thread per core by default.
Signed-off-by: Sumera Priyadarsini <sylphrenadin@gmail.com>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Mistakenly bit 2 was set instead of bit 3 as in the vendor driver.
Fixes: a7a92cf815 ("r8169: sync PCIe PHY init with vendor driver 8.047.01")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan says:
====================
bnxt_en: Update for net-next.
This patch series adds 2 main features to the bnxt_en driver: 200G
link speed support and FEC support with some refactoring of the
link speed logic. The firmware interface is updated to have proper
support for these 2 features. The ethtool preset max channel value
is also adjusted properly to account for XDP and TCs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The current logic that calculates the preset maximum value for combined
channel does not take into account the rings used for XDP and mqprio
TCs. Each of these features will reduce the number of TX rings. Add
the logic to divide the TX rings accordingly based on whether the
device is currently in XDP mode and whether TCs are in use.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature allows the user to set the different FEC modes on the NIC
port. Any new setting will take effect immediately after a link toggle.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code is reporting the FEC configured settings during link up.
Change it to report the more useful active FEC encoding that may be
negotiated or auto detected.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement .get_fecparam() method to report the configured and active FEC
settings. Also report the supported and advertised FEC settings to
the .get_link_ksettings() method.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PORT_PHY_CONFIG is always sent with REQ_FLAGS_RESET_PHY set. This flag
must be set in order for the firmware to institute the requested PHY
change immediately, but it results in a link flap. This is unnecessary
and results in an improved user experience if the PHY reconfiguration
is avoided when the user requested speed does not constitute a change.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some 200G dual port NICs, if one port is configured to 200G,
firmware will disable the ethernet link on the other port. Firmware
will send notification to the driver for the disabled port when this
happens. Define a new field in the link_info structure to keep track
of this state. The new phy_state field replaces the unused loop_back
field.
Log a message when the phy_state changes state. In the disabled state,
disallow any PHY configurations on the disabled port as the firmware
will fail all calls to configure the PHY in this state.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool PAM4 link modes for:
50000baseCR_Full
100000baseCR2_Full
200000baseCR4_Full
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware interface has added support for new link speeds using
PAM4 modulation. Expand the bnxt_link_info structure to closely
mirror the new firmware structures. Add logic to copy the PAM4
capabilities and settings from the firmware.
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be necessary to update more than one field in the link_info
structure when PAM4 speeds are added in a later patch. Instead of
merely translating ethtool speed values to firmware speed values,
change the responsiblity of this function to update all the necessary
link_info fields required to force the speed change to the desired
ethtool value. This also reduces code duplication somewhat at the two
call sites, which otherwise both have to independently update link_info
fields to turn off auto negotiation advertisements.
Also use the appropriate REQ_FORCE_LINK_SPEED definitions. These happen
to have the same values, but req_link_speed is utilimately passed as
force_link_speed in HWRM_PORT_PHY_CFG which is not defined in terms of
REQ_AUTO_LINK_SPEED.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the code for determining an advertised speed is no longer
supported into a separate function. This will avoid some code
duplication in a later patch when supporting PAM4 speeds, since
these speeds are specified in a separate field.
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The main changes include FEC, ECN statistics, HWRM_PORT_PHY_QCFG
response size reduction, and a new counter added to
ctx_hw_stats_ext struct to support the new 58818 chip.
The ctx_hw_stats_ext structure is now the superset supporting the new
58818 chips and the prior P5 chips. Add a new flag to identify the new
chip and use constants for the chip specific ring statistics sizes
instead of the size of the structure.
Because the HWRM_PORT_PHY_QCFG response structure size has shrunk back
to 96 bytes, the workaround added earlier to limit the size of this
message for forwarding to the VF can be removed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added the missing stub function for ptp_get_msgtype().
Fixes: 036c508ba9 ("ptp: Add generic ptp message type function")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
rivers/net/ethernet/marvell/mvpp2/mvpp2_main.c:7084:36: warning: ‘mvpp2_acpi_match’ defined but not used [-Wunused-const-variable=]
7084 | static const struct acpi_device_id mvpp2_acpi_match[] = {
| ^~~~~~~~~~~~~~~~
Wrap the definition inside #ifdef/#endif.
Compile tested only.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Expose transceiver overheat counter
Amit says:
An overheated transceiver can be the root cause of various network
problems such as link flapping. Counting the number of times a
transceiver's temperature was higher than its configured threshold can
therefore help in debugging such issues.
This patch set exposes a transceiver overheat counter via ethtool. This
is achieved by configuring the Spectrum ASIC to generate events whenever
a transceiver is overheated. The temperature thresholds are queried from
the transceiver (if available) and set to the default otherwise.
Example:
...
transceiver_overheat: 2
Patch set overview:
Patches #1-#3 add required device registers
Patches #4-#5 add required infrastructure in mlxsw to configure and
count overheat events
Patches #6-#9 gradually add support for the transceiver overheat counter
Patch #10 exposes the transceiver overheat counter via ethtool
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add structures for port statistics which read from core and not directly
from registers.
When netdev's ethtool statistics are queried, query the corresponding
module's overheat counter from core and expose it as
"transceiver_overheat".
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Module temperature warning events are enabled for modules that have a
temperature sensor and configured according to the temperature
thresholds queried from the module.
When a module is unplugged we are guaranteed not to get temperature
warning events. However, when a module is plugged in we need to
potentially update its current settings (i.e., event enablement and
thresholds).
Register to port module plug/unplug events and update module's settings
upon plug in events.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The overheat counter is a per-module counter, but it is exposed as part
of the corresponding netdev's statistics. It should therefore be
presented to user space relative to the netdev's lifetime.
Query the counter just before registering the netdev, so that the value
exposed to user space will be relative to this initial value.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTWE (Management Temperature Warning Event) is triggered for sensors
whose temperature event enable bit is enabled in the MTMP register.
Enable events for all the modules that have a temperature sensor.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MTWE (Management Temperature Warning Event) is triggered when module's
temperature is higher than its threshold.
Register for MTWE events and increase the module's overheat counter when
its corresponding sensor goes above the configured threshold.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initialize an array that stores per-module overheat state and a counter
indicating how many times the module was in overheat state.
Export a function to query the counter according to module number.
Will be used later on by the switch driver (i.e., mlxsw_spectrum) to expose
module's overheat counter as part of ethtool statistics.
Initialize mlxsw_env after driver initialization to be able to query
number of modules from MGPIR register.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MTMP register controls various temperature settings on a per-sensor
basis. Subsequent patches are going to alter some of these settings for
sensors found on port modules in response to certain events.
In order to prevent the current callers that write to MTMP from
overriding these settings, have them first query the register and then
change only the relevant register fields.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PMAOS register configures and retrieves the per module status.
The register is used also for enabling event for status change.
It will be used to enable PMPE (Port Module Plug/Unplug) event.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PMPE register reports any operational status change of a module.
It will be used for enabling temperature warning event when a module is
plugged in.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MTWE (Management Temperature Warning Event) register, which is used
for over temperature warning.
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan says:
====================
net: hns3: updates for -next
To facilitate code maintenance and compatibility, #1 and #2 add
device version to replace pci revision, #3 to #9 adds support for
querying device capabilities and specifications, then the driver
can use these query results to implement corresponding features
(some features will be implemented later).
And #10 is a minor cleanup since too many parameters for
hclge_shaper_para_calc().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As function hclge_shaper_para_calc() has too many arguments to add
more, so encapsulate its three arguments ir_b, ir_u, ir_s into a
structure.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device specifications querying is unsupported by the old
firmware, in this case, these specifications are 0. However,
some specifications should not be 0 or will cause problem.
So after querying from firmware, some device specifications
are needed to check their value and set to default value if
their values are 0.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The max tm rate is a fixed value(100Gb/s) now as it is defined by a
macro. In order to support other rates in different kinds of device,
it is better to use specification queried from firmware to replace
this macro.
As function hclge_shaper_para_calc() has too many arguments to add
more, so encapsulate its three arguments ir_b, ir_u, ir_s into a
structure.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To improve code maintainability and compatibility, new commands
HCLGE_OPC_QUERY_DEV_SPECS for PF and HCLGEVF_OPC_QUERY_DEV_SPECS
for VF are introduced to query device specifications, instead of
statically defining specifications by checking the hardware version
or other methods.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds debugfs to dump each device capability whether is supported.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to improve code maintainability and compatibility, the
capabilities of new features are queried from firmware.
The member flag in struct hnae3_ae_dev indicates not only
capabilities, but some initialized status. As capabilities bits
queried from firmware is too many, it is better to use new member
to indicate them. So adds member capabs in struce hnae3_ae_dev.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the revision of the pci device is used to identify
whether FEC is supported, which is not good for maintainability
and compatibility. So use a capability flag to do that.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to improve code maintainability and compatibility,
add support to query the device capability by expanding the
existing version query command. The device capability refers
to the features supported by the device.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fibre device of PCI revision 0x20 don't support autoneg, and the ops
get_autoneg() return AUTONEG_DISABLE so function hns3_nway_reset()
will return earlier than judging PCI revision.
Function hclge_handle_rocee_ras_error() don't need to judge PCI
revision again because its caller hclge_handle_hw_ras_error() has
judged once.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To better identify the device version, struct hnae3_handle adds a
member dev_version to replace pci revision. The dev_version consists
of hardware version and PCI revision. The hardware version is queried
from firmware by an existing firmware version query command.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mlxsw_sp_acl_tcam_group_id_get() fails, the mutex initialized earlier
is not destroyed.
Fix this by initializing the mutex after calling the function. This is
symmetric to mlxsw_sp_acl_tcam_group_del().
Fixes: 5ec2ee28d2 ("mlxsw: spectrum_acl: Introduce a mutex to guard region list updates")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix build error by selecting MDIO_DEVRES for MDIO_THUNDER.
Fixes this build error:
ld: drivers/net/phy/mdio-thunder.o: in function `thunder_mdiobus_pci_probe':
drivers/net/phy/mdio-thunder.c:78: undefined reference to `devm_mdiobus_alloc_size'
Fixes: 379d7ac7ca ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: netdev@vger.kernel.org
Cc: David Daney <david.daney@cavium.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace commas with semicolons. What is done is essentially described by
the following Coccinelle semantic patch (http://coccinelle.lip6.fr/):
// <smpl>
@@ expression e1,e2; @@
e1
-,
+;
e2
... when any
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Switching the DMAR and HPET MSI code to use the generic MSI domain ops
missed to add the flag which tells the core code to update the domain
operations with the defaults. As a consequence the core code crashes
when an interrupt in one of those domains is allocated.
Add the missing flags.
Fixes: 9006c133a4 ("x86/msi: Use generic MSI domain ops")
Reported-by: Qian Cai <cai@redhat.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/87wo0fli8b.fsf@nanos.tec.linutronix.de
- Unbreak the magic 'search the timer interrupt' logic in IO/APIC code
which got wreckaged when the core interrupt code made the state
tracking logic stricter. That caused the interrupt line to stay masked
after switching from IO/APIC to PIC delivery mode, which obviously
prevents interrupts from being delivered.
- Make run_on_irqstack_code() typesafe. The function argument is a void
pointer which is then casted to 'void (*fun)(void *). This breaks
Control Flow Integrity checking in clang. Use proper helper functions
for the three variants reuqired.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9wqnATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWEHD/402XehvgOXySy/KuhMgtSAoJ/OElvo
dnZYugCEh/6mllRbAhH0hkWpjxdhmCEYjWjf5Qqj01KYbQRSzYIhitmPyAI/Z2Dk
CqhvRj4Hko7AQGmF97mg/VLdwn6sDIslfprgo3o0pqZcuyDd0QqUgkKLgQpvGDXe
4l4OtiOoo2yHnr9wJiiMSVgHkXgp5QpTYAhnWV5ea9FokwO6/OLwjZAHgBh7MqNY
6hnUQIHL/v4K3aXdxOED55Irf05Yk4OqiXKgNMHNLqd0xf+DBTqlfzvQ+omZpMXY
sF0SML08r2f+KnBGuXlXDeu7MDRJlBO9Jxeq/mLcyLqP1K2nI2Ue3B6dy7AocKX3
p1aUu0TUtRpDNdhMZSHFon1cQfPc8pFZSVkdq0ke2CYd+HleIrklvQzSn7gGrQqJ
ZvWT9zs/XONXqxcMcSS1+XCQdJsRwIqbvY0Vxa68OIhZcKrsP7Ewln/sj46ghzOq
kTbxXyG58anK2ssalJqR148tsDlZBrBlo3/Bm3sITIoAtb8m+jeuqi1fZnvmfkAC
IiSEqY+p7CnNz9pYEA5T4GHDHE2luBA6YryclvCEAvi60XMZ+LW+T5szs6+fOjS+
y4RK2dGgcE0ewlvm0jzWKLvKMdi7iv5c9ndtE1W4qxy/K/uXKvfHe45SpmcFAyqx
FVjhT+kF/HKF1A==
=BcVu
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two fixes for the x86 interrupt code:
- Unbreak the magic 'search the timer interrupt' logic in IO/APIC
code which got wreckaged when the core interrupt code made the
state tracking logic stricter.
That caused the interrupt line to stay masked after switching from
IO/APIC to PIC delivery mode, which obviously prevents interrupts
from being delivered.
- Make run_on_irqstack_code() typesafe. The function argument is a
void pointer which is then cast to 'void (*fun)(void *).
This breaks Control Flow Integrity checking in clang. Use proper
helper functions for the three variants reuqired"
* tag 'x86-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioapic: Unbreak check_timer()
x86/irq: Make run_on_irqstack_cond() typesafe
- Reset the TI/DM timer before enabling it instead of doing it the other
way round.
- Initialize the reload value for the GX6605s timer correctly so the
hardware counter starts at 0 again after overrun.
- Make error return value negative in the h8300 timer init function
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl9wqQ0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZx0EACJUIlCC54kw4CnZdxhoWu0f6tXEuip
+Iyb8OJw56FdyHigvkPBMoF1o4a0Ax32TbYYOKntpDy67vnqkO6DV1M/Mwt8IhfO
ey7h1t7e4y2vrXAfYN0oX1ZQAk9hkPGW5+wugEf6dbZZva7mm+jV0PfNP/yn7KWS
n9lUrLNlPJdndSIYwj9Cto5mMQBsM7/qM8MkBR84i8GxFP2rofh4C5bD8WTnXzHd
B8898riwkaaQmfq/Ch9Y79oMzpZXysAEYpZ3YExkQsEmi5YqZ8k6R8RD18mKQdFH
7Kqh/025j7oKk9fopOvPjZ9sIX22gGP8C+tdy3sipYDCY0wRVNu+SPXppwl0T9ML
JLX/D2pC20f/VUQ21yc8KgVt76g8QID4t+NV5/VdIHuxhei/4WN3hJxuI4w4Ivfn
YK8mB5TK+R4K8Ln+GFE0zh/wfpjJe84K7r4NmDJnClD8chTVhVZHOlv5qJBZzob8
Yd4fMFS0WufAj15ZMN55iLFEI30iubY5X1xaDD1sFrFJyO1VCj8ITH7mBtW9zW1a
a/8LQlB5yIjLNTGVZGTCcYfyQ7+MA1EmkutD7AnFN87Zwx6FtDYGEPZq/KI3dwrw
2qA7HTVBYoWQvSOQWt8inuXsbnqUQ2Hq2y8cIuieg333OGc1WQS6BZOeLdJWNGas
W0JztaeFr1S3ew==
=Htin
-----END PGP SIGNATURE-----
Merge tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"A set of clocksource/clockevents updates:
- Reset the TI/DM timer before enabling it instead of doing it the
other way round.
- Initialize the reload value for the GX6605s timer correctly so the
hardware counter starts at 0 again after overrun.
- Make error return value negative in the h8300 timer init function"
* tag 'timers-urgent-2020-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/timer-gx6605s: Fixup counter reload
clocksource/drivers/timer-ti-dm: Do reset before enable
clocksource/drivers/h8300_timer8: Fix wrong return value in h8300_8timer_init()
Pinned pages shouldn't be write-protected when fork() happens, because
follow up copy-on-write on these pages could cause the pinned pages to
be replaced by random newly allocated pages.
For huge PMDs, we split the huge pmd if pinning is detected. So that
future handling will be done by the PTE level (with our latest changes,
each of the small pages will be copied). We can achieve this by let
copy_huge_pmd() return -EAGAIN for pinned pages, so that we'll
fallthrough in copy_pmd_range() and finally land the next
copy_pte_range() call.
Huge PUDs will be even more special - so far it does not support
anonymous pages. But it can actually be done the same as the huge PMDs
even if the split huge PUDs means to erase the PUD entries. It'll
guarantee the follow up fault ins will remap the same pages in either
parent/child later.
This might not be the most efficient way, but it should be easy and
clean enough. It should be fine, since we're tackling with a very rare
case just to make sure userspaces that pinned some thps will still work
even without MADV_DONTFORK and after they fork()ed.
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This allows copy_pte_range() to do early cow if the pages were pinned on
the source mm.
Currently we don't have an accurate way to know whether a page is pinned
or not. The only thing we have is page_maybe_dma_pinned(). However
that's good enough for now. Especially, with the newly added
mm->has_pinned flag to make sure we won't affect processes that never
pinned any pages.
It would be easier if we can do GFP_KERNEL allocation within
copy_one_pte(). Unluckily, we can't because we're with the page table
locks held for both the parent and child processes. So the page
allocation needs to be done outside copy_one_pte().
Some trick is there in copy_present_pte(), majorly the wrprotect trick
to block concurrent fast-gup. Comments in the function should explain
better in place.
Oleg Nesterov reported a (probably harmless) bug during review that we
didn't reset entry.val properly in copy_pte_range() so that potentially
there's chance to call add_swap_count_continuation() multiple times on
the same swp entry. However that should be harmless since even if it
happens, the same function (add_swap_count_continuation()) will return
directly noticing that there're enough space for the swp counter. So
instead of a standalone stable patch, it is touched up in this patch
directly.
Link: https://lore.kernel.org/lkml/20200914143829.GA1424636@nvidia.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>