Tuong Lien says:
====================
tipc: improve TIPC unicast link throughput
The series introduces an algorithm to improve TIPC throughput especially
in terms of packet loss, also tries to reduce packet duplication due to
overactive NACK sending mechanism.
The link failover situation is also covered by the patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 0ae955e2656d ("tipc: improve TIPC throughput by Gap ACK
blocks"), we enhance the link transmq by releasing as many packets as
possible with the multi-ACKs from peer node. This also means the queue
is now non-linear and the peer link deferdq becomes vital.
Whereas, in the case of link failover, all messages in the link transmq
need to be transmitted as tunnel messages in such a way that message
sequentiality and cardinality per sender is preserved. This requires us
to maintain the link deferdq somehow, so that when the tunnel messages
arrive, the inner user messages along with the ones in the deferdq will
be delivered to upper layer correctly.
The commit accomplishes this by defining a new queue in the TIPC link
structure to hold the old link deferdq when link failover happens and
process it upon receipt of tunnel messages.
Also, in the case of link syncing, the link deferdq will not be purged
to avoid unnecessary retransmissions that in the worst case will fail
because the packets might have been freed on the sending side.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
For unicast transmission, the current NACK sending althorithm is over-
active that forces the sending side to retransmit a packet that is not
really lost but just arrived at the receiving side with some delay, or
even retransmit same packets that have already been retransmitted
before. As a result, many duplicates are observed also under normal
condition, ie. without packet loss.
One example case is: node1 transmits 1 2 3 4 10 5 6 7 8 9, when node2
receives packet #10, it puts into the deferdq. When the packet #5 comes
it sends NACK with gap [6 - 9]. However, shortly after that, when
packet #6 arrives, it pulls out packet #10 from the deferfq, but it is
still out of order, so it makes another NACK with gap [7 - 9] and so on
... Finally, node1 has to retransmit the packets 5 6 7 8 9 a number of
times, but in fact all the packets are not lost at all, so duplicates!
This commit reduces duplicates by changing the condition to send NACK,
also restricting the retransmissions on individual packets via a timer
of about 1ms. However, it also needs to say that too tricky condition
for NACKs or too long timeout value for retransmissions will result in
performance reducing! The criterias in this commit are found to be
effective for both the requirements to reduce duplicates but not affect
performance.
The tipc_link_rcv() is also improved to only dequeue skb from the link
deferdq if it is expected (ie. its seqno <= rcv_nxt).
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
During unicast link transmission, it's observed very often that because
of one or a few lost/dis-ordered packets, the sending side will fastly
reach the send window limit and must wait for the packets to be arrived
at the receiving side or in the worst case, a retransmission must be
done first. The sending side cannot release a lot of subsequent packets
in its transmq even though all of them might have already been received
by the receiving side.
That is, one or two packets dis-ordered/lost and dozens of packets have
to wait, this obviously reduces the overall throughput!
This commit introduces an algorithm to overcome this by using "Gap ACK
blocks". Basically, a Gap ACK block will consist of <ack, gap> numbers
that describes the link deferdq where packets have been got by the
receiving side but with gaps, for example:
link deferdq: [1 2 3 4 10 11 13 14 15 20]
--> Gap ACK blocks: <4, 5>, <11, 1>, <15, 4>, <20, 0>
The Gap ACK blocks will be sent to the sending side along with the
traditional ACK or NACK message. Immediately when receiving the message
the sending side will now not only release from its transmq the packets
ack-ed by the ACK but also by the Gap ACK blocks! So, more packets can
be enqueued and transmitted.
In addition, the sending side can now do "multi-retransmissions"
according to the Gaps reported in the Gap ACK blocks.
The new algorithm as verified helps greatly improve the TIPC throughput
especially under packet loss condition.
So far, a maximum of 32 blocks is quite enough without any "Too few Gap
ACK blocks" reports with a 5.0% packet loss rate, however this number
can be increased in the furture if needed.
Also, the patch is backward compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them in
the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
* A fix for the rv32 port that corrects the 64-bit user accesor's fixup
label address.
* A fix for a regression introduced during the merge window that broke
medlow configurations at run time. This patch also includes a fix
that disables ftrace for the same set of functions, which was found by
inspection at the same time.
* A modification of the memory map to avoid overlapping the FIXMAP and
VMALLOC regions on systems with small memory maps.
* A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that like I said I would.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlymeX0THHBhbG1lckBk
YWJiZWx0LmNvbQAKCRDvTKFQLMurQXKFD/9rncH6fbNjb7izocCsDXNaDHWrixOX
/qYaDTheLe45txZJytZwWcgpEHlk/W+SM6mbdk6oJ10bG4a/FVv7PltBtoJvkRxt
5/iX0xwE73Rug5Z2FkqzGV6qNK8xD+2CfMgeYQYCOmaOgJuCjzj0x4Buh3EBHUY+
nxueWg7/ITzWCGSzH0bYw0S9cDFz8qUhmHiHtDz6y98JPoibnz6TQ/Ie4idxfSs8
nvV5cN1t8N8UpNAtVyVvTuWzf4E+LT02LIbD59ShOcGJ9I0hzPmNzGvVFVL6sxRs
XA8lzwlXYSbGRFtcoqEjKws9nG1Jbx9TucdrQvSqNlZ+invjpY5KiJyucF9ODV+G
LQosPqa4vw+hwSD958y1v2APNCLcvVx42YSHgPNabQ33/1gQiISveoQBqLUFk19s
dsaNCrEv37EeSzNXjM3ZCjLA+udRRB4GU4lBEQSDsnDp6472cCgjpEO8mhKY7Qbq
jIAHoVg1oXXrqqDXJoq0juiYoMMthRLtSTm+daryh5NGB8qoU3XuHc6MspJI+Shy
tpjglSr8+xyyviDY8KnjHPmeTdUQKPhUf6/0fwMM3gX8AYdLRhXUiH/Cd2LehqFQ
3LqvKBhfKS5eqBNQv7Dy+MlsiifVY3gDf0lsBC6jifxF/CmmRaWCauHgNUD9AHeH
igftN7X07gwpXQ==
=JIdq
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"I dropped the ball a bit here: these patches should all probably have
been part of rc2, but I wanted to get around to properly testing them
in the various configurations (qemu32, qeum64, unleashed) first.
Unfortunately I've been traveling and didn't have time to actually do
that, but since these fix concrete bugs and pass my old set of tests I
don't want to delay the fixes any longer.
There are four independent fixes here:
- A fix for the rv32 port that corrects the 64-bit user accesor's
fixup label address.
- A fix for a regression introduced during the merge window that
broke medlow configurations at run time. This patch also includes a
fix that disables ftrace for the same set of functions, which was
found by inspection at the same time.
- A modification of the memory map to avoid overlapping the FIXMAP
and VMALLOC regions on systems with small memory maps.
- A fix to the module handling code to use the correct syntax for
probing Kconfig entries.
These have passed my standard test flow, but I didn't have time to
expand that testing like I said I would"
* tag 'riscv-for-linus-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Use IS_ENABLED(CONFIG_CMODEL_MEDLOW)
RISC-V: Fix FIXMAP_TOP to avoid overlap with VMALLOC area
RISC-V: Always compile mm/init.c with cmodel=medany and notrace
riscv: fix accessing 8-byte variable from RV32
Heiner Kallweit says:
====================
net: phy: use generic PHY ability readers if callback get_features isn't set
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust a comment in patch 1 to match the code
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that phylib uses genphy_read_abilities() as fallback, we don't have
to set callback get_features any longer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meanwhile we have generic functions for reading the abilities of
Clause 22 / 45 PHY's. This allows to use them as fallback in case
callback get_features isn't set. Benefit is the reduction of
boilerplate code in PHY drivers.
v2:
- adjust the comment in phy_driver_register to match the code
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the mcast conversion to rhashtable this function has been unused, so
remove it.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to be careful and always zero the whole br_ip struct when it is
used for matching since the rhashtable change. This patch fixes all the
places which didn't properly clear it which in turn might've caused
mismatches.
Thanks for the great bug report with reproducing steps and bisection.
Steps to reproduce (from the bug report):
ip link add br0 type bridge mcast_querier 1
ip link set br0 up
ip link add v2 type veth peer name v3
ip link set v2 master br0
ip link set v2 up
ip link set v3 up
ip addr add 3.0.0.2/24 dev v3
ip netns add test
ip link add v1 type veth peer name v1 netns test
ip link set v1 master br0
ip link set v1 up
ip -n test link set v1 up
ip -n test addr add 3.0.0.1/24 dev v1
# Multicast receiver
ip netns exec test socat
UDP4-RECVFROM:5588,ip-add-membership=224.224.224.224:3.0.0.1,fork -
# Multicast sender
echo hello | nc -u -s 3.0.0.2 224.224.224.224 5588
Reported-by: liam.mcbirnie@boeing.com
Fixes: 19e3a9c90c ("net: bridge: convert multicast to generic rhashtable")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Make intel_pstate only load on Intel processors and prevent it
from printing pointless failure messages (Borislav Petkov).
- Update the turbostat utility:
* Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
* Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
Walton).
* Support for Intel Icelake and for systems with more than one
die per package (Len Brown).
* Cleanups (Len Brown).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcpmUKAAoJEILEb/54YlRxokUP/REDJUS8QMDe+jOOVySFj+LR
+7Pmsigempfh3wjL43yWN23f2IUEJXnW27NNj3O5JpXLng6JRqwoBLcwhba2Rzu+
n6lIWcirLRy72UdIK5oU3XP5k3FrYTuJJY+5vH7UjIWJ7bTZ+tACyG/MCNEDcTsD
BCsEEZ/MGdxpH8hQgj61CjYiOPMhPYujFhlnu2T91vcXQ/12FUXMR17fKWM+omAo
2TzoApQW+l2KYt7O/x4l69MDgEDY+3Pi40HSwQFGOIcY81ig1WX/DU9tKqoSa9SD
KwJB/FkSenWP+sQgPZSUODoz9waugMDJDXouxNqxlBnqu8hDb3T5OT9s12nvJvSo
NWRQ0XKuU3UxiOhTvvv09Ks16l8aY9ahe9JrRGFWN6PQFw6GcrlEspPGS31p7AqF
I/j9ww8POae7/s1/MFgSWD52KxTAnQ9f4kYP6jtToCwrPPuy5fGq2zQOfgH6DwDm
SkprpWKYkns1IjjamDTv+o609cgEK+oX3TYCxvDcT0fUfuNQxwerw4FUC0QG4E8B
H44u+yROGBmexBDT1dkD8LYoH7lYzki1BImkxLGK1EKhhl9yvt+OPTw55yXNUfTI
haumsRujr9pymrna9fujzlxSU4q1YSXSb4YPmGxQSQ3twZqW8d2tR2pfT5QQ/S+t
62bxDS4a3ui7dMEfLSG1
=RZRM
-----END PGP SIGNATURE-----
Merge tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix up the intel_pstate driver after recent changes to prevent
it from printing pointless messages and update the turbostat utility
(mostly fixes and new hardware support).
Specifics:
- Make intel_pstate only load on Intel processors and prevent it from
printing pointless failure messages (Borislav Petkov).
- Update the turbostat utility:
* Assorted fixes (Ben Hutchings, Len Brown, Prarit Bhargava).
* Support for AMD Fam 17h (Zen) RAPL and package power (Calvin
Walton).
* Support for Intel Icelake and for systems with more than one die
per package (Len Brown).
* Cleanups (Len Brown)"
* tag 'pm-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/intel_pstate: Load only on Intel hardware
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Prevent stale GPE events from triggering spurious system wakeups from
suspend-to-idle (Furquan Shaikh).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcpmWiAAoJEILEb/54YlRxqHIP/22TfLZTQoPOd3+6K12YDETU
OnV4+xX1k8/tVU5Xz4wTDnUYSQpZ/A9VBIp7EFkth32H71vxn080FaJ6GubdM1W1
iBLVJKjtR7ypdmbnDSCRyDePFBNxZ1hnXd53JtChetrPFm3cNk2Tl2CZfd1ms3Va
6OvVw+CrwFOIulgLv9AsG7N4Jr0RFsjuAmUAAYre1ZRHiruJLCTvcKTMTzKqUFuX
0p8MIe0N77ByjdcFdZOPe7YQrCZPqRHUn3Rr7Rep//U9esUWKmvQs7al/uCrXXTu
iqw18okpqpe6w/BTShUziwGrdj0JP82UU5me8AalOrWJ7pddDphPuN8kjuHwTqy1
nPndWYlZ536az+DJRiiHeVVs3ANG35wB/WYzI7cHVVn0ujpHUTiml+JM/OZYi8eW
dECmTGJopWveC6PSCBDvUnrFM5v8z1k+CIoVu+Fo6SnWuksJeRBkl+rzl8nMMQhw
+zhM01T+R7/Yi2wyoBoRSdgdRhI1xSwUJV88kUJqXLSZ8a5xNCXOep7PJKYuv5hA
lVSO2o6+sWuz/N3PkqfZUu25lWos9+3sMzUCZ4exavQnwu/udG6w/LjW1j0bxH8L
plaTYdeHTFl7WFvKptD8G1UAJ85C241EGIA73DxiNjwmrw039zLX1FBs46QNKB6L
vXa+BMxQhrwUNoGAD3hZ
=XLS7
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Prevent stale GPE events from triggering spurious system wakeups from
suspend-to-idle (Furquan Shaikh)"
* tag 'acpi-5.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPICA: Clear status of GPEs before enabling them
This reverts commit 6578229d4e.
netif_receive_skb_list() doesn't support GRO, therefore we may have
scenarios with decreased performance. See discussion here [0].
[0] https://marc.info/?t=155403847400001&r=1&w=2
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
and GVT's fixes including vGPU display plane size calculation,
shadow mm pin count, error recovery path for workload create
and one kerneldoc fix.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcpiyNAAoJEPpiX2QO6xPKgUEIAITZrpMNGN+n/agRByVgkTJC
oR+7hon2nctO42kMrrc/oD6ddPp7WE88dlu7J5w3+25lL5xVM25I9mLmUzlcD1P3
nDCRIw5UCoQUhO+5Hk8+p38mPQdkv8bNUUsA1JgXgDPtJ7fxxkh6HbwV9+JkG8zd
f+/o2gZZVOd0EDXZqVCqFPT354LqwlX7OUBn8J2hLorL6y/2YbBNjTYGbe7+G18q
c0fOZafTrgGqIKWUviBgsNx0FHaG0vhLBO/uqoO67yDUgBnViwujOpbkz+OhoQRS
Qaf/62M/tC/218+CnftxdCcy45M4dTmhd2qKyjDVHPZB3NUcagaaotn8LXBbeUA=
=2Q2R
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-fixes-2019-04-04' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
Only one fix for DSC (backoff after drm_modeset_lock deadlock)
and GVT's fixes including vGPU display plane size calculation,
shadow mm pin count, error recovery path for workload create
and one kerneldoc fix.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190404161116.GA14522@intel.com
Linux currently disable ECN for incoming connections when the SYN
requests ECN and the IP header has ECT(0)/ECT(1) set, as some
networks were reportedly mangling the ToS byte, hence could later
trigger false congestion notifications.
RFC8311 §4.3 relaxes RFC3168's requirements such that ECT can be set
one TCP control packets (including SYNs). The main benefit of this
is the decreased probability of losing a SYN in a congested
ECN-capable network (i.e., it avoids the initial 1s timeout).
Additionally, this allows the development of newer TCP extensions,
such as AccECN.
This patch relaxes the previous check, by enabling ECN on incoming
connections using SYN+ECT if at least one bit of the reserved flags
of the TCP header is set. Such bit would indicate that the sender of
the SYN is using a newer TCP feature than what the host implements,
such as AccECN, and is thus implementing RFC8311. This enables
end-hosts not supporting such extensions to still negociate ECN, and
to have some of the benefits of using ECN on control packets.
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Suggested-by: Bob Briscoe <research@bobbriscoe.net>
Cc: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Fix failed reads due to enabled IRQs when suspended; twl-core
- Fix driver registration when using DT; sprd-sc27xx-spi
- Fix `make allyesconfig` on x86_64; SUN6I_PRCM
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEdrbJNaO+IJqU8IdIUa+KL4f8d2EFAlylz7oACgkQUa+KL4f8
d2Hb9A/7ByIAS5N/N6vp0g3PZLHnIikb8hOeva6SCZHzM5wTlw5F2x+evn6tmVaG
euXyek3EvHGCP8S92Ccj6aF01xEmA7xbCFzt22lhzmd6jplkyJhS84nt7S1E7spi
FcZRWf2pTMyz4t5tXf4weqLnSBnianBmXBZr7xSMvEMcaRXMKn3/SiasMnmLjE83
Uyv522tSa8s4sASOp0yGKGuPmxJHR1wbb41SIwSQnPfBohVEqBamaO6hMIxvRqwz
Dl+ns6n1mXPvSAMS4WTmi7YUHwZTOdCwGMBhX0a2z66dqKgHBeJajCB9r+aCmnny
WON0y9axqUdcal6XZcLfgPQQiNvKs2pAzBmNZaZEVjtyHDCM5vuFL0hmqnFrRqM7
i1/SbVT6SIiNJAEoi4LrssDPRKwpR9Z4OvAWEGVsCWtWREMfya30MP2n5048FnXc
88j+kuTumjgH0TTKnU7ls+Oqok3e6hAR/waEvgxG2IiT9T1d3u8UwqrWrCYA2y2i
9k1qO9D2989wTs+9YmIk4/94SIgJvLvAOh5USEhfuAEoHHSO1kKdiW/8EWei68u2
XfpX3kZQYny6q39Ia2dhTUaFpa7g6g142dzCK94M/iM/73UHszp5qpE50XvbFv6W
6rAGJbk9Wo9hVWOT+tZwxTqIVEGCbkBOx9R3gbF1V2FowTBzmJM=
=Sa3v
-----END PGP SIGNATURE-----
Merge tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull mfd fixes from Lee Jones:
- Fix failed reads due to enabled IRQs when suspended; twl-core
- Fix driver registration when using DT; sprd-sc27xx-spi
- Fix `make allyesconfig` on x86_64; SUN6I_PRCM
* tag 'mfd-fixes-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: sun6i-prcm: Allow to compile with COMPILE_TEST
mfd: sc27xx: Use SoC compatible string for PMIC devices
mfd: twl-core: Disable IRQ while suspended
Jiri Pirko says:
====================
net: extend devlink port attrs with switch ID
Extend devlink port attrs to contain switch ID and change drivers that
register devlink ports to use that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the driver registers devlink port instance, he should set
the devlink port attributes as well. Then the devlink core is able to
obtain switch id itself, no need for driver to implement the ndo.
Once all drivers will implement devlink port registration, this ndo
should be removed. This warning guides new drivers to do things as
they should be done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get(). Leave
ndo_get_port_parent_id implementation only for legacy.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Obtain HW id and pass it down to mlxsw_core_port_init() as it would be
used as switch_id in devlink and exposed to user.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the switch_id is being only initialized when switching eswitch
mode from "legacy" to "switchdev". However, nothing prevents the id to
be initialized from the very beginning. Physical ports can show it even
in "legacy" mode.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove implementation of get_port_parent_id ndo and rely on core calling
into devlink for the information directly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass the switch ID down the to devlink through devlink_port_attrs_set()
so it can be used by devlink_compat_switch_id_get().
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce devlink_compat_switch_id_get() helper which fills up switch_id
according to passed netdev pointer. Call it directly from
dev_get_port_parent_id() as a fallback when ndo_get_port_parent_id
is not defined for given netdev.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend devlink_port_attrs_set() to pass switch ID for ports which are
part of switch and store it in port attrs. For other ports, this is
NULL.
Note that this allows the driver to group devlink ports into one or more
switches according to the actual topology.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to save space in the struct, convert bools to bits.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BITS_TO_LONGS() uses DIV_ROUND_UP() because of
this ppmax value can be greater than available
per cpu page pods.
This patch removes BITS_TO_LONGS() to fix this
issue.
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can optimize the fdb convergence when a backup_port is present by not
immediately flushing the entries of the stopped port since traffic for
those entries will flow towards the backup_port.
There are 2 cases specifically that benefit most:
- when the stopped port comes up before the entries expire by themselves
- when there's an external entry refresh and they're kept while the
backup_port is operating (e.g. mlag)
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/ibm/ibmvnic.c: In function '__ibmvnic_reset':
drivers/net/ethernet/ibm/ibmvnic.c:1971:21: warning: variable 'netdev' set but not used [-Wunused-but-set-variable]
It's never used since introduction in
commit ed651a1087 ("ibmvnic: Updated reset handling")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/ibm/ehea/ehea_qmr.c: In function 'ehea_create_cq':
drivers/net/ethernet/ibm/ehea/ehea_qmr.c:127:7: warning: variable 'cq_handle_ref' set but not used [-Wunused-but-set-variable]
drivers/net/ethernet/ibm/ehea/ehea_qmr.c:126:15: warning: variable 'epa' set but not used [-Wunused-but-set-variable]
They are never used since commit
7a29108322 ("[PATCH] ehea: IBM eHEA Ethernet Device Driver")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/pasemi/pasemi_mac.c: In function 'pasemi_mac_queue_csdesc':
drivers/net/ethernet/pasemi/pasemi_mac.c:1358:29: warning: variable 'cpyhdr' set but not used [-Wunused-but-set-variable]
It's never used since commit 8d636d8bc5 ("pasemi_mac: jumbo
frame support") and can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pr->tx_bytes should be assigned to tx_bytes other than
rx_bytes.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: ce45b87302 ("ehea: Fixing statistics")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot reports:
BUG: using __this_cpu_read() in preemptible code:
caller is dev_recursion_level include/linux/netdevice.h:3052 [inline]
__this_cpu_preempt_check+0x246/0x270 lib/smp_processor_id.c:47
dev_recursion_level include/linux/netdevice.h:3052 [inline]
ip6_skb_dst_mtu include/net/ip6_route.h:245 [inline]
I erronously downgraded a this_cpu_read to __this_cpu_read when
moving dev_recursion_level() around.
Reported-by: syzbot+51471b4aae195285a4a3@syzkaller.appspotmail.com
Fixes: 97cdcf37b5 ("net: place xmit recursion in softnet data")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb somehow dequeued out of inputq before processing, it causes to
NULL pointer and kernel crashed.
Add checking skb valid before using.
Fixes: c55c8edafa ("tipc: smooth change between replicast and broadcast")
Reported-by: Tuong Lien Tong <tuong.t.lien@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes to TC flower remove the requirement for rtnl lock when
accessing and modifying filters. Refcounts now ensure access and deletion
do not happen concurrently. However, the reoffload function which cycles
through all filters and replays them to registered hw drivers is not
protected.
Use the fl_get_next_filter() function to cycle the filters for reoffload
and ensure the ref taken by this function is put when done with each
filter.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Way back in 3c9c36bced the
ndo_fcoe_get_wwn pointer was switched from depending on CONFIG_FCOE to
CONFIG_LIBFCOE in order to allow building FCoE support into the bnx2x
driver and used by bnx2fc without including the generic software fcoe
module.
But, FCoE is generally used over an 802.1q VLAN, and the implementation
of ndo_fcoe_get_wwn in the 8021q module was not similarly changed. The
result is that if CONFIG_FCOE is disabled, then bnz2fc cannot make a
call to ndo_fcoe_get_wwn through the 8021q interface to the underlying
bnx2x interface. The bnx2fc driver then falls back to a potentially
different mapping of Ethernet MAC to Fibre Channel WWN, creating an
incompatibility with the fabric and target configurations when compared
to the WWNs used by pre-boot firmware and differently-configured
kernels.
So make the conditional inclusion of FCoE code in 8021q match the
conditional inclusion in netdevice.h
Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function do_write_buffer(), in the for loop, there is a case
chip_ready() returns 1 while chip_good() returns 0, so it never
break the loop.
To fix this, chip_good() is enough and it should timeout if it stay
bad for a while.
Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Yi Huaijie <yihuaijie@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Tokunori Ikegami <ikegami_to@yahoo.co.jp>
Signed-off-by: Richard Weinberger <richard@nod.at>
Daniel Borkmann says:
====================
pull-request: bpf 2019-04-04
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Batch of fixes to the existing BPF flow dissector API to support
calling BPF programs from the eth_get_headlen context (support for
latter is planned to be added in bpf-next), from Stanislav.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-tools:
tools/power turbostat: update version number
tools/power turbostat: Warn on bad ACPI LPIT data
tools/power turbostat: Add checks for failure of fgets() and fscanf()
tools/power turbostat: Also read package power on AMD F17h (Zen)
tools/power turbostat: Add support for AMD Fam 17h (Zen) RAPL
tools/power turbostat: Do not display an error on systems without a cpufreq driver
tools/power turbostat: Add Die column
tools/power turbostat: Add Icelake support
tools/power turbostat: Cleanup CNL-specific code
tools/power turbostat: Cleanup CC3-skip code
tools/power turbostat: Restore ability to execute in topology-order
Storage devices which report supporting discard commands like
WRITE_SAME_16 with unmap, but reject discard commands sent to the
storage device. This is a clear storage firmware bug but it doesn't
change the fact that should a program cause discards to be sent to a
multipath device layered on this buggy storage, all paths can end up
failed at the same time from the discards, causing possible I/O loss.
The first discard to a path will fail with Illegal Request, Invalid
field in cdb, e.g.:
kernel: sd 8:0:8:19: [sdfn] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
kernel: sd 8:0:8:19: [sdfn] tag#0 Sense Key : Illegal Request [current]
kernel: sd 8:0:8:19: [sdfn] tag#0 Add. Sense: Invalid field in cdb
kernel: sd 8:0:8:19: [sdfn] tag#0 CDB: Write same(16) 93 08 00 00 00 00 00 a0 08 00 00 00 80 00 00 00
kernel: blk_update_request: critical target error, dev sdfn, sector 10487808
The SCSI layer converts this to the BLK_STS_TARGET error number, the sd
device disables its support for discard on this path, and because of the
BLK_STS_TARGET error multipath fails the discard without failing any
path or retrying down a different path. But subsequent discards can
cause path failures. Any discards sent to the path which already failed
a discard ends up failing with EIO from blk_cloned_rq_check_limits with
an "over max size limit" error since the discard limit was set to 0 by
the sd driver for the path. As the error is EIO, this now fails the
path and multipath tries to send the discard down the next path. This
cycle continues as discards are sent until all paths fail.
Fix this by training DM core to disable DISCARD if the underlying
storage already did so.
Also, fix branching in dm_done() and clone_endio() to reflect the
mutually exclussive nature of the IO operations in question.
Cc: stable@vger.kernel.org
Reported-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
return_address returns the address that is one level higher in the call
stack than requested in its argument, because level 0 corresponds to its
caller's return address. Use requested level as the number of stack
frames to skip.
This fixes the address reported by might_sleep and friends.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Toke Høiland-Jørgensen says:
====================
sched: A few small fixes for sch_cake
Kevin noticed a few issues with the way CAKE reads the skb protocol and the IP
diffserv fields. This series fixes those two issues, and should probably go to
in 4.19 as well. However, the previous refactoring patch means they don't apply
as-is; I can send a follow-up directly to stable if that's OK with you?
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is not actually any guarantee that the IP headers are valid before we
access the DSCP bits of the packets. Fix this using the same approach taken
in sch_dsmark.
Reported-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>