Pull arch/tile fix from Chris Metcalf:
"This fix eliminates a "section mismatch" warning caused by the new
__ex_table checking code in modpost"
* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
modpost: work correctly with tile coldtext sections
trigger with bad params.
Thanks to Peter Zijlstra and Arthur Marsh for going around on this one.
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVnZaMAAoJENkgDmzRrbjxKScP/0R8TDs/Ci0BzdGcLD7rShe9
LfSqJlDDUD9AGCmdCGFTWG8c7VFwMcjijtvp06ztkPa7XYm4tb0uwFCCXzHgfHal
NPkJlfx7HTvHkqTBO6dtJkuzGqjTdZ+hgg/9On8eSmGpRIhrQf6TBmS8ZBukFL71
hDBlXQFXVe5sW+TGCMUwFfJDujfu9QBh467JX0HBwxnqXNbEc4DkwKnEakpQKc9d
x7lYL3fpY+I++a52dl8kfwYfOAyhZ2SNi56akl8Uvu4iJZ1dq6z8KYf6I0VcvuS/
F8Zhsr1a+gQHCz6pbiMGH5qCwjVz/Fv5gwdztWF7LngZ0RmUkpEIyYbfSN7EphBa
BnUcCQ8lttoncwpqsVBdHR88HmMxU2wUVsuvewhn91YyYU4bnrpElWPN9wHjbCzM
1rHQURJJHAoRnU+BLy0KaI3dz6NrRKU5l6+k7YNTuQXZzcL+L6Wpn6d5lLperNRW
IjTGeLEBVjIcOt0zgi3MUQFvylhLX3bCML3jTIZBd17VZLXm3ESknXmyjCkW1PES
CV3xixbqAelvrEePr2uE+vu4+E6GwtS05J4Cuo552WKj/7E7rKhj2E55Jrpk7EkJ
YcXp2bNkkAiE0dKAe/TTQH8Acxb+V5JHlTcN7sMT/kMOH9vEPx99yRbcB58CXvqv
YvSS6lQbYENR+dy1HIFr
=CdQ7
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fix from Rusty Russell:
"Single fix: missing rbtree removal in the module load failure path.
Easy to trigger with bad params.
Thanks to Peter Zijlstra and Arthur Marsh for going around on this
one"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: Fix load_module() error path
Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if
the upper bits of the result are zeros (the arm64 JIT had such bugs).
Extend the two tests to catch this.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch avoids the double up_write to filter_sem if
efx_net_open() fails.
Resolves: 2d432f20d2
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
Cleanup, T6 changes and register range update
This patch series adds the following:
Don't use entire L2T table, update register ranges for T6 adapter,
read stats for only available channels for T6 and enable cim_la dump for
T6 adapter also.
This patch series has been created against net-next tree and includes
patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Updating the driver to read the stats of only available channels. T6 and
later has only 2 channels
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver was retrieving the parameters for the bounds of its
slice of the L2T from the firmware and then throwing those away and
using the entire table. This corrects that problem.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
there's a check in br_ip_equal() for a matching vlan id, but the mdb
functions were not modified to use (or at least zero it) so when an
entry was added it would have a garbage vlan id (from the local br_ip
variable in __br_mdb_add/del) and this would prevent it from being
matched and also deleted. So zero out the whole local ip var to protect
ourselves from future changes and also to fix the current bug, since
there's no vlan id support in the mdb uapi - use always vlan id 0.
Example before patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
RTNETLINK answers: Invalid argument
After patch:
root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent
root@debian:~# bridge mdb
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Fixes: b0e9a30dd6 ("bridge: Add vlan id to multicast groups")
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling connect() with an AF_TIPC socket would trigger a series
of error messages from SELinux along the lines of:
SELinux: Invalid class 0
type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> }
for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0
tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable>
permissive=0
This was due to a failure to initialize the security state of the new
connection sock by the tipc code, leaving it with junk in the security
class field and an unlabeled secid. Add a call to security_sk_clone()
to inherit the security state from the parent socket.
Reported-by: Tim Shearer <tim.shearer@overturenetworks.com>
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Paul Moore <paul@paul-moore.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Shah says:
====================
sfc: compat for lack of VADAPTOR_SET_MAC in adaptor_firmware <= 4.1.1.1023
This patch series resolves an incompatibility with legacy
firmware due to the lack of MC_CMD_VADAPTOR_SET_MAC in
adaptor_firmware <= 4.1.1.1023
Unless this patch series is applied there will be a compatibility
issue between the driver and Solarflare adapters running older
firmware.
Tested with and without CONFIG_SFC_SRIOV
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some versions of MCFW do not support the MC_CMD_VADAPTOR_SET_MAC
command, and ENOSYS will be returned.
If the PF created its own vport, the function's datapath must be
stopped and the vport can be reconfigured to reflect the new MAC
address.
If the MCFW created the vport for the PF (which is the case when
the nic_data->vport_mac is blank), nothing further needs to be
done as the vport is not under the control of the PF.
This only applies to PFs because the MCFW in question does not
support VFs.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-organize the structure of error handling to avoid having
to duplicate the netif_err() around the ifdefs.
The only change to the behaviour of the error-handling is that
the PF's data structure to record VF details should only be
updated if the original command succeeded.
Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When "primary_reselect" is set to "failure", primary interface should
not become active until current active slave is down. But if we set first
member of bond device as a "primary" interface and "primary_reselect"
is set to "failure" then whenever primary interface's link get back(up)
it become active slave even if current active slave is still up.
With this patch, "bond_find_best_slave" will not traverse members if
primary interface is not candidate for failover/reselection and current
active slave is still up.
Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com>
Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use module_pci_driver for drivers whose init and exit functions
only register and unregister, respectively.
A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:
@a@
identifier f, x;
@@
-static f(...) { return pci_register_driver(&x); }
@b depends on a@
identifier e, a.x;
@@
-static e(...) { pci_unregister_driver(&x); }
@c depends on a && b@
identifier a.f;
declarer name module_init;
@@
-module_init(f);
@d depends on a && b && c@
identifier b.e, a.x;
declarer name module_exit;
declarer name module_pci_driver;
@@
-module_exit(e);
+module_pci_driver(x);
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frag needed should be sent only if the inner header asked
to not fragment. Currently fragmentation is broken if the
tunnel has df set, but df was not asked in the original
packet. The tunnel's df needs to be still checked to update
internally the pmtu cache.
Commit 23a3647bc4 broke it, and this commit fixes
the ipv4 df check back to the way it was.
Fixes: 23a3647bc4 ("ip_tunnels: Use skb-len to PMTU check.")
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe reported that since commit c02db8c629 ("rtnetlink: make
SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
anymore with respect to their policy, that is, ifla_vfinfo_policy[].
Before, they were part of ifla_policy[], but they have been nested since
placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
which is another nested attribute for the actual VF attributes such as
IFLA_VF_MAC, IFLA_VF_VLAN, etc.
Despite the policy being split out from ifla_policy[] in this commit,
it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
testing for struct nlattr, but it doesn't know about the data context and
their requirements.
Fix, on top of Jason's initial work, does 1) parsing of the attributes
with the right policy, and 2) using the resulting parsed attribute table
from 1) instead of the nla_for_each_nested() loop (just like we used to
do when still part of ifla_policy[]).
Reference: http://thread.gmane.org/gmane.linux.network/368913
Fixes: c02db8c629 ("rtnetlink: make SR-IOV VF interface symmetric")
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Cc: Greg Rose <gregory.v.rose@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Rony Efraim <ronye@mellanox.com>
Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU changes.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.
Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running a kernel configured with CONFIG_DMA_API_DEBUG=y a warning
is issued:
DMA-API: device driver tries to sync DMA memory it has not allocated
This warning is the result of mapping the full range of the Rx buffer
pages allocated and then performing a dma_sync_single_for_cpu against
a calculated DMA address. The proper thing to do is to use the
dma_sync_single_range_for_cpu with a base DMA address and an offset.
Reported-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tilegx and tilepro compilers use .coldtext for their unlikely
executed text section name, so an __attribute__((cold)) function
will (when compiled with higher optimization levels) land in
the .coldtext section.
Modify modpost to add .coldtext to the set of OTHER_TEXT_SECTIONS
so we don't get warnings about referencing such a section in an
__ex_table block, and then also modify arch/tile/lib/memcpy_user_64.c
so that it uses plain ".coldtext" instead of ".coldtext.memcpy".
The latter naming is a relic of an earlier use of -ffunction-sections,
which we no longer use by default.
Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
This reverts commit e1622baf54.
The side effect of this commit is to add a '@NONE' after each virtual
interface name with a 'ip link'. It may break existing scripts.
Reported-by: Olivier Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
User space can crash kernel with
ip link add ifb10 numtxqueues 100000 type ifb
We must replace a BUG_ON() by proper test and return -EINVAL for
crazy values.
Fixes: 60877a32bc ("net: allow large number of tx queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extra check for total vfs for SRIOV to check if that value is
bigger than total vfs in pci SRIOV capabalities. Fix a check and
print of the number of maximum vfs that hw can handle. Fix a check
and print of the number of maximum vfs per port that driver can handle.
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trace bpf samples do not compile on s390x because they use x86
specific fields from the "pt_regs" structure.
Fix this and access the fields via new PT_REGS macros.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rhashtable_walk_next detects a resize operation in progress, it jumps
to the new table and continues walking that one. But it misses to drop
the reference to it's current item, leading it to continue traversing
the new table's bucket in which the current item is sorted into, and
after reaching that bucket's end continues traversing the new table's
second bucket instead of the first one, thereby potentially missing
items.
This fixes the rhashtable runtime test for me. Bug probably introduced
by Herbert Xu's patch eddee5ba ("rhashtable: Fix walker behaviour during
rehash") although not explicitly tested.
Fixes: eddee5ba ("rhashtable: Fix walker behaviour during rehash")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start the delete timer when adding temp static entries so they can expire.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: ccb1c31a7a ("bridge: add flags to distinguish permanent mdb entires")
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a DTS file for the MP2 Cortex-A53 Soft Macrocell Model implemented
on a LogicTile Express 20MG (V2F-1XV7) daughterboard. This is based on
the version that's currently available from the ARM DTS repository [1].
[1] git://linux-arm.org/arm-dts.git
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
The CCI device node was added to vexpress CA15_A7(i.e. TC2) much before
the CCI PMU support and binding was added. This patch adds the missing
PMU node so that CCI PMUs can be used on TC2.
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
The dts for the CoreTile Express A15x2 A7x3 (TC2) only describes the
PMUs of the Cortex-A15 CPUs, and not the Cortex-A7 CPUs.
Now that we have a mechanism for describing disparate PMUs and their
interrupts in device tree, this patch makes use of these to describe the
PMUs for all CPUs in the system. For consistency, the existing A15 PMU
interrupt-affinity property is reflowed across two lines.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Enable SG support for Zynq SOC family devices.
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The load_module() error path frees a module but forgot to take it out
of the mod_tree, leaving a dangling entry in the tree, causing havoc.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Arthur Marsh <arthur.marsh@internode.on.net>
Tested-by: Arthur Marsh <arthur.marsh@internode.on.net>
Fixes: 93c2e105f6 ("module: Optimize __module_address() using a latched RB-tree")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The PCIe host controller uses MSIs provided by GICv3 ITS. Enable it on
Thunder SoCs by adding an entry to DT.
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@cavium.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
this moves to general APIs, and all drivers will be changed based
on it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJVd+OkAAoJEDIv4aC191RhuWkP/2LzJBRoBP1WJmTOvR0Kofl7
4PG5Xi/+1NtSHubvTSJJX2kZ9I9FHemQAJba8pqZ0pgZkb+7t0dA2+G1mDi8SOr8
VyWlMotI8Im0PoUHXmt5GPHEI4E6WqyX1TQFUD4JM/Ln07mevH/ocjd94LN79eO6
tc1z8AEw1/d5MCcEI34FP+vNXyLOE/9hM7IsmBtf4Niq7VrO8SWS5y1tJBOz+d2v
HmZZYEP/WS2mX8E77cz89pf0+BkmhF6nj1zoFCmlLVxCXrlKncs+Uvnsb+n/k1BI
gA4UNOfSoA+k+h+1F/7BhyEABqYkEuBDYs9m5NsypL+WITaVnI9pmNWNAA7IZNTg
dEYoVKqpJX3lVPWJ0i0hvgK1xXNMKlbwFb6iV80upvUVNfzZs9JveSL97Iiy9WjI
LlRdCTlteBvgsNBJyVG//788VU5+dr8HXV1UCfm+drs3H43TudCnJ8AhF3JaToO+
wv045McJTUL/CI+WQiXGNlAoXDjprgjLSiXRyEKF1xNsem3u78Z5bpjGX5crpzgC
HIdnsxb39jCHCNVJLuK/eCJ9Oocpo9LdKU0wSqIx0GNAx+xdoPo5H8jmlE4bnSWo
9f/9784A1SI/tpc0mKRz/z3FEKiP0i3CTJfcmC9EAd4LWFYk54WgtlzPztRkE625
YQMHjM+cavYWpzzgnQ+B
=Beg0
-----END PGP SIGNATURE-----
Merge tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux into fixes
Merge "CSR SiRFSoC rtc iobrg move to regmap for 4.2" from Barry Song:
move CSR rtc iobrg read/write API to be regmap
this moves to general APIs, and all drivers will be changed based
on it.
* tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux:
ARM: prima2: move to use REGMAP APIs for rtciobrg
There are two duplicated xenvif_zerocopy_callback() definitions.
Remove one of them.
Signed-off-by: Liang Li <liang.z.li@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On UML builds, mdio-mux-mmioreg.c fails to compile:
drivers/net/phy/mdio-mux-mmioreg.c:50:3: error: implicit declaration of function ‘ioremap’ [-Werror=implicit-function-declaration]
drivers/net/phy/mdio-mux-mmioreg.c:63:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
This is due to CONFIG_OF now being user selectable. Add a dependency on
HAS_IOMEM to fix this.
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds poweroff button device node to support poweroff feature
on APM X-Gene Mustang platform.
Signed-off-by: Y Vo <yvo@apm.com>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Since 1d5da757da (ax25: Stop using magic
neighbour cache operations.) any attempt to transmit IP packets over
a bpqether device will result in a message like "Dead loop on virtual
device bpq0, fix it urgently!"
Fix suggested by Eric W. Biederman <ebiederm@xmission.com>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@vger.kernel.org> # 4.1
Signed-off-by: David S. Miller <davem@davemloft.net>
rate estimators are limited to 4 Mpps, which was fine years ago, but
too small with current hardware generation.
Lets use 2^5 scaling instead of 2^10 to get 128 Mpps new limit.
On 64bit arch, use an "unsigned long" for temp storage and remove limit.
(We do not expect 32bit arches to be able to reach this point)
Tested:
tc -s -d filter sh dev eth0 parent ffff:
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15
match 07000000/ff000000 at 12
action order 1: gact action drop
random type none pass val 0
index 1 ref 1 bind 1 installed 166 sec
Action statistics:
Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0)
rate 4067Mbit 11053596pps backlog 0b 0p requeues 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
net_sched: act: lockless operation
As mentioned by Alexei last week in Budapest, it is a bit weird
to take a spinlock in order to drop a packet in a tc filter...
Lets add percpu infra for tc actions and use it for gact & mirred.
Before changes, my host with 8 RX queues was handling 5 Mpps with gact,
and more than 11 Mpps after.
Mirred change is not yet visible if ifb+qdisc is used, as ifb is
not yet multi queue enabled, but is a step forward.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like act_gact, act_mirred can be lockless in packet processing
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.
Next step : add multi queue capability to ifb device
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Final step for gact RCU operation :
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.
Since this is the last contended lock in packet RX when tc gact is used,
this gives impressive gain.
My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after patch.
Tested:
On receiver :
dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop
Sender sends packets flood from 7/8 network
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Third step for gact RCU operation :
Following patch will get rid of spinlock protection,
so we need to read tcfg_ptype once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second step for gact RCU operation :
We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.
gact_determ() would not work without a central packet counter,
so lets add it for this mode.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>