We leak an active timer, the hotcpu notifier and all allocated
resources when we exit a namespace. Fix this by introducing a
flow_cache_fini() function where we release the resources before
we exit.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of __constant_<foo> has been unnecessary for quite awhile now.
Make these uses consistent with the rest of the kernel.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
priv is not instantiated at gfar_of_init() time, when
parsing the DT for info on supported HW queues. Before
the netdev can be allocated, the number of supported
queues must be known. Because the number of supported
queues depends on device type, move the compatibility
checks before netdev allocation. Local vars are used
to hold the operation mode info before netdev allocation.
This fixes the null accesses for priv->.., in gfar_of_init.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rf233 and rf231 are sufficiently similar that we can treat
rf233 like rf231.
rf233 is missing some features that rf231 has, but we don't currently
make use of them so there's nothing to handle differently yet.
Should we add support in the future for rf231 *_NOCLK or SLEEP states,
or PAD_IO drive strength, exceptions will need to be made for rf233.
Signed-off-by: Thomas Stilwell <stilwellt@openlabs.co>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
pkt_sched: allow scheduling points
We have seen delays of more than 50ms in class or qdisc dumps, in case
device is under high TX stress, even with the prior 4KB per skb limit.
With the new 16KB limit, this could translate to 200ms delays.
Add cond_resched() to give a chance to higher prio tasks to get cpu.
But before doing so, we need to remove the rcu locking from tc_dump_qdisc()
as David spotted.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We have seen delays of more than 50ms in class or qdisc dumps, in case
device is under high TX stress, even with the prior 4KB per skb limit.
Add cond_resched() to give a chance to higher prio tasks to get cpu.
Signed-off-by; Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(),
so we do not need to use rcu protection to protect list of netdevices.
This will allow preemption to occur, thus reducing latencies.
Following patch adds explicit cond_resched() calls.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The option code is taking IP address and putting it into a generic
container. Force cast to silence sparse warnings.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes sparse warnings in vlan driver.
It propagates the sparse __percpu attribute from alloc_percpu
into netdev_alloc_pcpu_stats. I expect it may trigger additional
sparse warnings from other drivers that are missing the __percpu
attribute.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets which have L2 address different from ours should be
already filtered before entering into ip6_forward().
Perform that check at the beginning to avoid processing such packets.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call skb_cow_head() before editing the tx packet header. The header
would be reallocated if it is shared.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using an own copy of struct net_device_stats in struct
cpsw_priv, use stats from struct net_device. Also remove the thus
unnecessary .ndo_get_stats function, as it just returns dev->stats,
which is the default.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not legal to create multiple kmem_cache having the same name.
flowcache can use a single kmem_cache, no need for a per netns
one.
Fixes: ca925cf153 ("flowcache: Make flow cache name space aware")
Reported-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Jakub Kicinski <moorray3@wp.pl>
Tested-by: Fan Du <fan.du@windriver.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not need to switch the net_ns if the target net_ns the same
as the current one, so here we add a pre-check of net_ns to avoid
this as David suggested.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All skb in socket write queue should be properly timestamped.
In case of FastOpen, we special case the SYN+DATA 'message' as we
queue in socket wrote queue the two fallback skbs:
1) SYN message by itself.
2) DATA segment by itself.
We should make sure these skbs have proper timestamps.
Add a WARN_ON_ONCE() to eventually catch future violations.
Fixes: 740b0f1841 ("tcp: switch rtt estimations to usec resolution")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a bug in the Hyper-V host verion 2008R2, we need to use a slightly smaller
receive buffer size, otherwise the buffer will not be accepted by the legacy hosts.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct offset is 3 of the 6lowpanfrag_max_datagram_size value in proc
entry ctl table and not 2.
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
K. Y. Srinivasan says:
====================
Drivers: net: hyperv: Enable various offloads
This patch set enables both checksum as well as segmentation offload.
As part of this effort I have enabled scatter gather I/O a well.
In version 2 of these patches, I addressed comments from David Miller and
Dan Carpenter.
In this version I have addressed the latest comments from David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable segmentation offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable send side checksum offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable receive side checksum offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to enabling guest side offloads, enable the offloads on the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for enabling offloads, cleanup the send path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup the code and enable scatter gather I/O.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver reads the mac address from the device registers which would
need to have been programmed by the bootloader. This patch adds
the ability to pull the mac from devicetree via the pci device dt node.
Signed-off-by: Tim Harvey <tharvey@gateworks.com>
Cc: netdev@vger.kernel.org
Cc: devicetree@vger.kernel.org
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <robh+dt@kernel.org>
Changes since v2:
- eliminated use of stack tmpaddr per feedback
Changes since v1:
- simplified based on feedback
- fixed formatting
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/l2tp/l2tp_core.c:1111:15: warning: unused variable
'sk' [-Wunused-variable]
Fixes: 31c70d5956 ("l2tp: keep original skb ownership")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call to mlx5_health_cleanup() in the module init function can never
be reached. Removing it.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One known problem with netlink is the fact that NLMSG_GOODSIZE is
really small on PAGE_SIZE==4096 architectures, and it is difficult
to know in advance what buffer size is used by the application.
This patch adds an automatic learning of the size.
First netlink message will still be limited to ~4K, but if user used
bigger buffers, then following messages will be able to use up to 16KB.
This speedups dump() operations by a large factor and should be safe
for legacy applications.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Faster than memcpy/memset on some architectures.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the "fsl,etsec2" compatible models the driver currently
supports 8 Tx and Rx DMA rings (aka HW queues). However, there
are only 2 pairs of Rx/Tx interrupt lines, as these controllers
are integrated in low power SoCs with 2 CPUs at most. As a result,
there are at most 2 NAPI instances that have to service multiple
Tx and Rx queues for these devices. This complicates the NAPI
polling routine having to iterate over the mutiple Rx/Tx queues
hooked to the same interrupt lines. And there's also an overhead
at HW level, as the controller needs to service all the 8 Tx rings
in a round robin manner. The combined overhead shows up for multi
parallel Tx flows transmitted by the kernel stack, when the driver
usually starts returning NETDEV_TX_BUSY leading to NETDEV WATCHDOG
Tx timeout triggering if the Tx path is congested for too long.
As an alternative, this patch makes the driver support only one
Tx/Rx DMA ring per NAPI instance (per interrupt group or pair
of Tx/Rx interrupt lines) by default. The simplified single queue
polling routine (gfar_poll_sq) will be the default napi poll routine
for the etsec2 devices too. Some adjustments needed to be made to
link the Tx/Rx HW queues with each NAPI instance (2 in this case).
The gfar_poll_sq() is already successfully used by older SQ_SG_MODE
(single interrupt group) controllers.
This patch fixes Tx timeout triggering under heavy Tx traffic load
(i.e. iperf -c -P 8) for the "fsl,etsec2" (currently the only
MQ_MG_MODE devices). There's also a significant memory footprint
reduction by supporting 2 Rx/Tx DMA rings (at most), instead of 8,
for these devices.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some concurrency issues on devices w/ 2 CPUs related
to the handling of Rx and Tx interrupts. eTSEC has separate
interrupt lines for Rx and Tx but a single imask register
to mask these interrupts and a single NAPI instance to handle
both Rx and Tx work. As a result, the Rx and Tx ISRs are
identical, both are invoking gfar_schedule_cleanup(), however
both handlers can be entered at the same time when the Rx and
Tx interrupts are taken by different CPUs. In this case
spurrious interrupts (SPU) show up (in /proc/interrupts)
indicating a concurrency issue. Also, Tx overruns followed
by Tx timeout have been observed under heavy Tx traffic load.
To address these issues, the schedule cleanup ISR part has
been changed to handle the Rx and Tx interrupts independently.
The patch adds a separate NAPI poll routine for Tx cleanup to
be triggerred independently by the Tx confirmation interrupts
only. Existing poll functions are modified to handle only
the Rx path processing. The Tx poll routine does not need a
budget, since Tx processing doesn't consume NAPI budget, and
hence it is registered with minimum NAPI weight.
NAPI scheduling does not require locking since there are
different NAPI instances between the Rx and Tx confirmation
paths now.
So, the patch fixes the occurence of spurrious Rx/Tx interrupts.
Tx overruns also occur less frequently now.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ether_addr_equal_64bits is more efficient than ether_addr_equal, and
can be used when each argument is an array within a structure that
contains at least two bytes of data beyond the array, so it is safe
to use it for vlan, and make sense for fast path.
Cc: Joe Perches <joe@perches.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According Joe's suggestion, maybe it'd be faster to add an unlikely to
the test for PCKET_OTHERHOST, so I add it and see whether the performance
could be better, although the differences is so small and negligible, but
it is hard to catch that any lower device would set the skb type to
PACKET_OTHERHOST, so most of time, I think it make sense to add unlikely
for the test.
Cc: Joe Perches <joe@perches.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resizing fq hash table allocates memory while holding qdisc spinlock,
with BH disabled.
This is definitely not good, as allocation might sleep.
We can drop the lock and get it when needed, we hold RTNL so no other
changes can happen at the same time.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to e1000e, ixgbevf and igb.
Majority of this series contains fixes and cleanups to e1000e,
most notably are:
Todd provides a fix to PTP in e1000e which adds a lock in e1000e_phc_adjfreq
to prevent concurrent changes to TIMINCA and SYSTIMH/L. Then provides an
igb fix to use ARRAY_SIZE for array size calculation.
David provides the remaining e1000e which contain:
- cleanup of pointer references that are no longer used
- fix an issue on systems with Management Engine enabled with the
ethernet cable unplugged
- fix an issue on 82579 where enabling EEE LPI sooner than one second
after link up causes link issues on some switches
- refactor the power management flows to prevent the suspend path from
being executed twice when hibernating
- refactor the runtime power management to fix interfering with the
functionality of Energy Efficient Ethernet when enabled and to fix
the device from repeatedly flip between suspend and resume with the
interface administratively downed
- enable the feature PHY Ultra Low Power Mode which is a power saving
feature that reduces the power consumption of the PHY when a cable is
not connected
- fix the ethtool offline tests for 82579 parts
- fix SHRA register access for 82579 parts which was introduced by
previous commit c3a0dce35a "e1000e: fix overrun of PHY RAR array"
Florian provides a fix for ixgbevf where skb->pkt_type was being checked
like a bitmask, but it is not a bitmask.
Fix an issue reported by Stephen Hemminger where there was a warning
about code defined but never used if IGB_HWMON is not defined.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix warning about code defined but never used if IGB_HWMON not defined.
Reported-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use ARRAY_SIZE for array size calculation.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
skb->pkt_type is not a bitmask, but contains only value at a time from
the range defined in include/uapi/linux/if_packet.h.
Checking it like if it was a bitmask of values would also cause
PACKET_OTHERHOST, PACKET_LOOPBACK and PACKET_FASTROUTE to be matched by
this check since their lower 2 bits are also set, although that does not
fix a real bug, it is still potentially confusing.
This bogus check was introduced in commit 815cccbf ("ixgbe: add setlink,
getlink support to ixgbe and ixgbevf").
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Previous commit c3a0dce35a fixed an overrun for the RAR on i218 devices.
This commit also attempted to homogenize the RAR/SHRA access for all parts
accessed by the e1000e driver. This change introduced an error for
assigning MAC addresses to guest OS's for 82579 devices.
Only RAR[0] is accessible to the driver for 82579 parts, and additional
addresses must be placed into the SHRA[L|H] registers. The rar_entry_count
was changed in the previous commit to an inaccurate value that accounted
for all RAR and SHRA registers, not just the ones usable by the driver.
This patch fixes the count to the correct value and adjusts the
e1000_rar_set_pch2lan() function to user the correct index.
Cc: John Greene <jogreene@redhat.com>
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changes to the rar_entry_count value require a change to the indexing
used to access the SHRA[H|L] registers when testing them with
'ethtool -t <iface> offline'
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Valid values for InterruptThrottleRate are 10-100000, or one of
0, 1, 3, 4. '2' is not valid. This is a legacy from the branching
from the e1000 driver code that e1000e was based from.
Prior to this patch, if the e1000e driver was loaded with a forced
invalid InterruptThrottleRate of '2', then no throttle rate would be
set and no error message generated.
Now, a message will be generated that an invalid value was used and the
value for InterruptThrottleRate will be set to the default value.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
ULP is a power saving feature that reduces the power consumption of the
PHY when a cable is not connected.
ULP is gated on the following conditions:
1) The hardware must support ULP. Currently this is only I218
devices from Intel
2) ULP is initiated by the driver, so, no driver results in no ULP.
3) ULP's implementation utilizes Runtime Power Management to toggle its
execution. ULP is enabled/disabled based on the state of Runtime PM.
4) ULP is not active when wake-on-unicast, multicast or broadcast is active
as these features are mutually-exclusive.
Since the PHY is in an unavailable state while ULP is active, any access
of the PHY registers will fail. This is resolved by utilizing kernel
calls that cause the device to exit Runtime PM (e.g. pm_runtime_get_sync)
and then, after PHY access is complete, allow the device to resume
Runtime PM (e.g. pm_runtime_put_sync).
Under certain conditions, toggling the LANPHYPC is necessary to disable
ULP mode. Break out existing code to toggle LANPHYPC to a new function
to avoid code duplication.
Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
Cc: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>