Defer calling unregister_netdevice_queue to cleanup_net. It's simpler
and it allows the loopback device to land in the same batch as other
network devices.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To get the full benefit of batched network namespace cleanup netowrk
device deletion needs to be performed by the generic code. When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has called
as the device uninit methods are no longer safe.
To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core. The core now frees the data only after
all of the network namespace exit routines have run.
Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.
This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary. For the moment
I have left them as compatibility wrappers in net_namespace.h
They will be removed once all of the users have been updated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is fairly common to kill several network namespaces at once. Either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments. As the network stack control logic
does not parallelize easily batch up multiple network namespaces existing
together.
To get the full benefit of batching the virtual network devices to be
removed must be all removed in one batch. For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.
An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures replaceing a work_struct
with a list_head.
In a trivial test with 4K namespaces this change reduced the cost of
a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I will need this shortly to implement network namespace shutdown
batching. For sanity sake network devices should be removed in
the reverse order they were created in.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a space separating some if keywords from the following
parenthesis to conform to the CodingStyle.
Signed-off-by: Rudy Matela <rudy.matela@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a space separating some keywords (if/while) from the following
parenthesis to conform to the CodingStyle.
Signed-off-by: Rudy Matela <rudy.matela@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch "fix memory initialization:5d521fd36de4e61" didn't got merge.
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This driver has been mostly rewritten since Michael Brown's initial
work, so swap the order of the authors.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This integrates support for the SFC9000 family of 10G Ethernet
controllers and LAN-on-motherboard chips, starting with the SFL9021
'Siena' and SFC9020 'Bethpage'.
Credit for this code is largely due to my colleagues at Solarflare:
Guido Barzini
Steve Hodgson
Kieran Mansley
Matthew Slattery
Neil Turton
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the SFC9000 family of 10G Ethernet controllers
and LAN-on-motherboard chips, starting with the SFL9021 'Siena' and
SFC9020 'Bethpage'.
The SFC9000 family is based on the SFC4000 'Falcon' architecture, but
with some significant changes:
- Two ports are associated with two independent PCI functions
(except SFC9010)
- Integrated 10GBASE-T PHY(s) (SFL9021/9022)
- MAC, PHY and board peripherals are managed by firmware
- Driver does not require board-specific code
- Firmware supports wake-on-LAN and lights-out management through NC-SI
- IPv6 checksum offload and RSS
- Filtering by MAC address and VLAN (not included in this code)
- PCI SR-IOV (not included in this code)
Credit for this code is largely due to my colleagues at Solarflare:
Guido Barzini
Steve Hodgson
Kieran Mansley
Matthew Slattery
Neil Turton
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In new NICs flash is managed by firmware and we will use high-level
operations on partitions rather than direct SPI commands. Add support
for multiple MTD partitions per flash device and remove the direct
link between MTD and SPI devices. Maintain a list of MTD partitions
in struct efx_nic.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New NICs have firmware managing the PHY, and we will discover the PHY
capabilities at run-time. Replace the static data with probe() and
test_name() operations.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New NICs and PHYs support a wider variety of loopback modes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
falcon_probe_nic_variant() does a lot less than it used to, and a
lot less than it claims to. Fold the remainder into its caller.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not including net/atm/
Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor efx_reset_down() and efx_reset_up() accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wake-on-LAN is a stub for Falcon, but will be implemented fully for
new NICs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the efx_nic_type::monitor operation or event handling as
appropriate.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor PHY, MAC and NIC configuration operations so that the
existing link configuration can be re-pushed with:
efx->phy_op->reconfigure(efx);
efx->mac_op->reconfigure(efx);
and a new configuration with:
efx->nic_op->reconfigure_port(efx);
(plus locking and error-checking).
We have not held the link settings in software (aside from flow
control), and have relied on asking the hardware what they are. This
is a problem because in some cases the hardware may no longer be in a
state to tell us. In particular, if an entire multi-port board is
reset through one port, the driver bindings to other ports have no
chance to save settings before recovering.
We only actually need to keep track of the autonegotiation settings,
so add an ethtool advertising mask to struct efx_nic, initialise it
in PHY init and update it as necessary.
Remove now-unneeded uses of efx_phy_op::{get,set}_settings() and
struct ethtool_cmd.
Much of this was done by Steve Hodgson <shodgson@solarflare.com>.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is preparation for adding differing implementations for new NICs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pause frame generation is gated by both RX_XOFF_MAC_EN and an enable
bit in each MAC. RX_XOFF_MAC_EN bit always reads back as 0 so we need
to set it correctly every time we modify RX_CFG_REG. Simplify this by
always setting it to 1 and only changing the enable bits in the MACs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register only needs to be written after reset, not each time we
enable interrupts.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pktgen threads are bound to given CPU, we can allocate memory for
these threads in a NUMA aware way.
After a pktgen session on two threads, we can check flows memory was
allocated on right node, instead of a not related one.
# grep pktgen_thread_write /proc/vmallocinfo
0xffffc90007204000-0xffffc90007385000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N0=384
0xffffc90007386000-0xffffc90007507000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N1=384
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch extends the ethtool interface to display what PHY
is currently connected to a NIC. The results can be viewed in
ethtool ethX output.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows a base driver to specify Direct Attach as the
type of port through the ethtool interface.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an issue when clearing out the RAR entries. If RAR[0]
is the only address in use, don't clear the others.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
82598 shouldn't try and access LINKS2 while configuring
link and flow control. This is an 82599-only register.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Flow Control autoneg should be disabled for certain adapters
that don't support autonegotiation of Flow Control at 10 gigabit.
These interfaces are the 10GBASE-T devices, CX4, and SFP+, all
running at 10 gigabit only. 1 gigabit is fine.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver was doing a divide by zero when adjusting tx-usecs.
This patch removes the divide by zero code and changes the logic slightly
to ignore tx-usecs in the case of shared TxRx vectors.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to x25_dev_get check for dev = NULL which was not set.
It allowed x25 to set routes and ioctls on down interfaces.
This caused oopses and refcnt problems on device_unregister.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert smc91x driver from legacy PM hooks over to using dev_pm_ops.
Tested on OMAP3 platform.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".
This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).
Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).
This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
b6157d8e03.
p.s The problem is not only when multiple paths are there. It
can happen in a single homed environment. If the application
stops sending data, it possible to have a hung association.
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add 'likely' hint to test of rx_checksum_enabled.
Don't count IP fragments; the IP stack can do that.
Do count non-matching multicast packets.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>