Pawel Staszewski wrote:
<blockquote>
Some time ago i report this:
http://bugzilla.kernel.org/show_bug.cgi?id=6648
and now with 2.6.29 / 2.6.29.1 / 2.6.29.3 and 2.6.30 it back
dmesg output:
oprofile: using NMI interrupt.
Fix inflate_threshold_root. Now=15 size=11 bits
...
Fix inflate_threshold_root. Now=15 size=11 bits
cat /proc/net/fib_triestat
Basic info: size of leaf: 40 bytes, size of tnode: 56 bytes.
Main:
Aver depth: 2.28
Max depth: 6
Leaves: 276539
Prefixes: 289922
Internal nodes: 66762
1: 35046 2: 13824 3: 9508 4: 4897 5: 2331 6: 1149 7: 5
9: 1 18: 1
Pointers: 691228
Null ptrs: 347928
Total size: 35709 kB
</blockquote>
It seems, the current threshold for root resizing is too aggressive,
and it causes misleading warnings during big updates, but it might be
also responsible for memory problems, especially with non-preempt
configs, when RCU freeing is delayed long after call_rcu.
It should be also mentioned that because of non-atomic changes during
resizing/rebalancing the current lookup algorithm can miss valid leaves
so it's additional argument to shorten these activities even at a cost
of a minimally longer searching.
This patch restores values before the patch "[IPV4]: fib_trie root
node settings", commit: 965ffea43d from
v2.6.22.
Pawel's report:
<blockquote>
I dont see any big change of (cpu load or faster/slower
routing/propagating routes from bgpd or something else) - in avg there
is from 2% to 3% more of CPU load i dont know why but it is - i change
from "preempt" to "no preempt" 3 times and check this my "mpstat -P ALL
1 30"
always avg cpu load was from 2 to 3% more compared to "no preempt"
[...]
cat /proc/net/fib_triestat
Basic info: size of leaf: 20 bytes, size of tnode: 36 bytes.
Main:
Aver depth: 2.44
Max depth: 6
Leaves: 277814
Prefixes: 291306
Internal nodes: 66420
1: 32737 2: 14850 3: 10332 4: 4871 5: 2313 6: 942 7: 371 8: 3 17: 1
Pointers: 599098
Null ptrs: 254865
Total size: 18067 kB
</blockquote>
According to this and other similar reports average depth is slightly
increased (~0.2), and root nodes are shorter (log 17 vs. 18), but
there is no visible performance decrease. So, until memory handling is
improved or added parameters for changing this individually, this
patch resets to safer defaults.
Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Reported-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check that network interface is running before changing its MAC address.
Otherwise, rxch is accessed when it's NULL - causing a kernel oops.
Moreover, check that the new MAC address is valid.
Signed-off-by: Pablo Bitton <pablo.bitton@gmail.com>
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Tested-by: Chaithrika U S <chaithrika@ti.com>
[tested on DM6467 EVM]
Signed-off-by: David S. Miller <davem@davemloft.net>
The igb driver was defaulting to using the lock for pci-e function 0 for
all of the phys due to the fact that the lan id was not being set prior to
initialization. This change makes it so that the function id is set prior
to checking for the phy id.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fec: fix definition of 5272 version of FEC_X_DES_ACTIVE register
The ColdFire 5272 FEC driver has a different register address map
than other users of the FEC driver. And its definition of the
FEC_X_DES_ACTIVE register is incorrect, it should be 0x14.
The fec interface cannot transmit data with the old value.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
----
Signed-off-by: David S. Miller <davem@davemloft.net>
The bit that tells us whether a statistics counter snapshot operation
has completed is located in the GLOBAL register block, not in the
GLOBAL2 register block, so fix up mv88e6xxx_stats_wait() to poll the
right register address.
Signed-off-by: Stephane Contri <Stephane.Contri@grassvalley.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet a écrit :
> Ingo Molnar a écrit :
>>> The following changes since commit 5298976562:
>>> Linus Torvalds (1):
>>> Merge git://git.kernel.org/.../davem/net-2.6
>>>
>>> are available in the git repository at:
>>>
>>> master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master
>> Hm, something in this lot quickly wrecked networking here - see the
>> tx timeout dump below. It starts with:
>>
>> [ 351.004596] WARNING: at net/sched/sch_generic.c:246 dev_watchdog+0x10b/0x19c()
>> [ 351.011815] Hardware name: System Product Name
>> [ 351.016220] NETDEV WATCHDOG: eth0 (forcedeth): transmit queue 0 timed out
>>
>> Config attached. Unfortunately i've got no time to do bisection
>> today.
>
>
>
> forcedeth might have a problem, in its netif_wake_queue() logic, but
> I could not see why a recent patch could make this problem visible now.
>
> CPU0/1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 02
> is not a new cpu either :)
>
> forcedeth uses an internal tx_stop without appropriate barrier.
>
> Could you try following patch ?
>
> (random guess as I dont have much time right now)
We might have a race in napi_schedule(), leaving interrupts disabled forever.
I cannot test this patch, I dont have the hardware...
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The call resource_size(res) returns res->end - res->start + 1 and thus the
second change is semantics-preserving. res_size is then used as the second
argument of a call to request_mem_region, and the memory allocated by this
call appears to be the same as what is released in the two calls to
release_mem_region. So the size argument for those calls should be
resource_size(size) as well. Alternatively, in the second call to
release_mem_region, the second argument could be res_size, as that variable
has already been initialized at the point of this call.
The problem was found using the following semantic patch:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
struct resource *res;
@@
- (res->end - res->start) + 1
+ resource_size(res)
@@
struct resource *res;
@@
- res->end - res->start
+ BAD(resource_size(res))
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
add new id (RIOS System PC CARD3 ETHERNET).
Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch properly defines the maximum values for rx/tx coalescing timeouts.
Signed-off-by: Vlad Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Problem reported by Flavio Leitner <fleitner@redhat.com>:
When setting rx/tx coalescing timeout to the values less than 12 traffic was
stopped.
The FW supports coalescing in 12us granularity, and so value of less then 12
should be interpreted as disabling coalescing
Signed-off-by: Vlad Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is currently possible for an asynchronous device unregister
to cause the same tun device to be unregistered twice. This
is because the unregister in tun_chr_close only checks whether
__tun_get(tfile) != NULL. This however has nothing to do with
whether the device has already been unregistered. All it tells
you is whether __tun_detach has been called.
This patch fixes this by using the most obvious thing to test
whether the device has been unregistered.
It also moves __tun_detach outside of rtnl_unlock since nothing
that it does requires that lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Occasionally we may see an interrupt without an event in the eq.
In intx, we currently see the event queue and return IRQ_NONE causing
a the irq to be disabled ("no one cared".) Instead, read the CEV_ISR
reg to check the existence of the interrupt.
Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This workaround is required for an issue in hardware where noise on the
interconnect between the MAC and PHY could be generated by a lower power
mode (K1) at 1000Mbps resulting in bad packets. Disable K1 while at 1000
Mbps but keep it enabled for 10/100Mbps and when the cable is disconnected.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some PHYs may require two reads of the PHY_STATUS register to determine the
link status. If the PHY is being accessed by another thread it is possible
the first read could timeout and fail. In this case, put a delay in so
the second read will pick up the correct link status.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Limit NVM writes to 4K sections to prevent NVM corruption on larger
sector allocations (up to 64K).
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver was accessing register bits for features on parts that do
not support that feature. This could cause problems in the hardware.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous workaround for 82578 to avoid link stall causes some PHY
registers to get cleared inadvertently. Add a delay after all LCD resets
to make sure PHY registers are in a stable state before continuing. Also,
after resets check the EEC register for the state of PHY configuration
performed by the MAC for ICH9 and earlier parts (as done before), but check
the LAN_INIT_DONE bit in the STATUS register for ICH10 and newer parts (EEC
doesn't exist in these newer parts).
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PHY loopback on 82578 fails to work as a result of flushing the packets
in the FIFO buffer in the link stall workaround. Don't perform the
workaround if in PHY loopback mode.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wake-on-lan is currently only supported by 82599 KX4 devices, in all
other cases return a proper value from ixgbe_wol_exclusion function call.
Otherwise from ethtool we will be able to change wol options of
unsupported 8259x devices.
Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if we loaded the driver, insert an unsupported module, and then
attempt to "ifconfig up" the device it will be brought down but the netdev
would not be unregistered. This behavior is different than all other
code paths. This patch corrects that by down'ing the device and then
scheduling the sfp_config_module_task tasklet. The tasklet will detect
this condition (like it does with other code paths) and do the
unregister_netdev().
I also removed the log message as this condition (an unsupported SFP+
module) will be logged in sfp_config_module_task.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The change to check the SFP+ module again on open() was
causing the XFP (non-SFP+) adapters to be rejected. We
only want to try and re-identify the SFP+ module if the
original probe found that this device was an SFP+ device.
So for this code path (driver loaded with SFP module, module
inserted, ifconfig up of the device) the type will be
ixgbe_phy_unknown for an unidentified SFP+ module. So we
only check if that is the case.
This problem also shows up on Copper devices.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several small fixes around negative test case of the insertion of a
IXGBE_ERR_NOT_SUPPORTED module.
- mdio45_probe call was always failing due to mdio.prtad not being
set. The function set to mdio.mdio_read was still working as we just
happen to always be at prtad == 0. This will allow us to set the phy_id
and phy.type correctly now.
- There was timing issue with i2c calls when initiated from a tasklet.
A small delay was added to allow the electrical oscillation to calm down.
- Logic change in ixgbe_sfp_task that allows NOT_SUPPORTED condition
to be recognized.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some usage was only sizing a pointer rather than the data type.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to set/clear the mac address register when the link goes up/down
respectively. Without this both ports of a 2-port device can end up
with the same mac address in a bonding scenario.
The new ql_link_on() and ql_link_off() will also be used in handling
certain firmware events.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This addes functionality to set/clear the MAC address in the hardware
when the link goes up/down.
The MAC address register is persistent across function resets. In
bonding the same address can bounce from one port to the other. This
can cause packets to be delivered to the wrong port.
This patch clears the MAC address in the hardware when the link is down
and sets it when the link comes up.
It was found that pulling/pushing the cable from one port to another
causes the same MAC address to be in both ports.
The next patch in this series will use this functionality as well.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The caller will free acquired resouces if a failure occurs.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were turning on the carrier without verifying the link was up.
This adds link up to the link initialize check before turning carrier
on.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not clearing the routing bits can cause frames to erroneously get routed to
management processor.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware semaphore covers the configuration register as well as the
ICB registers. The ICB high and low regs contain the address of the
initialization control block and the config register is used to signal
the hardware that a block is ready to be downloaded. Currently we were
only protecting the ICB regs. This changes expands to cover the config
register as well.
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a bug in addrconf_prefix_rcv() where it won't update the
preferred lifetime of an IPv6 address if the current valid lifetime
of the address is less than 2 hours (the minimum value in the RA).
For example, If I send a router advertisement with a prefix that
has valid lifetime = preferred lifetime = 2 hours we'll build
this address:
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
valid_lft 7175sec preferred_lft 7175sec
If I then send the same prefix with valid lifetime = preferred
lifetime = 0 it will be ignored since the minimum valid lifetime
is 2 hours:
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
valid_lft 7161sec preferred_lft 7161sec
But according to RFC 4862 we should always reset the preferred lifetime
even if the valid lifetime is invalid, which would cause the address
to immediately get deprecated. So with this patch we'd see this:
5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
valid_lft 7163sec preferred_lft 0sec
The comment winds-up being 5x the size of the code to fix the problem.
Update the preferred lifetime of IPv6 addresses derived from a prefix
info option in a router advertisement even if the valid lifetime in
the option is invalid, as specified in RFC 4862 Section 5.5.3e. Fixes
an issue where an address will not immediately become deprecated.
Reported by Jens Rosenboom.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP pushed the skb above the sctp chunk header, so the
check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
_decode_session6() will never return 0 and the ports decode
of sctp will always fail. (nh + offset + 1 - skb->data < 0)
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SCTP pushed the skb data above the sctp chunk header, so the check
of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will
never return 0 because xprth + 4 - skb->data < 0, the ports decode of
sctp will always fail.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not safe to use match_int without checking the token type returned
by match_token (especially when the token type returned is Opt_err and
args is empty). Fix it.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds ETH_P_1588 protocol ID define.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PHY_HALTED state disables phydev->link, but the link will not be
updated upon entering PHY_RESUMING. Add a call to phy_read_status() to
update the link before entering PHY_RUNNING. If the link is not up at
this point, enter the PHY_NOLINK state instead.
Also, when transitioning from PHY_RESUMING to PHY_RUNNING, calls to
netif_carrier_on() and phydev->adjust_link() are missing. Add the calls
similar to the other transitions to PHY_RUNNING.
Signed-off-by: Wade Farnsworth <wfarnsworth@mvista.com>
Acked-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Restrict firmware reset to following cases -
o chip rev is NX2031 (firmare doesn't support heartbit).
o firmware is dead.
o previous attempt to init firmware had failed.
o we have got newer file firmware.
This speeds up module load tremendously (by upto 8 sec),
also avoids downtime for NCSI (management) pass-thru
traffic.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct firmware encoding is 8 bit major, 8 bit minor and
16 bit subversion. Flash has sizes rightly set, but original
driver submission messed it leaving 16 bit major and 8 bit
subversion.
Also fix a infinite loop when cut-thru file firmware is
invalid.
Signed-off-by: Dhananjay Phadke <dhananjay@netxen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop a sanity check which doesn't serve any useful purpose anymore.
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
ISDN connection setup failed if the "connection active" and
"B channel up" messages from the device arrived in a different
order than expected. Modify the state machine to accept them in
any order.
Impact: bugfix
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 73ce7b01b4.
After discovering that we don't listen to gratuitious arps in 2.6.30
I tracked the failure down to this commit.
The patch makes absolutely no sense. RFC2131 RFC3927 and RFC5227.
are all in agreement that an arp request with sip == 0 should be used
for the probe (to prevent learning) and an arp request with sip == tip
should be used for the gratitous announcement that people can learn
from.
It appears the author of the broken patch got those two cases confused
and modified the code to drop all gratuitous arp traffic. Ouch!
Cc: stable@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI drivers that implement the io_error_detected callback should return
PCI_ERS_RESULT_DISCONNECT if the state passed in is
pci_channel_io_perm_failure. This patch fixes the issue for igb.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
on permanent failure
PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is not
checked in many of the network drivers.
This patch fixes the omission in the e1000e driver.
Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PCI drivers that implement the io_error_detected callback
should return PCI_ERS_RESULT_DISCONNECT if the state
passed in is pci_channel_io_perm_failure. This state is
not checked in many of the network drivers.
The patch fixes the omission in the e1000 driver.
Based on Mike Mason's similar patch for e1000e.
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
CC: Mike Mason <mmlnx@us.ibm.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
driver was mixing NET_IP_ALIGN count bytes in map/unmap calls
unevenly. Only map the bytes that the hardware might dma into
also fix unmap related bug where ->dma was not being cleared
after unmap
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>