While testing the performance of different receive interrupt
coalescing settings on a single stream TCP benchmark, I noticed two
very different results. With rx-usecs=50, most of the time a
connection would hit 8280 Mbps but once in a while it would hit
9330 Mbps.
It turns out we are only applying the interrupt coalescing settings
to the first queue and whenever the rx hash would direct us onto
that queue we ran faster.
With this patch applied and rx-usecs=50, I get 9330 Mbps
consistently.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flush_scheduled_work() is on its way out. This patch contains simple
conversions to replace flush_scheduled_work() usage with direct
cancels and flushes.
Directly cancel the used works on driver detach and flush them in
other cases.
The conversions are mostly straight forward and the only dangers are,
* Forgetting to cancel/flush one or more used works.
* Cancelling when a work should be flushed (ie. the work must be
executed once scheduled whether the driver is detaching or not).
I've gone over the changes multiple times but it would be much
appreciated if you can review with the above points in mind.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jay Cliburn <jcliburn@gmail.com>
Cc: Michael Chan <mchan@broadcom.com>
Cc: Divy Le Ray <divy@chelsio.com>
Cc: e1000-devel@lists.sourceforge.net
Cc: Vasanthy Kolluri <vkolluri@cisco.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Andrew Gallatin <gallatin@myri.com>
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Ramkrishna Vepa <ramkrishna.vepa@exar.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Shreyas Bhatewara <sbhatewara@vmware.com>
Cc: netdev@vger.kernel.org
Currently the ret variable is not used for anything other than
receive the value of the t3_adapter_error(), which will always be 0,
because the reset parameter is 0.
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove racy queue stopping after device registration.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Along the same lines as "cxgb4: fix crash due to manipulating queues
before registration" (8f6d9f4047), before
commit "net: allocate tx queues in register_netdevice"
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration but now only after registration.
cxgb4 has such a call before registration and crashes now. Move it
after register_netdev.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: eric.dumazet@gmail.com
Cc: sonnyrao@us.ibm.com
Cc: Divy Le Ray <divy@chelsio.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only negative return from bind_qsets() should be considered an error and
propagated.
It fixes an issue reported by IBM on P Series platform.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed formatting (tabs and line breaks).
The CHELSIO_GET_QSET_NUM device ioctl allows unprivileged users to read
4 bytes of uninitialized stack memory, because the "addr" member of the
ch_reg struct declared on the stack in cxgb_extension_ioctl() is not
altered or zeroed before being copied back to the user. This patch
takes care of it.
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't call flush_workqueue() on the cxgb3 Work Queue in cxgb_down() when
we're being called from the fatal error task ... which is executing on the
cxgb3 Work Queue.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IRQ and resource[] may not have correct values until
after PCI hotplug setup occurs at pci_enable_device() time.
The semantic match that finds this problem is as follows:
// <smpl>
@@
identifier x;
identifier request ~= "pci_request.*|pci_resource.*";
@@
(
* x->irq
|
* x->resource
|
* request(x, ...)
)
...
*pci_enable_device(x)
// </smpl>
Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user when user data is immediately copied into the
allocated region.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@
- to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+ to = memdup_user(from,size);
if (
- to==NULL
+ IS_ERR(to)
|| ...) {
<+... when != goto l1;
- -ENOMEM
+ PTR_ERR(to)
...+>
}
- if (copy_from_user(to, from, size) != 0) {
- <+... when != goto l2;
- -EFAULT
- ...+>
- }
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some Power7 platforms, when using VIOS (Virtual I/O Server), we
need to wait longer for control packets to finish transfer during
initialization.
Without this change, initialization may fail prematurely.
Signed-off-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
bridge: ensure to unlock in error path in br_multicast_query().
drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
sky2: Avoid rtnl_unlock without rtnl_lock
ipv6: Send netlink notification when DAD fails
drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
ipconfig: Handle devices which take some time to come up.
mac80211: Fix memory leak in ieee80211_if_write()
mac80211: Fix (dynamic) power save entry
ipw2200: use kmalloc for large local variables
ath5k: read eeprom IQ calibration values correctly for G mode
ath5k: fix I/Q calibration (for real)
ath5k: fix TSF reset
ath5k: use fixed antenna for tx descriptors
libipw: split ieee->networks into small pieces
mac80211: Fix sta_mtx unlocking on insert STA failure path
rt2x00: remove KSEG1ADDR define from rt2x00soc.h
net: add ColdFire support to the smc91x driver
asix: fix setting mac address for AX88772
ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
net: Fix dev_mc_add()
...
queue restart tasklets need to be stopped after napi handlers are stopped
since the latter can restart them. So stop them after stopping napi.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (48 commits)
IB/srp: Clean up error path in srp_create_target_ib()
IB/srp: Split send and recieve CQs to reduce number of interrupts
RDMA/nes: Add support for KR device id 0x0110
IB/uverbs: Use anon_inodes instead of private infinibandeventfs
IB/core: Fix and clean up ib_ud_header_init()
RDMA/cxgb3: Mark RDMA device with CXIO_ERROR_FATAL when removing
RDMA/cxgb3: Don't allocate the SW queue for user mode CQs
RDMA/cxgb3: Increase the max CQ depth
RDMA/cxgb3: Doorbell overflow avoidance and recovery
IB/core: Pack struct ib_device a little tighter
IB/ucm: Clean whitespace errors
IB/ucm: Increase maximum devices supported
IB/ucm: Use stack variable 'base' in ib_ucm_add_one
IB/ucm: Use stack variable 'devnum' in ib_ucm_add_one
IB/umad: Clean whitespace
IB/umad: Increase maximum devices supported
IB/umad: Use stack variable 'base' in ib_umad_init_port
IB/umad: Use stack variable 'devnum' in ib_umad_init_port
IB/umad: Remove port_table[]
IB/umad: Convert *cdev to cdev in struct ib_umad_port
...
T3 hardware doorbell FIFO overflows can cause application stalls due
to lost doorbell ring events. This has been seen when running large
NP IMB alltoall MPI jobs. The T3 hardware supports an xon/xoff-type
flow control mechanism to help avoid overflowing the HW doorbell FIFO.
This patch uses these interrupts to disable RDMA QP doorbell rings
when we near an overflow condition, and then turn them back on (and
ring all the active QP doorbells) when when the doorbell FIFO empties
out. In addition if an doorbell ring is dropped by the hardware, the
code will now recover.
Design:
cxgb3:
- enable these DB interrupts
- in the interrupt handler, schedule work tasks to call the ULPs event
handlers with the new events.
- ring all the qset txqs when an overflow is detected.
iw_cxgb3:
- disable db ringing on all active qps when we get the DB_FULL event
- enable db ringing on all active qps and ring all active dbs when we get
the DB_EMPTY event
- On DB_DROP event:
- disable db rings in the event handler
- delay-schedule a work task which rings and enables the dbs on
all active qps.
- in post_send and post_recv logic, don't ring the db if it's disabled.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Removed whole t3_rx_mode structure and appropriate helpers cause they are no
longer needed.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use DEFINE_PCI_DEVICE_TABLE() so we get place PCI ids table into correct section
in every case.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 4b77b0a2ba ("PCI: Clear
saved_state after the state has been restored"), the EEH is not
working proplery on cxgb3.
This patch fixes it, always saving the PCI state after a recovery,
in order to allow further reoveries.
Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.
Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)
Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace run-time string formatting with preprocessor string
manipulation.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch added support of private MAC address per port and provisioning
packet handler for iSCSI traffic only.
The above changes are isolated to the cxgb3 driver, independent of any scsi or iscsi driver changes.
Acked-by: Karen Xie <kxie@chelsio.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Rakesh Ranjan <rakesh@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
netxen: update copyright
netxen: fix tx timeout recovery
netxen: fix file firmware leak
netxen: improve pci memory access
netxen: change firmware write size
tg3: Fix return ring size breakage
netxen: build fix for INET=n
cdc-phonet: autoconfigure Phonet address
Phonet: back-end for autoconfigured addresses
Phonet: fix netlink address dump error handling
ipv6: Add IFA_F_DADFAILED flag
net: Add DEVTYPE support for Ethernet based devices
mv643xx_eth.c: remove unused txq_set_wrr()
ucc_geth: Fix hangs after switching from full to half duplex
ucc_geth: Rearrange some code to avoid forward declarations
phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
drivers/net/phy: introduce missing kfree
drivers/net/wan: introduce missing kfree
net: force bridge module(s) to be GPL
Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
...
Fixed up trivial conflicts:
- arch/x86/include/asm/socket.h
converted to <asm-generic/socket.h> in the x86 tree. The generic
header has the same new #define's, so that works out fine.
- drivers/net/tun.c
fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
switched over to using 'tun->socket.sk' instead of the redundantly
available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
to the TUN driver") which added a new 'tun->sk' use.
Noted in 'next' by Stephen Rothwell.
Massage the err_handler upcall into an event handler upcall, pass
netdev port events to the cxgb3 ULPs and generate RDMA port events
based on LLD port events.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a cut'n paste error in the AEL2020 twinax EDC file name
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c3a8c5b6 ("cxgb3: move away from LLTX") exposed a bug in how
cxgb3 looks up the netdev_queue it stashes away in a qset during
initialization. For multiport devices, the TX queue index it uses is
offset by the first_qset index of each port. This leads to a crash
once LLTX is removed, since hard_start_xmit is called with one TX
queue lock held, while the TX reclaim timer task grabs a different
(wrong) TX queue lock when it frees skbs.
Fix this by removing the first_qset offset used to look up the TX
queue passed into t3_sge_alloc_qset() from setup_sge_qsets().
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drain the MAC TX fifos when a port goes down.
Back pressure might otherwise occur, leading to both
ports of the same adapter to hang.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the HW SMT table initialization to avoid random
mss miscomputations for offload connections.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
use request_firmware() to load the phy's EDC programmation
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pre-allocate a skb at init time to be used for control messages to the HW
if skb allocation fails.
Tolerate failures to send messages initializing some memories at the cost of
parity error detection for these memories.
Retry sending connection id release messages if both alloc_skb(GFP_ATOMIC)
and alloc_skb(GFP_KERNEL) fail.
Do not bring the interface up if messages binding queue set to port fail to
be sent.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the AEL2020 phy.
Add PCI IDs of the boards using this phy.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not call t3_link_fault() under spinlock, as it calls msleep().
Besides, only the access to pi->link_fault needs to be serialized.
Also initialize local variables before checking the link status,
link state fields might otherwise end up containing garbage.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EEH attempts to recover up 6 times.
The last attempt leaves all the ports and adapter down.hen
The driver is then unloaded, bringing the adapter down again
unconditionally. The unload will hang.
Check if the adapter is already down before trying to bring it down again.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fatal error task can be scheduled while processing an offload packet
in NAPI context when the connection handle is bogus. this can race
with the ports being brought down and the cxgb3 workqueue being flushed.
Stop napi processing before flushing the work queue.
The ULP drivers (iSCSI, iWARP) might also schedule a task on keventd_wk
while releasing a connection handle (cxgb3_offload.c::cxgb3_queue_tid_release()).
The driver however does not flush any work on keventd_wq while being unloaded.
This patch also fixes this.
Also call cancel_delayed_work_sync in place of the the deprecated
cancel_rearming_delayed_workqueue.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the existing periodic task to handle link faults.
The link fault interrupt handler is also called in work queue context,
which is wrong and might cause potential deadlocks.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use resource_size_t to declare mmio start and len variables.
Print PEX error register after EEH resumed.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start queue set reclaim timers after the queue sets have been
allocated successfully.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently ignores the local or remote link faults
raised at the mac layer. This patch fixes it.
Our mac however only advertizes link events, so wait for the
phy to stabilize the link, then enable mac link events interrupts.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under RX pressure, The HW might generate a high load of interrupts
to signal mac fifo or free lists overflow.
Disable the interrupts, and poll the relevant status bits
to maintain stats.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>