Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
This patch fixes a nasty bug that has been sitting there since the
very first versions of the driver, but is generating a panic because
we changed the number of 2K buffers for 2.6.16.
The consumer_index and producer_index are u32's that get incremented
on every buffer emptied and replenished respectively. We use
the {producer,consumer}_index mod'ed with the size of the pool to
pick out an entry in the free_map. The problem happens when the
u32 rolls over and the number of the buffers in the pool is not a
perfect divisor of 2^32. i.e. if the number of 2K buffers is 0x300,
before the consumer_index rolls over, our index to the free map =
0xffffffff mod 0x300 = 0xff. The next time a buffer is emptied, we
want the index to the free map to be 0x100, but 0x0 mod 0x300 is 0x0.
This patch assigns the mod'ed result back to the consumer and producer
indexes so that they never roll over. The second chunk of the patch
covers the unlikely case where the consumer_index has just been reset
to 0x0 and the hypervisor is not able to accept that buffer.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch changes the name of the proc file for each ibmveth adapter
from the network device name to the slot number in the virtual bus.
The proc file is created when the device is probed, so a change
in the name of the device will not be reflected in the name of the
proc file giving problems when identifying and removing the adapter.
The slot number is a property that does not change through the life
of the adapter so we use that instead.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch fixes a race that panics the kernel when opening the
device after a kdump. Without this patch there is a window where the
hypervisor can send an interrupt before all the structures for the
kdump ibmveth module are ready (because the hypervisor is not aware
that the partition crashed and that the virtual driver is reloading).
We close this window by disabling the interrupts before registering
the adapter to the hypervisor.
This patch depends on the "ibmveth: Harden driver initilisation" patch.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds the net poll controller function to ibmveth to support
netconsole and netdump.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch has been floating around for a while now, Santi originally
sent it in March: http://www.spinics.net/lists/netdev/msg00471.html
After a kexec the ibmveth driver will fail when trying to register
with the Hypervisor because the previous kernel has not unregistered.
So if the registration fails, we unregister and then try again.
We don't unconditionally unregister, because we don't want to disturb
the regular code path for 99% of users.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Our pseries hcall interfaces are out of control:
plpar_hcall_norets
plpar_hcall
plpar_hcall_8arg_2ret
plpar_hcall_4out
plpar_hcall_7arg_7ret
plpar_hcall_9arg_9ret
Create 3 interfaces to cover all cases:
plpar_hcall_norets: 7 arguments no returns
plpar_hcall: 6 arguments 4 returns
plpar_hcall9: 9 arguments 9 returns
There are only 2 cases in the kernel that need plpar_hcall9, hopefully
we can keep it that way.
Pass in a buffer to stash return parameters so we avoid the &dummy1,
&dummy2 madness.
Signed-off-by: Anton Blanchard <anton@samba.org>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (139 commits)
[POWERPC] re-enable OProfile for iSeries, using timer interrupt
[POWERPC] support ibm,extended-*-frequency properties
[POWERPC] Extra sanity check in EEH code
[POWERPC] Dont look for class-code in pci children
[POWERPC] Fix mdelay badness on shared processor partitions
[POWERPC] disable floating point exceptions for init
[POWERPC] Unify ppc syscall tables
[POWERPC] mpic: add support for serial mode interrupts
[POWERPC] pseries: Print PCI slot location code on failure
[POWERPC] spufs: one more fix for 64k pages
[POWERPC] spufs: fail spu_create with invalid flags
[POWERPC] spufs: clear class2 interrupt status before wakeup
[POWERPC] spufs: fix Makefile for "make clean"
[POWERPC] spufs: remove stop_code from struct spu
[POWERPC] spufs: fix spu irq affinity setting
[POWERPC] spufs: further abstract priv1 register access
[POWERPC] spufs: split the Cell BE support into generic and platform dependant parts
[POWERPC] spufs: dont try to access SPE channel 1 count
[POWERPC] spufs: use kzalloc in create_spu
[POWERPC] spufs: fix initial state of wbox file
...
Manually resolved conflicts in:
drivers/net/phy/Makefile
include/asm-powerpc/spu.h
This patch provides a sysfs interface to change some properties of the
ibmveth buffer pools (size of the buffers, number of buffers per pool,
and whether a pool is active). Ethernet drivers use ethtool to provide
this type of functionality. However, the buffers in the ibmveth driver
can have an arbitrary size (not only regular, mini, and jumbo which are
the only sizes that ethtool can change), and also ibmveth can have an
arbitrary number of buffer pools
Under heavy load we have seen dropped packets which obviously kills TCP
performance. We have created several fixes that mitigate this issue,
but we definitely need a way of changing the number of buffers for an
adapter dynamically. Also, changing the size of the buffers allows
users to change the MTU to something big (bigger than a jumbo frame)
greatly improving performance on partition to partition transfers.
The patch creates directories pool1...pool4 in the device directory in
sysfs, each with files: num, size, and active (which default to the
values in the mainline version).
Comments and suggestions are welcome...
--
Santiago A. Leon
Power Linux Development
IBM Linux Technology Center
Signed-off-by: Jeff Garzik <jeff@garzik.org>
ibmveth_printk() is only used to print the driver version when the module
initializes, which means on all machines as long as it's compiled in.
If it's really only needed for debugging, boot with loglevel=8, or get
it from dmesg instead.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Also cleans up some nearby whitespace problems.
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
At the moment ibmveth has DEBUG enabled which is rather verbose. Disable
it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch removes almost all inclusions of linux/version.h. The 3
#defines are unused in most of the touched files.
A few drivers use the simple KERNEL_VERSION(a,b,c) macro, which is
unfortunatly in linux/version.h.
There are also lots of #ifdef for long obsolete kernels, this was not
touched. In a few places, the linux/version.h include was move to where
the LINUX_VERSION_CODE was used.
quilt vi `find * -type f -name "*.[ch]"|xargs grep -El '(UTS_RELEASE|LINUX_VERSION_CODE|KERNEL_VERSION|linux/version.h)'|grep -Ev '(/(boot|coda|drm)/|~$)'`
search pattern:
/UTS_RELEASE\|LINUX_VERSION_CODE\|KERNEL_VERSION\|linux\/\(utsname\|version\).h
Signed-off-by: Olaf Hering <olh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a panic in the current tree caused by a race condition between the initial replenish cycle and the rx processing of the first packets trying to replenish the buffers.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a bug that happens when the hypervisor can't add a
buffer. The old code wrote IBM_VETH_INVALID_MAP into the free_map
array, so next time the index was used, a ibmveth_assert() caught it and
called BUG(). The patch writes the right value into the free_map array
so that the index can be reused.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch adds the lockless TX feature to the ibmveth driver. The
hypervisor has its own locking so the only change that is necessary is
to protect the statistics counters.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch removes the allocation of RX skb's buffers from a workqueue
to be called directly at RX processing time. This change was suggested
by Dave Miller when the driver was starving the RX buffers and
deadlocking under heavy traffic:
> Allocating RX SKBs via tasklet is, IMHO, the worst way to
> do it. It is no surprise that there are starvation cases.
>
> If tasklets or work queues get delayed in any way, you lose,
> and it's very easy for a card to catch up with the driver RX'ing
> packets very fast, no matter how aggressive you make the
> replenishing. By the time you detect that you need to be
> "more aggressive" it is already too late.
> The only pseudo-reliable way is to allocate at RX processing time.
>
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch changes the way the ibmveth driver handles the receive
buffers. The old code mallocs and maps all the buffers in the pools
regardless of MTU size and it also limits the number of buffer pools to
three. This patch makes the driver malloc and map the buffers necessary
to support the current MTU. It also changes the hardcoded names of the
buffer pool number, size, and elements to arrays to make it easier to
change (with the hope of making them runtime parameters in the future).
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch updates dev->trans_start and dev->last_rx so that the ibmveth
driver can be used with the ARP monitor in the bonding driver.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
There has been a need expressed for dma_addr_t to be 64 bits on PPC64.
This patch does that.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Just set the name field directly in the device_driver structure
contained in the vio_driver struct.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
A bunch of create_proc_dir_entry() calls creating directories had crept
in since the last sweep; converted to proc_mkdir().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!