linux

Author	SHA1	Message	Date
Christoph Lameter	4f10493459	slab allocators: Remove SLAB_CTOR_ATOMIC SLAB_CTOR atomic is never used which is no surprise since I cannot imagine that one would want to do something serious in a constructor or destructor. In particular given that the slab allocators run with interrupts disabled. Actions in constructors and destructors are by their nature very limited and usually do not go beyond initializing variables and list operations. (The i386 pgd ctor and dtors do take a spinlock in constructor and destructor..... I think that is the furthest we go at this point.) There is no flag passed to the destructor so removing SLAB_CTOR_ATOMIC also establishes a certain symmetry. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	50953fe9e0	slab allocators: Remove SLAB_DEBUG_INITIAL flag I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by SLAB. I think its purpose was to have a callback after an object has been freed to verify that the state is the constructor state again? The callback is performed before each freeing of an object. I would think that it is much easier to check the object state manually before the free. That also places the check near the code object manipulation of the object. Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was compiled with SLAB debugging on. If there would be code in a constructor handling SLAB_DEBUG_INITIAL then it would have to be conditional on SLAB_DEBUG otherwise it would just be dead code. But there is no such code in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real use of, difficult to understand and there are easier ways to accomplish the same effect (i.e. add debug code before kfree). There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be clear in fs inode caches. Remove the pointless checks (they would even be pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors. This is the last slab flag that SLUB did not support. Remove the check for unimplemented flags from SLUB. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:57 -07:00
Christoph Lameter	0a31bd5f2b	KMEM_CACHE(): simplify slab cache creation This patch provides a new macro KMEM_CACHE(<struct>, <flags>) to simplify slab creation. KMEM_CACHE creates a slab with the name of the struct, with the size of the struct and with the alignment of the struct. Additional slab flags may be specified if necessary. Example struct test_slab { int a,b,c; struct list_head; } __cacheline_aligned_in_smp; test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC) will create a new slab named "test_slab" of the size sizeof(struct test_slab) and aligned to the alignment of test slab. If it fails then we panic. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Christoph Lameter	5af6083990	slab allocators: Remove obsolete SLAB_MUST_HWCACHE_ALIGN This patch was recently posted to lkml and acked by Pekka. The flag SLAB_MUST_HWCACHE_ALIGN is 1. Never checked by SLAB at all. 2. A duplicate of SLAB_HWCACHE_ALIGN for SLUB 3. Fulfills the role of SLAB_HWCACHE_ALIGN for SLOB. The only remaining use is in sparc64 and ppc64 and their use there reflects some earlier role that the slab flag once may have had. If its specified then SLAB_HWCACHE_ALIGN is also specified. The flag is confusing, inconsistent and has no purpose. Remove it. Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f9a14399ae	mm: optimize kill_bdev() Remove duplicate work in kill_bdev(). It currently invalidates and then truncates the bdev's mapping. invalidate_mapping_pages() will opportunistically remove pages from the mapping. And truncate_inode_pages() will forcefully remove all pages. The only thing truncate doesn't do is flush the bh lrus. So do that explicitly. This avoids (very unlikely) but possible invalid lookup results if the same bdev is quickly re-issued. It also will prevent extreme kernel latencies which are observed when blockdevs which have a large amount of pagecache are unmounted, by avoiding invalidate_mapping_pages() on that path. invalidate_mapping_pages() has no cond_resched (it can be called under spinlock), whereas truncate_inode_pages() has one. [akpm@linux-foundation.org: restore nrpages==0 optimisation] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
Peter Zijlstra	f98393a64c	mm: remove destroy_dirty_buffers from invalidate_bdev() Remove the destroy_dirty_buffers argument from invalidate_bdev(), it hasn't been used in 6 years (so akpm says). find * -name \.[ch] \| xargs grep -l invalidate_bdev \| while read file; do quilt add $file; sed -ie 's/invalidate_bdev($[^,]$,[^)]*)/invalidate_bdev(\1)/g' $file; done Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:55 -07:00
David Miller	3a2cba993b	Quicklist support for sparc64 I ported this to sparc64 as per the patch below, tested on UP SunBlade1500 and 24 cpu Niagara T1000. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	6225e93735	Quicklists for page table pages On x86_64 this cuts allocation overhead for page table pages down to a fraction (kernel compile / editing load. TSC based measurement of times spend in each function): no quicklist pte_alloc 1569048 4.3s(401ns/2.7us/179.7us) pmd_alloc 780988 2.1s(337ns/2.7us/86.1us) pud_alloc 780072 2.2s(424ns/2.8us/300.6us) pgd_alloc 260022 1s(920ns/4us/263.1us) quicklist: pte_alloc 452436 573.4ms(8ns/1.3us/121.1us) pmd_alloc 196204 174.5ms(7ns/889ns/46.1us) pud_alloc 195688 172.4ms(7ns/881ns/151.3us) pgd_alloc 65228 9.8ms(8ns/150ns/6.1us) pgd allocations are the most complex and there we see the most dramatic improvement (may be we can cut down the amount of pgds cached somewhat?). But even the pte allocations still see a doubling of performance. 1. Proven code from the IA64 arch. The method used here has been fine tuned for years and is NUMA aware. It is based on the knowledge that accesses to page table pages are sparse in nature. Taking a page off the freelists instead of allocating a zeroed pages allows a reduction of number of cachelines touched in addition to getting rid of the slab overhead. So performance improves. This is particularly useful if pgds contain standard mappings. We can save on the teardown and setup of such a page if we have some on the quicklists. This includes avoiding lists operations that are otherwise necessary on alloc and free to track pgds. 2. Light weight alternative to use slab to manage page size pages Slab overhead is significant and even page allocator use is pretty heavy weight. The use of a per cpu quicklist means that we touch only two cachelines for an allocation. There is no need to access the page_struct (unless arch code needs to fiddle around with it). So the fast past just means bringing in one cacheline at the beginning of the page. That same cacheline may then be used to store the page table entry. Or a second cacheline may be used if the page table entry is not in the first cacheline of the page. The current code will zero the page which means touching 32 cachelines (assuming 128 byte). We get down from 32 to 2 cachelines in the fast path. 3. x86_64 gets lightweight page table page management. This will allow x86_64 arch code to faster repopulate pgds and other page table entries. The list operations for pgds are reduced in the same way as for i386 to the point where a pgd is allocated from the page allocator and when it is freed back to the page allocator. A pgd can pass through the quicklists without having to be reinitialized. 64 Consolidation of code from multiple arches So far arches have their own implementation of quicklist management. This patch moves that feature into the core allowing an easier maintenance and consistent management of quicklists. Page table pages have the characteristics that they are typically zero or in a known state when they are freed. This is usually the exactly same state as needed after allocation. So it makes sense to build a list of freed page table pages and then consume the pages already in use first. Those pages have already been initialized correctly (thus no need to zero them) and are likely already cached in such a way that the MMU can use them most effectively. Page table pages are used in a sparse way so zeroing them on allocation is not too useful. Such an implementation already exits for ia64. Howver, that implementation did not support constructors and destructors as needed by i386 / x86_64. It also only supported a single quicklist. The implementation here has constructor and destructor support as well as the ability for an arch to specify how many quicklists are needed. Quicklists are defined by an arch defining CONFIG_QUICKLIST. If more than one quicklist is necessary then we can define NR_QUICK for additional lists. F.e. i386 needs two and thus has config NR_QUICK int default 2 If an arch has requested quicklist support then pages can be allocated from the quicklist (or from the page allocator if the quicklist is empty) via: quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>) Page table pages can be freed using: quicklist_free(<quicklist-nr>, <destructor>, <page>) Pages must have a definite state after allocation and before they are freed. If no constructor is specified then pages will be zeroed on allocation and must be zeroed before they are freed. If a constructor is used then the constructor will establish a definite page state. F.e. the i386 and x86_64 pgd constructors establish certain mappings. Constructors and destructors can also be used to track the pages. i386 and x86_64 use a list of pgds in order to be able to dynamically update standard mappings. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	643b113849	slub: enable tracking of full slabs If slab tracking is on then build a list of full slabs so that we can verify the integrity of all slabs and are also able to built list of alloc/free callers. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	b49af68ff9	Add virt_to_head_page and consolidate code in slab and slub Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:54 -07:00
Christoph Lameter	6d7779538f	mm: optimize compound_head() by avoiding a shared page flag The patch adds PageTail(page) and PageHead(page) to check if a page is the head or the tail of a compound page. This is done by masking the two bits describing the state of a compound page and then comparing them. So one comparision and a branch instead of two bit checks and two branches. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Christoph Lameter	d85f33855c	Make page->private usable in compound pages If we add a new flag so that we can distinguish between the first page and the tail pages then we can avoid to use page->private in the first page. page->private == page for the first page, so there is no real information in there. Freeing up page->private makes the use of compound pages more transparent. They become more usable like real pages. Right now we have to be careful f.e. if we are going beyond PAGE_SIZE allocations in the slab on i386 because we can then no longer use the private field. This is one of the issues that cause us not to support debugging for page size slabs in SLAB. Having page->private available for SLUB would allow more meta information in the page struct. I can probably avoid the 16 bit ints that I have in there right now. Also if page->private is available then a compound page may be equipped with buffer heads. This may free up the way for filesystems to support larger blocks than page size. We add PageTail as an alias of PageReclaim. Compound pages cannot currently be reclaimed. Because of the alias one needs to check PageCompound first. The RFC for the this approach was discussed at http://marc.info/?t=117574302800001&r=1&w=2 [nacc@us.ibm.com: fix hugetlbfs] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Christoph Lameter	614410d589	SLUB: allocate smallest object size if the user asks for 0 bytes Makes SLUB behave like SLAB in this area to avoid issues.... Throw a stack dump to alert people. At some point the behavior should be switched back. NULL is no memory as far as I can tell and if the use asked for 0 bytes then he need to get no memory. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Christoph Lameter	81819f0fc8	SLUB core This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation. A. Management of object queues A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for each allocating CPU and use objects from a slab directly instead of queueing them up. B. Storage overhead of object queues SLAB Object queues exist per node, per CPU. The alien cache queue even has a queue array that contain a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes just tied up for storing references to objects for those queues This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues. C. SLAB meta data overhead SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding page_struct. Objects can be naturally aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this. D. SLAB has a complex cache reaper SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into partial list but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping which may cause strange hold offs. E. SLAB has complex NUMA policy layer support SLUB pushes NUMA policy handling into the page allocator. This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLABs application of policies to individual slab objects allocated in SLAB is certainly a performance concern due to the frequent references to memory policies which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next. F. Reduction of the size of partial slab lists SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocator occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation. G. Tunables SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the uses of the spin lock to check out slabs. SLUB has a global parameter (min_slab_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order the less motions of pages between per CPU and partial lists occur and the better SLUB will be scaling. G. Slab merging We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partial allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up. Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt differing neighboring objects. Enable sanity checks to find those. H. Diagnostics The current slab diagnostics are difficult to use and require a recompilation of the kernel. SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slab_debug" option. Parameters can be specified to select a single or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced. I. Resiliency If basic sanity checks are on then SLUB is capable of detecting common error conditions and recover as best as possible to allow the system to continue. J. Tracing Tracing can be enabled via the slab_debug=T,<slabcache> option during boot. SLUB will then protocol all actions on that slabcache and dump the object contents on free. K. On demand DMA cache creation. Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA then just create this single slabcache that is needed. For systems that have no ZONE_DMA requirement the support is completely eliminated. L. Performance increase Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of slub is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase slub performance much further. The anti-fragmentation patches may enable further performance increases. Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator SLUB Boot options slub_nomerge Disable merging of slabs slub_min_order=x Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead. slub_min_objects=x Mininum objects per slab. Default is 8. slub_max_order=x Avoid generating slabs larger than order specified. slub_debug Enable all diagnostics for all caches slub_debug=<options> Enable selective options for all caches slub_debug=<o>,<cache> Enable selective options for a certain set of caches Available Debug options F Double Free checking, sanity and resiliency R Red zoning P Object / padding poisoning U Track last free / alloc T Trace all allocs / frees (only use for individual slabs). To use SLUB: Apply this patch and then select SLUB as the default slab allocator. [hugh@veritas.com: fix an oops-causing locking error] [akpm@linux-foundation.org: various stupid cleanups and small fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Christoph Lameter	b5637e65ee	i386: use page allocator to allocate thread_info structure i386 uses kmalloc to allocate the threadinfo structure assuming that the allocations result in a page sized aligned allocation. That has worked so far because SLAB exempts page sized slabs from debugging and aligns them in special ways that goes beyond the restrictions imposed by KMALLOC_ARCH_MINALIGN valid for other slabs in the kmalloc array. SLUB also works fine without debugging since page sized allocations neatly align at page boundaries. However, if debugging is switched on then SLUB will extend the slab with debug information. The resulting slab is not longer of page size. It will only be aligned following the requirements imposed by KMALLOC_ARCH_MINALIGN. As a result the threadinfo structure may not be page aligned which makes i386 fail to boot with SLUB debug on. Replace the calls to kmalloc with calls into the page allocator. An alternate solution may be to create a custom slab cache where the alignment is set to PAGE_SIZE. That would allow slub debugging to be applied to the threadinfo structure. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:53 -07:00
Jan Kara	6ce745ed39	readahead: code cleanup Rename file_ra_state.prev_page to prev_index and file_ra_state.offset to prev_offset. Also update of prev_index in do_generic_mapping_read() is now moved close to the update of prev_offset. [wfg@mail.ustc.edu.cn: fix it] Signed-off-by: Jan Kara <jack@suse.cz> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Jan Kara	ec0f163722	readahead: improve heuristic detecting sequential reads Introduce ra.offset and store in it an offset where the previous read ended. This way we can detect whether reads are really sequential (and thus we should not mark the page as accessed repeatedly) or whether they are random and just happen to be in the same page (and the page should really be marked accessed again). Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Nick Piggin <nickpiggin@yahoo.com.au> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
David Rientjes	b813e931b4	smaps: add clear_refs file to clear reference Adds /proc/pid/clear_refs. When any non-zero number is written to this file, pte_mkold() and ClearPageReferenced() is called for each pte and its corresponding page, respectively, in that task's VMAs. This file is only writable by the user who owns the task. It is now possible to measure _approximately_ how much memory a task is using by clearing the reference bits with echo 1 > /proc/pid/clear_refs and checking the reference count for each VMA from the /proc/pid/smaps output at a measured time interval. For example, to observe the approximate change in memory footprint for a task, write a script that clears the references (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and extracts the size in kB. Add the sizes for each VMA together for the total referenced footprint. Moments later, repeat the process and observe the difference. For example, using an efficient Mozilla: accumulated time referenced memory ---------------- ----------------- 0 s 408 kB 1 s 408 kB 2 s 556 kB 3 s 1028 kB 4 s 872 kB 5 s 1956 kB 6 s 416 kB 7 s 1560 kB 8 s 2336 kB 9 s 1044 kB 10 s 416 kB This is a valuable tool to get an approximate measurement of the memory footprint for a task. Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> [akpm@linux-foundation.org: build fixes] [mpm@selenic.com: rename for_each_pmd] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Zachary Amsden	0013572b2a	i386: use pte_update_defer in ptep_test_and_clear_{dirty,young} If you actually clear the bit, you need to: + pte_update_defer(vma->vm_mm, addr, ptep); The reason is, when updating PTEs, the hypervisor must be notified. Using atomic operations to do this is fine for all hypervisors I am aware of. However, for hypervisors which shadow page tables, if these PTE modifications are not trapped, you need a post-modification call to fulfill the update of the shadow page table. Acked-by: Zachary Amsden <zach@vmware.com> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
David Rientjes	10a8d6ae4b	i386: add ptep_test_and_clear_{dirty,young} Add ptep_test_and_clear_{dirty,young} to i386. They advertise that they have it and there is at least one place where it needs to be called without the page table lock: to clear the accessed bit on write to /proc/pid/clear_refs. ptep_clear_flush_{dirty,young} are updated to use the new functions. The overall net effect to current users of ptep_clear_flush_{dirty,young} is that we introduce an additional branch. Cc: Hugh Dickins <hugh@veritas.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Borislav Petkov	9490991482	Add unitialized_var() macro for suppressing gcc warnings Introduce a macro for suppressing gcc from generating a warning about a probable uninitialized state of a variable. Example: - spinlock_t ptl; + spinlock_t uninitialized_var(ptl); Not a happy solution, but those warnings are obnoxious. - Using the usual pointlessly-set-it-to-zero approach wastes several bytes of text. - Using a macro means we can (hopefully) do something else if gcc changes cause the `x = x' hack to stop working - Using a macro means that people who are worried about hiding true bugs can easily turn it off. Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Andy Whitcroft	14e0729841	add pfn_valid_within helper for sub-MAX_ORDER hole detection Generally we work under the assumption that memory the mem_map array is contigious and valid out to MAX_ORDER_NR_PAGES block of pages, ie. that if we have validated any page within this MAX_ORDER_NR_PAGES block we need not check any other. This is not true when CONFIG_HOLES_IN_ZONE is set and we must check each and every reference we make from a pfn. Add a pfn_valid_within() helper which should be used when scanning pages within a MAX_ORDER_NR_PAGES block when we have already checked the validility of the block normally with pfn_valid(). This can then be optimised away when we do not have holes within a MAX_ORDER_NR_PAGES block of pages. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Adrian Bunk	ac267728f1	mm/slab.c: proper prototypes Add proper prototypes in include/linux/slab.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:52 -07:00
Heiko Carstens	411f0f3edc	Introduce CONFIG_HAS_DMA Architectures that don't support DMA can say so by adding a config NO_DMA to their Kconfig file. This will prevent compilation of some dma specific driver code. Also dma-mapping-broken.h isn't needed anymore on at least s390. This avoids compilation and linking of otherwise dead/broken code. Other architectures that include dma-mapping-broken.h are arm26, h8300, m68k, m68knommu and v850. If these could be converted as well we could get rid of the header file. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> "John W. Linville" <linville@tuxdriver.com> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: <James.Bottomley@SteelEye.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <geert@linux-m68k.org> Cc: <zippel@linux-m68k.org> Cc: <spyro@f2s.com> Cc: <uclinux-v850@lsi.nec.co.jp> Cc: <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Nick Piggin	6fe6900e1e	mm: make read_cache_page synchronous Ensure pages are uptodate after returning from read_cache_page, which allows us to cut out most of the filesystem-internal PageUptodate calls. I didn't have a great look down the call chains, but this appears to fixes 7 possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in block2mtd. All depending on whether the filler is async and/or can return with a !uptodate page. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Nick Piggin	5f22df00a0	mm: remove gcc workaround Minimum gcc version is 3.2 now. However, with likely profiling, even modern gcc versions cannot always eliminate the call. Replace the placeholder functions with the more conventional empty static inlines, which should be optimal for everyone. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Adrian Bunk	d2ba27e800	proper prototype for hugetlb_get_unmapped_area() Add a proper prototype for hugetlb_get_unmapped_area() in include/linux/hugetlb.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: William Irwin <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
Jeremy Fitzhardinge	aee16b3cee	Add apply_to_page_range() which applies a function to a pte range Add a new mm function apply_to_page_range() which applies a given function to every pte in a given virtual address range in a given mm structure. This is a generic alternative to cut-and-pasting the Linux idiomatic pagetable walking code in every place that a sequence of PTEs must be accessed. Although this interface is intended to be useful in a wide range of situations, it is currently used specifically by several Xen subsystems, for example: to ensure that pagetables have been allocated for a virtual address range, and to construct batched special pagetable update requests to map I/O memory (in ioremap()). [akpm@linux-foundation.org: fix warning, unpleasantly] Signed-off-by: Ian Pratt <ian.pratt@xensource.com> Signed-off-by: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Matt Mackall <mpm@waste.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:51 -07:00
David Gibson	abb4a23907	serial: define FIXED_PORT flag for serial_core At present, the serial core always allows setserial in userspace to change the port address, irq and base clock of any serial port. That makes sense for legacy ISA ports, but not for (say) embedded ns16550 compatible serial ports at peculiar addresses. In these cases, the kernel code configuring the ports must know exactly where they are, and their clocking arrangements (which can be unusual on embedded boards). It doesn't make sense for userspace to change these settings. Therefore, this patch defines a UPF_FIXED_PORT flag for the uart_port structure. If this flag is set when the serial port is configured, any attempts to alter the port's type, io address, irq or base clock with setserial are ignored. In addition this patch uses the new flag for on-chip serial ports probed in arch/powerpc/kernel/legacy_serial.c, and for other hard-wired serial ports probed by drivers/serial/of_serial.c. Signed-off-by: David Gibson <dwg@au1.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
Thomas Koeller	bd71c182d5	RM9000 serial driver Add support for the integrated serial ports of the MIPS RM9122 processor and its relatives. The patch also does some whitespace cleanup. [akpm@linux-foundation.org: cleanups] Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
Marc St-Jean	beab697ab4	serial driver PMC MSP71xx Serial driver patch for the PMC-Sierra MSP71xx devices. There are three different fixes: 1 Fix for DesignWare APB THRE errata: In brief, this is a non-standard 16550 in that the THRE interrupt will not re-assert itself simply by disabling and re-enabling the THRI bit in the IER, it is only re-enabled if a character is actually sent out. It appears that the "8250-uart-backup-timer.patch" in the "mm" tree also fixes it so we have dropped our initial workaround. This patch now needs to be applied on top of that "mm" patch. 2 Fix for Busy Detect on LCR write: The DesignWare APB UART has a feature which causes a new Busy Detect interrupt to be generated if it's busy when the LCR is written. This fix saves the value of the LCR and rewrites it after clearing the interrupt. 3 Workaround for interrupt/data concurrency issue: The SoC needs to ensure that writes that can cause interrupts to be cleared reach the UART before returning from the ISR. This fix reads a non-destructive register on the UART so the read transaction completion ensures the previously queued write transaction has also completed. Signed-off-by: Marc St-Jean <Marc_St-Jean@pmc-sierra.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
Bernhard Walle	6179b5562d	add new_id to PCMCIA drivers PCI drivers have the new_id file in sysfs which allows new IDs to be added at runtime. The advantage is to avoid re-compilation of a driver that works for a new device, but it's ID table doesn't contain the new device. This mechanism is only meant for testing, after the driver has been tested successfully, the ID should be added in source code so that new revisions of the kernel automatically detect the device. The implementation follows the PCI implementation. The interface is documented in Documentation/pcmcia/driver.txt. Computations should be done in userspace, so the sysfs string contains the raw structure members for matching. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
Pekka Enberg	fd76bab2fa	slab: introduce krealloc This introduce krealloc() that reallocates memory while keeping the contents unchanged. The allocator avoids reallocation if the new size fits the currently used cache. I also added a simple non-optimized version for mm/slob.c for compatibility. [akpm@linux-foundation.org: fix warnings] Acked-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Acked-by: Matt Mackall <mpm@selenic.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 12:12:50 -07:00
Linus Torvalds	e3ebadd95c	Revert "[PATCH] x86: __pa and __pa_symbol address space separation" This was broken. It adds complexity, for no good reason. Rather than separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(), and preferably __pa() too - and just use "virt_to_phys()" instead, which is more readable and has nicer semantics. However, right now, just undo the separation, and make __pa_symbol() be the exact same as __pa(). That fixes the bugs this patch introduced, and we can do the fairly obvious cleanups later. Do the new __phys_addr() function (which is now the actual workhorse for the unified __pa()/__pa_symbol()) as a real external function, that way all the potential issues with compile/link-time optimizations of constant symbol addresses go away, and we can also, if we choose to, add more sanity-checking of the argument. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-07 08:44:24 -07:00
Domen Puncer	2e1ee1f766	[POWERPC] mpc52xx suspend to deep-sleep Implement deep-sleep on MPC52xx. SDRAM is put into self-refresh with help of SRAM code (alternatives would be code in FLASH, I-cache). Interrupt code must also not be in SDRAM, so put it in I-cache. MPC52xx core is static, so contents will remain intact even with clocks turned off. Signed-off-by: Domen Puncer <domen.puncer@telargo.com> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:15 +10:00
Sylvain Munaut	de41189bf6	[POWERPC] Export of_device_get_modalias Apparently other parts of the kernel need to know the modalias internally (like the sysfs code in macintosh driver). To avoid consistency issues, we export this code and use it everywhere it's needed rather than repeat it ... Signed-off-by: Sylvain Munaut <tnt@246tNt.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:15 +10:00
David Gibson	0aeafb0cef	[POWERPC] Kill off the PTE_FMT macro 32-bit powerpc uses a PTE_FMT macro to handle printk() formatting of PTE entries (which can vary in type and size). Apparently there was a good reason for it once, but with current compilers it's simpler just to workaround the variation with a cast in the printk() itself (there's only one use). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:14 +10:00
Johannes Berg	543b9fd352	[POWERPC] powermac: Suspend to disk on G5 Powermac G5 suspend to disk implementation. The code is platform agnostic but only tested on powermac, no other 64-bit powerpc machines. Because nvidiafb still breaks suspend I have marked it EXPERIMENTAL on powermac and because I can't test it and some lowlevel code will need changes it is BROKEN on all other 64-bit platforms. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:14 +10:00
Johannes Berg	7e11580b36	[POWERPC] DART iommu suspend This implements save and restore hooks for IOMMUs and implements it the DART iommu. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Olof Johansson <olof@lixom.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:14 +10:00
Johannes Berg	d9333afd6a	[POWERPC] powermac: Support G5 CPU hotplug This allows "hotplugging" of CPUs on G5 machines. CPUs that are disabled are put into an idle loop with the decrementer frequency set to minimum. To wake them up again we kick them just like when bringing them up. To stop those CPUs from messing with any global state we stop them from entering the timer interrupt. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:13 +10:00
Johannes Berg	3669e93048	[POWERPC] MPIC sys_device & suspend/resume This adds mpic to the system devices and implements suspend and resume for them. This is necessary to get interrupts for modules back to where they were before a suspend to disk. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-07 20:31:13 +10:00
Ivo van Doorn	cf4328cd94	[NET]: rfkill: add support for input key to control wireless radio The RF kill patch that provides infrastructure for implementing switches controlling radio states on various network and other cards. [dtor@insightbb.com: address review comments] [akpm@linux-foundation.org: cleanups, build fixes] Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-07 00:34:20 -07:00
David S. Miller	861fe90656	[SPARC64]: SUN4U PCI-E controller support. Some minor refactoring in the generic code was necessary for this: 1) This controller requires 8-byte access to the interrupt map and clear register. They are 64-bits on all the other SBUS and PCI controllers anyways, so this was easy to cure. 2) The IMAP register has a different layout and some bits that we need to preserve, so use a read/modify/write when making changes to the IMAP register in generic code. 3) Flushing the entire IOMMU TLB is best done with a single write to a register on this PCI controller, add a iommu->iommu_flushinv for this. Still lacks MSI support, that will come later. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-06 22:44:06 -07:00
Roland Dreier	ed23a72778	IB: Return "maybe missed event" hint from ib_req_notify_cq() The semantics defined by the InfiniBand specification say that completion events are only generated when a completions is added to a completion queue (CQ) after completion notification is requested. In other words, this means that the following race is possible: while (CQ is not empty) ib_poll_cq(CQ); // new completion is added after while loop is exited ib_req_notify_cq(CQ); // no event is generated for the existing completion To close this race, the IB spec recommends doing another poll of the CQ after requesting notification. However, it is not always possible to arrange code this way (for example, we have found that NAPI for IPoIB cannot poll after requesting notification). Also, some hardware (eg Mellanox HCAs) actually will generate an event for completions added before the call to ib_req_notify_cq() -- which is allowed by the spec, since there's no way for any upper-layer consumer to know exactly when a completion was really added -- so the extra poll of the CQ is just a waste. Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for ib_req_notify_cq() so that it can return a hint about whether the a completion may have been added before the request for notification. The return value of ib_req_notify_cq() is extended so: < 0 means an error occurred while requesting notification == 0 means notification was requested successfully, and if IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events were missed and it is safe to wait for another event. > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed in. It means that the consumer must poll the CQ again to make sure it is empty to avoid the race described above. We add a flag to enable this behavior rather than turning it on unconditionally, because checking for missed events may incur significant overhead for some low-level drivers, and consumers that don't care about the results of this test shouldn't be forced to pay for the test. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Michael S. Tsirkin	f4fd0b224d	IB: Add CQ comp_vector support Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Ryusuke Sakato	39374aadcd	sh: R7785RP board updates. Some fixups for the R7785RP board. Gets iVDR working. Signed-off-by: Ryusuke Sakato <sakato.ryusuke@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:57 +00:00
Paul Mundt	3a2e117e22	sh: Add die chain notifiers. Add the atomic die chains in, kprobes needs these. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:57 +00:00
Ryusuke Sakato	6865f0ea6a	sh: Solution Engine 7722 board support. This adds more full-featured support for the SH7722 Solution Engine. Previously this was using the generic board, and lacked most of the peripheral support. Signed-off-by: Ryusuke Sakato <sakato.ryusuke@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:57 +00:00
Paul Mundt	6b817c0348	sh: Fix r7780rp build. With the addition of the R7780MP and R7785RP, the R7780RP build ended up breaking. Trivial compile fix. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:57 +00:00
Paul Mundt	4d5ade5b29	sh: kdump support. This adds support for kexec based crash dumps. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:56 +00:00
Paul Mundt	db62e5bd29	sh: Move clock reporting to its own proc entry. Previously this was done in cpuinfo, but with the number of clocks growing, it makes more sense to place this in a different proc entry. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:56 +00:00
Nobuhiro Iwamatsu	2a8ff4596c	sh: Solution Engine SH7705 board and CPU updates. This fixes up SH7705 CPU support and the SE7705 board for some of the recent changes. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.zh@hitachi.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:56 +00:00
dmitry pervushin	1929cb340b	sh: SH7722 clock framework support. This adds support for the SH7722 (MobileR) to the clock framework. Signed-off-by: dmitry pervushin <dimka@nomadgs.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:56 +00:00
Paul Mundt	dd12666278	sh: Obey CONFIG_HZ for HZ definition. This wasn't being set before, so now it's set for when it makes sense. The shwdt case still requires HZ to be fixed at 1000 for the WOVF period, so this is still preserved. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:56 +00:00
SUGIOKA Toshinobu	760bcb1dee	sh: Fix fstatat64() syscall. Signed-off-by: SUGIOKA Toshinobu <sugioka@itonet.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:55 +00:00
Nobuhiro Iwamatsu	b75762302e	sh: SH7780 Solution Engine board support. This adds support for the SH7780-based Solution Engine reference board. Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.zh@hitachi.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:55 +00:00
Paul Mundt	0264f16039	sh: Tidy up L-BOX area5 addresses. L-BOX can use the normal PA_AREA5_IO, there's no reason for it to reproduce it. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:11:55 +00:00
Paul Mundt	01066625e9	sh: bootmem tidying for discontig/sparsemem preparation. This reworks some of the node 0 bootmem initialization in preparation for discontigmem and sparsemem support. ARCH_POPULATES_NODE_MAP is switched to as a result of this. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:54 +00:00
Nobuhiro Iwamatsu	9465a54fa4	sh: MS7712SE01 board support. Support the SH7712 (SH3-DSP) Solution Engine reference board. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:54 +00:00
Nobuhiro Iwamatsu	c86c5a9104	sh: L-BOX RE2 support. This adds support for the L-BOX RE2 router. http://www.nttcom.co.jp/l-box/ L-BOX RE2 is a SH7751R-based router. It has CF, Cardbus, serial, and LAN x2. This is one of the very few SH boards that a general person can obtain now. The L-BOX shipped with a 2.4.28 kernel, this is a rewritten patch adding it to current git. Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:54 +00:00
Paul Mundt	32351a28a7	sh: Add SH7785 Highlander board support (R7785RP). This adds preliminary support for the SH7785-based Highlander board. Some of the Highlander support code is reordered so that most of it can be reused directly. This also plugs in missing SH7785 checks in the places that need it, as this is the first board to support the CPU. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:53 +00:00
Paul Mundt	be782df54c	sh: NR_IRQS consolidation. Each board sets the total number of IRQs that it's interested in via the machvec. Previously we cared about the off vs on-chip IRQ range, but any code relying on that is long dead. Set NR_IRQS to something sensible given the vector range, and allow boards to cap it if they really care. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:53 +00:00
Paul Mundt	fa69151173	sh: generic BUG() support. Wire up GENERIC_BUG for SH. This moves off of the special bug frame and on to the generic struct bug_entry. Roughly the same semantics are retained, and we can kill off some of the verbose BUG() reporting code. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:53 +00:00
Paul Mundt	fa5da2f7bd	sh: Bring kgdb back from the dead. This code has suffered quite a bit of bitrot, do some basic tidying to get it to a reasonably functional state again. This gets the basic support and the console working again. Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-05-07 02:10:51 +00:00
Marc Eshel	85f3f1b3f7	lockd: pass cookie in nlmsvc_testlock Change NLM internal interface to pass more information for test lock; we need this to make sure the cookie information is pushed down to the place where we do request deferral, which is handled for testlock by the following patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	2b36f412ab	lockd: save lock state on deferral We need to keep some state for a pending asynchronous lock request, so this patch adds that state to struct nlm_block. This also adds a function which defers the request, by calling rqstp->rq_chandle.defer and storing the resulting deferred request in a nlm_block structure which we insert into lockd's global block list. That new function isn't called yet, so it's dead code until a later patch. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:50 -04:00
Marc Eshel	2beb6614f5	locks: add fl_grant callback for asynchronous lock return Acquiring a lock on a cluster filesystem may require communication with remote hosts, and to avoid blocking lockd or nfsd threads during such communication, we allow the results to be returned asynchronously. When a ->lock() call needs to block, the file system will return -EINPROGRESS, and then later return the results with a call to the routine in the fl_grant field of the lock_manager_operations struct. This differs from the case when ->lock returns -EAGAIN to a blocking lock request; in that case, the filesystem calls fl_notify when the lock is granted, and the caller retries the original lock. So while fl_notify is merely a hint to the caller that it should retry, fl_grant actually communicates the final result of the lock operation (with the lock already acquired in the succesful case). Therefore fl_grant takes a lock, a status and, for the test lock case, a conflicting lock. We also allow fl_grant to return an error to the filesystem, to handle the case where the fl_grant requests arrives after the lock manager has already given up waiting for it. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>	2007-05-06 20:38:49 -04:00
Marc Eshel	9b9d2ab415	locks: add lock cancel command Lock managers need to be able to cancel pending lock requests. In the case where the exported filesystem manages its own locks, it's not sufficient just to call posix_unblock_lock(); we need to let the filesystem know what's happening too. We do this by adding a new fcntl lock command: FL_CANCELLK. Some day this might also be made available to userspace applications that could benefit from an asynchronous locking api. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 20:38:28 -04:00
Marc Eshel	150b393456	locks: allow {vfs,posix}_lock_file to return conflicting lock The nfsv4 protocol's lock operation, in the case of a conflict, returns information about the conflicting lock. It's unclear how clients can use this, so for now we're not going so far as to add a filesystem method that can return a conflicting lock, but we may as well return something in the local case when it's easy to. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 19:23:24 -04:00
Marc Eshel	7723ec9777	locks: factor out generic/filesystem switch from setlock code Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for all the setlk code. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:08:49 -04:00
J. Bruce Fields	3ee17abd14	locks: factor out generic/filesystem switch from test_lock Factor out the code that switches between generic and filesystem-specific lock methods; eventually we want to call this from lock managers (lockd and nfsd) too; currently they only call the generic methods. This patch does that for test_lock. Note that this hasn't been necessary until recently, because the few filesystems that define ->lock() (nfs, cifs...) aren't exportable via NFS. However GFS (and, in the future, other cluster filesystems) need to implement their own locking to get cluster-coherent locking, and also want to be able to export locking to NFS (lockd and NFSv4). So we accomplish this by factoring out code such as this and exporting it for the use of lockd and nfsd. Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 18:06:44 -04:00
Marc Eshel	9d6a8c5c21	locks: give posix_test_lock same interface as ->lock posix_test_lock() and ->lock() do the same job but have gratuitously different interfaces. Modify posix_test_lock() so the two agree, simplifying some code in the process. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>	2007-05-06 17:39:00 -04:00
Linus Torvalds	15700770ef	Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild: (38 commits) kconfig: fix mconf segmentation fault kbuild: enable use of code from a different dir kconfig: error out if recursive dependencies are found kbuild: scripts/basic/fixdep segfault on pathological string-o-death kconfig: correct minor typo in Kconfig warning message. kconfig: fix path to modules.txt in Kconfig help usr/Kconfig: fix typo kernel-doc: alphabetically-sorted entries in index.html of 'htmldocs' kbuild: be more explicit on missing .config file kbuild: clarify the creation of the LOCALVERSION_AUTO string. kbuild: propagate errors from find in scripts/gen_initramfs_list.sh kconfig: refer to qt3 if we cannot find qt libraries kbuild: handle compressed cpio initramfs-es kbuild: ignore section mismatch warning for references from .paravirtprobe to .init.text kbuild: remove stale comment in modpost.c kbuild/mkuboot.sh: allow spaces in CROSS_COMPILE kbuild: fix make mrproper for Documentation/DocBook/man kbuild: remove kconfig binaries during make mrproper kconfig/menuconfig: do not hardcode '.config' kbuild: override build timestamp & version ...	2007-05-06 13:21:57 -07:00
Linus Torvalds	6de410c2b0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (66 commits) KVM: Remove unused 'instruction_length' KVM: Don't require explicit indication of completion of mmio or pio KVM: Remove extraneous guest entry on mmio read KVM: SVM: Only save/restore MSRs when needed KVM: fix an if() condition KVM: VMX: Add lazy FPU support for VT KVM: VMX: Properly shadow the CR0 register in the vcpu struct KVM: Don't complain about cpu erratum AA15 KVM: Lazy FPU support for SVM KVM: Allow passing 64-bit values to the emulated read/write API KVM: Per-vcpu statistics KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cycles KVM: MMU: Avoid heavy ASSERT at non debug mode. KVM: VMX: Only save/restore MSR_K6_STAR if necessary KVM: Fold drivers/kvm/kvm_vmx.h into drivers/kvm/vmx.c KVM: VMX: Don't switch 64-bit msrs for 32-bit guests KVM: VMX: Reduce unnecessary saving of host msrs KVM: Handle guest page faults when emulating mmio KVM: SVM: Report hardware exit reason to userspace instead of dmesg KVM: Retry sleeping allocation if atomic allocation fails ...	2007-05-06 13:21:18 -07:00
Linus Torvalds	c6799ade4a	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (82 commits) [ARM] Add comments marking in-use ptrace numbers [ARM] Move syscall saving out of the way of utrace [ARM] 4360/1: S3C24XX: regs-udc.h remove unused macro [ARM] 4358/1: S3C24XX: mach-qt2410.c: remove linux/mmc/protocol.h header [ARM] mm 10: allow memory type to be specified with ioremap [ARM] mm 9: add additional device memory types [ARM] mm 8: define mem_types table L1 bit 4 to be for ARMv6 [ARM] iop: add missing parens in macro [ARM] mm 7: remove duplicated __ioremap() prototypes ARM: OMAP: fix OMAP1 mpuio suspend/resume oops ARM: OMAP: MPUIO wake updates ARM: OMAP: speed up gpio irq handling ARM: OMAP: plat-omap changes for 2430 SDP ARM: OMAP: gpio object shrinkage, cleanup ARM: OMAP: /sys/kernel/debug/omap_gpio ARM: OMAP: Implement workaround for GPIO wakeup bug in OMAP2420 silicon ARM: OMAP: Enable 24xx GPIO autoidling [ARM] 4318/2: DSM-G600 Board Support [ARM] 4227/1: minor head.S fixups [ARM] 4328/1: Move i.MX UART regs to driver ...	2007-05-06 13:20:10 -07:00
Russell King	5cd4715515	Merge branch 'ixp4xx' into devel Conflicts: include/asm-arm/arch-ixp4xx/io.h	2007-05-06 20:58:29 +01:00
Russell King	6f95416ebe	Merge branches 'arm-mm', 'at91', 'clkevts', 'imx', 'iop', 'misc', 'netx', 'ns9xxx', 'omap', 'pxa', 'rpc', 's3c' and 'sa1100' into devel	2007-05-06 20:57:51 +01:00
Christoph Hellwig	d7a54e30d3	[SCSI] sas_scsi_host: Convert to use the kthread API Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>	2007-05-06 09:33:17 -05:00
Russell King	1b11652286	[ARM] Add comments marking in-use ptrace numbers Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-06 14:49:56 +01:00
Russell King	5ba6d3febd	[ARM] Move syscall saving out of the way of utrace utrace removes the ptrace_message field in task_struct. Move our use of this field into a new member in thread_info called "syscall" Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-06 13:56:26 +01:00
Linus Torvalds	ea62ccd00f	Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6 * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (231 commits) [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall [PATCH] i386: type may be unused [PATCH] i386: Some additional chipset register values validation. [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. [PATCH] x86-64: Don't exclude asm-offsets.c in Documentation/dontdiff [PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu [PATCH] i386: white space fixes in i387.h [PATCH] i386: Drop noisy e820 debugging printks [PATCH] x86-64: Fix allnoconfig error in genapic_flat.c [PATCH] x86-64: Shut up warnings for vfat compat ioctls on other file systems [PATCH] x86-64: Share identical video.S between i386 and x86-64 [PATCH] x86-64: Remove CONFIG_REORDER [PATCH] x86-64: Print type and size correctly for unknown compat ioctls [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) [PATCH] i386: Little cleanups in smpboot.c [PATCH] x86-64: Don't enable NUMA for a single node in K8 NUMA scanning [PATCH] x86: Use RDTSCP for synchronous get_cycles if possible [PATCH] i386: Add X86_FEATURE_RDTSCP [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 [PATCH] i386: Implement alternative_io for i386 ... Fix up trivial conflict in include/linux/highmem.h manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-05 14:55:20 -07:00
Ralf Baechle	989485c190	Fix nfsroot build CC fs/nfs/nfsroot.o fs/nfs/nfsroot.c:131: error: tokens causes a section type conflict make[2]: *** [fs/nfs/nfsroot.o] Error 1 This is due to mixing const and non-const content in the same section which halfway recent gccs absolutely hate. Fixed by dropping the const. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-05 14:15:32 -07:00
Linus Torvalds	68762f3d8e	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [TG3]: Add TG3_FLAG_SUPPORT_MSI flag. [TG3]: Eliminate the TG3_FLAG_5701_REG_WRITE_BUG flag. [TG3]: Eliminate the TG3_FLAG_GOT_SERDES_FLOWCTL flag. [TG3]: Remove reset during MAC address changes. [TG3]: WoL fixes. [TG3]: Clear GPIO mask before storing. [TG3]: Improve NVRAM sizing. [TG3]: Fix TSO bugs. [MAC80211]: Add maintainers entry for mac80211. [MAC80211]: Add debugfs attributes. [MAC80211]: Add mac80211 wireless stack. [MAC80211]: Add generic include/linux/ieee80211.h [NETLINK]: Remove references to process ID [AF_IUCV]: Compile fix - adopt to skbuff changes.	2007-05-05 14:13:36 -07:00
Linus Torvalds	4f7a307dc6	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (87 commits) [SCSI] fusion: fix domain validation loops [SCSI] qla2xxx: fix regression on sparc64 [SCSI] modalias for scsi devices [SCSI] sg: cap reserved_size values at max_sectors [SCSI] BusLogic: stop using check_region [SCSI] tgt: fix rdma transfer bugs [SCSI] aacraid: fix aacraid not finding device [SCSI] aacraid: Correct SMC products in aacraid.txt [SCSI] scsi_error.c: Add EH Start Unit retry [SCSI] aacraid: [Fastboot] Panics for AACRAID driver during 'insmod' for kexec test. [SCSI] ipr: Driver version to 2.3.2 [SCSI] ipr: Faster sg list fetch [SCSI] ipr: Return better qc_issue errors [SCSI] ipr: Disrupt device error [SCSI] ipr: Improve async error logging level control [SCSI] ipr: PCI unblock config access fix [SCSI] ipr: Fix for oops following SATA request sense [SCSI] ipr: Log error for SAS dual path switch [SCSI] ipr: Enable logging of debug error data for all devices [SCSI] ipr: Add new PCI-E IDs to device table ...	2007-05-05 13:30:44 -07:00
Linus Torvalds	fabb5c4e4a	Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/jejb/voyager-2.6: [VOYAGER] add smp alternatives [VOYAGER] Use modern techniques to setup and teardown low identiy mappings. [VOYAGER] Convert the monitor thread to use the kthread API [VOYAGER] clockevents driver: bring voyager in to line [VOYAGER] clockevents: correct boot cpu is zero assumption [VOYAGER] add smp_call_function_single	2007-05-05 13:30:23 -07:00
Arnaud Patard	d0fdb5a58e	[ARM] 4360/1: S3C24XX: regs-udc.h remove unused macro The S3C2410_UDC_SETIX() macro is not used and won't be used by the udc driver, so delete it. Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 21:09:42 +01:00
Sergei Shtylyov	e93df705af	sl82c105: rework PIO support (take 2) Get rid of the 'pio_speed' member of 'ide_drive_t' that was only used by this driver by storing the PIO mode timings in the 'drive_data' instead -- this allows us to greatly simplify the process of "reloading" of the chip's timing register and do it right in sl82c150_dma_off_quietly() and to get rid of two extra arguments to config_for_pio() -- which got renamed to sl82c105_tune_pio() and now returns a PIO mode selected, with ide_config_drive_speed() call moved into the tuneproc() method, now called sl82c105_tune_drive() with the code to set drive's 'io_32bit' and 'unmask' flags in its turn moved to its proper place in the init_hwif() method. Also, while at it, rename get_timing_sl82c105() into get_pio_timings() and get rid of the code in it clamping cycle counts to 32 which was both incorrect and never executed anyway... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-05-05 22:03:49 +02:00
Russell King	3603ab2b62	[ARM] mm 10: allow memory type to be specified with ioremap __ioremap() took a set of page table flags (specifically the cacheable and bufferable bits) to control the mapping type. However, with the advent of ARMv6, this is far too limited. Replace the page table flags with a memory type index, so that the desired attributes can be selected from the mem_type table. Finally, to prevent silent miscompilation due to the differing arguments, rename the __ioremap() and __ioremap_pfn() functions. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 20:59:27 +01:00
Russell King	0af92befeb	[ARM] mm 9: add additional device memory types Add cached device type for ioremap_cached(). Group all device memory types together, and ensure that they all have a "MT_DEVICE" prefix. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 20:28:16 +01:00
Jiri Benc	f0706e828e	[MAC80211]: Add mac80211 wireless stack. Add mac80211, the IEEE 802.11 software MAC layer. Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2007-05-05 11:45:53 -07:00
Jiri Benc	a9de8ce094	[MAC80211]: Add generic include/linux/ieee80211.h Add generic IEEE 802.11 definitions. Signed-off-by: Jiri Benc <jbenc@suse.cz> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-05 11:43:04 -07:00
Herbert Xu	cf130cb102	[NETLINK]: Remove references to process ID People treating the *_pid fields in netlink as a process ID has caused endless confusion over the years. The fact that our own netlink.h does this only adds to the confusion. So here is a patch to change the comments to refer to it as the port ID which hopefully will make it clear what the purpose of the fields really is. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-05 11:42:03 -07:00
Russell King	ad902cb9e2	[ARM] iop: add missing parens in macro Fix: drivers/serial/8250.c:1837: warning: suggest parentheses around arithmetic in operand of \| due to a macro argument being used without required parenthesis. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 11:59:13 +01:00
Russell King	0058ca32c3	[ARM] mm 7: remove duplicated __ioremap() prototypes Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 11:57:39 +01:00
Michael-Luke Jones	28bd3a0dcc	[ARM] 4318/2: DSM-G600 Board Support This patch adds support for the D-Link DSM-G600 Rev A. This is an ARM XScale IXP4xx system relatively similar to the NSLU2 and NAS-100D already supported by mainline. An important difference is Gigabit Ethernet support using the Via Velocity chipset. This patch is the combined work of Michael Westerhof and Alessandro Zummo, with contributions from Michael-Luke Jones. This version addresses review comments from rmk and Deepak Saxena. Signed-off-by: Michael-Luke Jones <mlj28@cam.ac.uk> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Michael Westerhof <mwester@dls.net> Signed-off-by: Deepak Saxena <dsaxena@plexity.net> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-05 10:06:49 +01:00
Linus Torvalds	62ea6d8021	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc: (46 commits) mmc-omap: Clean up omap set_ios and make MMC_POWER_ON work mmc-omap: Fix omap to use MMC_POWER_ON mmc-omap: add missing '\n' mmc: make tifm_sd_set_dma_data() static mmc: remove old card states mmc: support unsafe resume of cards mmc: separate out reading EXT_CSD mmc: break apart switch function MMC: Fix handling of low-voltage cards MMC: Consolidate voltage definitions mmc: add bus handler wbsd: check for data opcode earlier mmc: Separate out protocol ops mmc: Move core functions to subdir mmc: deprecate mmc bus topology mmc: remove card upon suspend mmc: allow suspended block driver to be removed mmc: Flush pending detects on host removal mmc: Move host and card drivers to subdirs mmc: Move queue functions to mmc_block ...	2007-05-04 21:44:34 -07:00
Linus Torvalds	4d4700707c	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: (28 commits) NFS: Fix a compile glitch on 64-bit systems NFS: Clean up nfs_create_request comments spkm3: initialize hash spkm3: remove bad kfree, unnecessary export spkm3: fix spkm3's use of hmac NFS4: invalidate cached acl on setacl NFS: Fix directory caching problem - with test case and patch. NFS: Set meaningful value for fattr->time_start in readdirplus results. NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. SUNRPC: RPC client should retry with different versions of rpcbind SUNRPC: remove old portmapper NFS: switch NFSROOT to use new rpcbind client SUNRPC: switch the RPC server to use the new rpcbind registration API SUNRPC: switch socket-based RPC transports to use rpcbind SUNRPC: introduce rpcbind: replacement for in-kernel portmapper SUNRPC: Eliminate side effects from rpc_malloc SUNRPC: RPC buffer size estimates are too large NLM: Shrink the maximum request size of NLM4 requests NFS: Use pgoff_t in structures and functions that pass page cache offsets NFS: Clean up nfs_sync_mapping_wait() ...	2007-05-04 19:55:11 -07:00
Linus Torvalds	7e20ef030d	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (49 commits) [SCTP]: Set assoc_id correctly during INIT collision. [SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() [SCTP]: Fix the SO_REUSEADDR handling to be similar to TCP. [SCTP]: Verify all destination ports in sctp_connectx. [XFRM] SPD info TLV aggregation [XFRM] SAD info TLV aggregationx [AF_RXRPC]: Sort out MTU handling. [AF_IUCV/IUCV] : Add missing section annotations [AF_IUCV]: Implementation of a skb backlog queue [NETLINK]: Remove bogus BUG_ON [IPV6]: Some cleanups in include/net/ipv6.h [TCP]: zero out rx_opt in tcp_disconnect() [BNX2]: Fix TSO problem with small MSS. [NET]: Rework dev_base via list_head (v3) [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start [BNX2]: Update version and reldate. [BNX2]: Print bus information for PCIE devices. [BNX2]: Add 1-shot MSI handler for 5709. [BNX2]: Restructure PHY event handling. [BNX2]: Add indirect spinlock. ...	2007-05-04 19:36:58 -07:00
Linus Torvalds	a3d52136ee	Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/dtor/input: (65 commits) Input: gpio_keys - add support for switches (EV_SW) Input: cobalt_btns - convert to use polldev library Input: add skeleton for simple polled devices Input: update some documentation Input: wistron - fix typo in keymap for Acer TM610 Input: add input_set_capability() helper Input: i8042 - add Fujitsu touchscreen/touchpad PNP IDs Input: i8042 - add Panasonic CF-29 to nomux list Input: lifebook - split into 2 devices Input: lifebook - add signature of Panasonic CF-29 Input: lifebook - activate 6-byte protocol on select models Input: lifebook - work properly on Panasonic CF-18 Input: cobalt buttons - separate device and driver registration Input: ati_remote - make button repeat sensitivity configurable Input: pxa27x - do not use deprecated SA_INTERRUPT flag Input: ucb1400 - make delays configurable Input: misc devices - switch to using input_dev->dev.parent Input: joysticks - switch to using input_dev->dev.parent Input: touchscreens - switch to using input_dev->dev.parent Input: mice - switch to using input_dev->dev.parent ... Fixed up conflicts with core device model removal of "struct subsystem" manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 18:16:12 -07:00
Linus Torvalds	5b33991576	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: remove "struct subsystem" as it is no longer needed sysfs: printk format warning DOC: Fix wrong identifier name in Documentation/driver-model/devres.txt platform: reorder platform_device_del Driver core: fix show_uevent from taking up way too much stack	2007-05-04 18:04:48 -07:00
Linus Torvalds	89661adaae	Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (59 commits) PCI: Free resource files in error path of pci_create_sysfs_dev_files() pci-quirks: disable MSI on RS400-200 and RS480 PCI hotplug: Use menuconfig objects PCI: ZT5550 CPCI Hotplug driver fix PCI: rpaphp: Remove semaphores PCI: rpaphp: Ensure more pcibios_add/pcibios_remove symmetry PCI: rpaphp: Use pcibios_remove_pci_devices() symmetrically PCI: rpaphp: Document is_php_dn() PCI: rpaphp: Document find_php_slot() PCI: rpaphp: Rename rpaphp_register_pci_slot() to rpaphp_enable_slot() PCI: rpaphp: refactor tail call to rpaphp_register_slot() PCI: rpaphp: remove rpaphp_set_attention_status() PCI: rpaphp: remove print_slot_pci_funcs() PCI: rpaphp: Remove setup_pci_slot() PCI: rpaphp: remove a call that does nothing but a pointer lookup PCI: rpaphp: Remove another wrappered function PCI: rpaphp: Remve another call that is a wrapper PCI: rpaphp: remove a function that does nothing but wrap debug printks PCI: rpaphp: Remove un-needed goto PCI: rpaphp: Fix a memleak; slot->location string was never freed ...	2007-05-04 18:04:29 -07:00
Linus Torvalds	6adae5d9e6	Merge master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6: [CRYPTO] padlock: Remove pointless padlock module [CRYPTO] api: Add ablkcipher_request_set_tfm [CRYPTO] cryptd: Add software async crypto daemon [CRYPTO] api: Do not remove users unless new algorithm matches [CRYPTO] cryptomgr: Fix parsing of nested templates [CRYPTO] api: Add async blkcipher type [CRYPTO] templates: Pass type/mask when creating instances [CRYPTO] tcrypt: Use async blkcipher interface [CRYPTO] api: Add async block cipher interface [CRYPTO] api: Proc functions should be marked as unused	2007-05-04 18:01:17 -07:00
Masashi Kimoto	640729014e	ps3: Make `ps3videomode -v 0 (auto mode) work again ps3: Make `ps3videomode -v 0' (auto mode) work again Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:08 -07:00
Geert Uytterhoeven	fffe52e86b	ps3av: misc updates ps3av: - Move the definition of struct ps3av to ps3av.c, as it's locally used only. - Kill ps3av.sem, use the existing ps3av.mutex instead. - Make the 512-byte buffer in ps3av_do_pkt() static to reduce stack usage. Its use is protected by a semaphore anyway. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:08 -07:00
Geert Uytterhoeven	5caf5db887	ps3av: thread updates ps3av: Replace the kernel_thread and the ping pong semaphores by a singlethread workqueue and a completion. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:08 -07:00
Geert Uytterhoeven	254f9c5cd2	Convert non-highmem kmap_atomic() to static inline function Convert kmap_atomic() in the non-highmem case from a macro to a static inline function, for better type-checking and the ability to pass void pointers instead of struct page pointers. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:08 -07:00
Finn Thain	f877958879	NuBus header update Sync the nubus defines with the latest code in the mac68k repo. Some of these are needed for DP8390 driver update in the next patch. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:07 -07:00
Finn Thain	df7e7d6a89	m68k: remove unused adb.h The asm-m68k/adb.h header is unused. Some definitions are wrong and the rest are duplicated in linux/adb.h. Remove it. Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:07 -07:00
Roman Zippel	b3e2fd9ceb	lockdep: Add missing disable/enable irq variant Add missing disable/enable irq variant Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:06 -07:00
Michael Schmitz	c04cb856e2	m68k: Atari keyboard and mouse support. Atari keyboard and mouse support. (reformating and Kconfig fixes by Roman Zippel) Signed-off-by: Michael Schmitz <schmitz@debian.org> Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:59:05 -07:00
Linus Torvalds	8d41f0e8d5	Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6 * 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: (44 commits) i2c-s3c2410: Fix bug in releasing driver i2c-s3c2410: Fix I2C SDA to SCL setup time i2c: New i2c-tiny-usb bus driver i2c: Documentation update i2c: SPIN_LOCK_UNLOCKED cleanup i2c: Obsolete i2c-ixp2000, i2c-ixp4xx and scx200_i2c i2c: New Simtec I2C bus driver i2c: Bitbanging I2C bus driver using the GPIO API Use menuconfig objects - I2C i2c: Restore i2c_smbus_read_block_data i2c-pxa: Clean transaction stop i2c-algo-bit: Improve debugging i2c-algo-bit: Implement a 50/50 SCL duty cycle i2c-omap: Switch to static adapter numbering i2c: Blackfin Two Wire Interface driver i2c-algo-sgi: Comment and whitespace cleanups i2c: Make i2c_del_driver a void function i2c: Move i2c-isa-only exported symbol declarations i2c: Document i2c_new_device() i2c: Add i2c_new_probed_device() ... Fixed trivial conflict in Documentation/feature-removal-schedule.txt manually. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-04 17:46:27 -07:00
Linus Torvalds	ded1504dfa	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq * master.kernel.org:/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Report the number of processors in PowerNow-k8 correctly [CPUFREQ] do not declare undefined functions [CPUFREQ] cleanup kconfig options [CPUFREQ] Longhaul - Revert Longhaul ver. 2 [CPUFREQ] Remove deprecated /proc/acpi/processor/performance write support [CPUFREQ] Fix limited cpufreq when booted on battery Fix preemption warnings in speedstep-centrino.c [CPUFREQ] Longhaul - Correct PCI code [CPUFREQ] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu	2007-05-04 17:38:48 -07:00
Linus Torvalds	98b96173c7	Merge master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart * master.kernel.org:/pub/scm/linux/kernel/git/davej/agpgart: [AGPGART] sworks-agp: Switch to PCI ref counting APIs [AGPGART] Nvidia AGP: Use refcount aware PCI interfaces [AGPGART] Fix sparse warning in sgi-agp.c [AGPGART] Intel-agp adjustments [AGPGART] Move [un]map_page_into_agp into asm/agp.h [AGPGART] Add missing calls to global_flush_tlb() to ali-agp [AGPGART] prevent probe collision of sis-agp and amd64_agp	2007-05-04 17:38:16 -07:00
Vlad Yasevich	07d9396771	[SCTP]: Set assoc_id correctly during INIT collision. During the INIT/COOKIE-ACK collision cases, it's possible to get into a situation where the association id is not yet set at the time of the user event generation. As a result, user events have an association id set to 0 which will confuse applications. This happens if we hit case B of duplicate cookie processing. In the particular example found and provided by Oscar Isaula <Oscar.Isaula@motorola.com>, flow looks like this: A B ---- INIT-------> (lost) <---------INIT------ ---- INIT-ACK---> <------ Cookie ECHO When the Cookie Echo is received, we end up trying to update the association that was created on A as a result of the (lost) INIT, but that association doesn't have the ID set yet. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 13:55:27 -07:00
Sridhar Samudrala	827bf12236	[SCTP]: Re-order SCTP initializations to avoid race with sctp_rcv() Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 13:36:30 -07:00
Jamal Hadi Salim	5a6d34162f	[XFRM] SPD info TLV aggregation Aggregate the SPD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:55:39 -07:00
Jamal Hadi Salim	af11e31609	[XFRM] SAD info TLV aggregationx Aggregate the SAD info TLVs. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:55:13 -07:00
Jennifer Hunt	561e036006	[AF_IUCV]: Implementation of a skb backlog queue With the inital implementation we missed to implement a skb backlog queue . The result is that socket receive processing tossed packets. Since AF_IUCV connections are working synchronously it leads to connection hangs. Problems with read, close and select also occured. Using a skb backlog queue is fixing all of these problems . Signed-off-by: Jennifer Hunt <jenhunt@us.ibm.com> Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-04 12:22:07 -07:00
Martin Schwidefsky	cf8ba7a955	[S390] add hardware capability support (ELF_HWCAP). Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-05-04 18:48:35 +02:00
Cornelia Huck	52706ec903	[S390] cio: Deprecate read_dev_chars() and read_conf_data{,_lpm}(). These helper functions are a leftover from 2.4 sync I/O and are a notorious source for bugs. They lead to device driver specific code creeping into cio, and some issues can't really be fixed at all. Device drivers can easily implement those functions themselves in a more robust manner, so let's get rid of them. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-05-04 18:48:25 +02:00
Christoph Hellwig	33464e3b57	[S390] get rid of kprobes notifier call chain. And here's a port of the powerpc patch to get rid of the notifier chain completely to s390. It's ontop of Martins patch as that one is in mainline already. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-05-04 18:48:24 +02:00
Eric Dumazet	db3459d1a7	[IPV6]: Some cleanups in include/net/ipv6.h 1) struct ip6_flowlabel : moves 'users' field to avoid two 32bits holes for 64bit arches. Shrinks by 8 bytes sizeof(struct ip6_flowlabel) 2) ipv6_addr_cmp() and ipv6_addr_copy() dont need (void ) casts : Compiler might take into account natural alignement of in6_addr structs to emit better code for memcpy()/memcmp() Casts to (void ) force byte accesses. 3) ipv6_addr_prefix() optimization : Better to clear whole struct, as compiler can emit better code for memset(addr, 0, 16) (2 stores on x86_64), and avoid some conditional branches. # size vmlinux.after vmlinux.before text data bss dec hex filename 5262262 647612 557432 6467306 62aeea vmlinux.after 5262550 647612 557432 6467594 62b00a vmlinux.before thats 288 bytes saved. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 17:39:04 -07:00
Pavel Emelianov	7562f876cd	[NET]: Rework dev_base via list_head (v3) Cleanup of dev_base list use, with the aim to simplify making device list per-namespace. In almost every occasion, use of dev_base variable and dev->next pointer could be easily replaced by for_each_netdev loop. A few most complicated places were converted to using first_netdev()/next_netdev(). Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 15:13:45 -07:00
Michael Chan	27a005b883	[BNX2]: Add support for 5709 Serdes. Add PCI ID and code to support the 5709 Serdes PHY. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 13:23:41 -07:00
Michael Chan	427c2196b9	[ETHTOOL]: Add 2.5G bit definitions. Add 2.5G supported and advertising bit definitions. 2.5G is supported by the bnx2 driver. Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 13:17:25 -07:00
Sascha Hauer	ff4bfb2163	[ARM] 4328/1: Move i.MX UART regs to driver This patch moves the i.MX UART register descriptions from include/asm-arm/arch-imx/imx-regs.h to the serial driver itself. This helps using the driver on other architectures like mx31 Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 20:24:21 +01:00
Sascha Hauer	fe7fdb80e9	[ARM] 4329/1: fix position of NETX_SYSTEM_REG This patch fixes the position of the netx reset control register Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 20:22:49 +01:00
Russell King	5559bca8e6	[ARM] ecard: Convert card type enum to a flag 'type' in the struct expansion_card is only used to indicate whether this card is an EASI card or not. Therefore, having it as an enum is wasteful (and introduces additional noise when we come to remove the enum.) Convert it to a mere flag instead. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:16:56 +01:00
Russell King	c0b04d1b2c	[ARM] ecard: Move private ecard junk out of asm/ecard.h Move ecard.c private junk from asm/ecard.h to a local header file. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:16:56 +01:00
Andrew Victor	7776a94c31	[ARM] 4352/1: AT91: Platform data for LCD and AC97. Define resources, platform_device and device registration functions for the LCD and AC97 controllers on the AT91SAM9263. Also update the AT91SAM9261 to use the common atmel_lcdfb driver. Signed-off-by: Nicolas Ferre <nicolas.ferre@rfo.atmel.com> Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:10:22 +01:00
Andrew Victor	ce813b97e5	[ARM] 4350/1: AT91: Hardware header for ADC peripheral Definitions for Analog-to-Digital Converter (ADC) found on the Atmel AT91SAM9260 processor. Signed-off-by: Andrew Victor <andrew@sanpeople.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:10:20 +01:00
Dan Williams	d2dd8b1fed	[ARM] 4342/2: iop13xx: add resource definitions for the tpmi units The tpmi units interface with the SAS controller on iop348. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:03:54 +01:00
Dan Williams	e90ddd813d	[ARM] 4348/4: iop3xx: Give Linux control over PCI initialization Currently the iop3xx platform support code assumes that RedBoot is the bootloader and has already initialized the ATU. Linux should handle this initialization for three reasons: 1/ The memory map that RedBoot sets up is not optimal (page_to_dma and virt_to_phys return different addresses). The effect of this is that using the dma mapping API for the internal bus dma units generates pci bus addresses that are incorrect for the internal bus. 2/ Not all iop platforms use RedBoot 3/ If the ATU is already initialized it indicates that the iop is an add-in card in another host, it does not own the PCI bus, and should not be re-initialized. Changelog: * rather than change nr_controllers to zero, simply do not call pci_common_init Cc: Lennert Buytenhek <kernel@wantstofly.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-05-03 14:02:48 +01:00
Patrick McHardy	fc38582db9	[NETFILTER]: bridge netfilter: consolidate header pushing/pulling code Consolidate the common push/pull sequences into a few helper functions. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:36:16 -07:00
Jorge Boncompte	c2a1910b06	[NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack and nat modules to a 2.4.32 kernel I noticed that the gre_key function returns a wrong pointer to the GRE key of a version 0 packet thus corrupting the packet payload. The intended behaviour for GREv0 packets is to act like nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the offending functions (not used anymore) and modified the nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets. Signed-off-by: Jorge Boncompte <jorge@dti2.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:34:42 -07:00
Ilpo Järvinen	0ec96822d5	[TCP]: Use S+L catcher only with SACK for now TCP has a transitional state when SACK is not in use during which this invariant is temporarily broken. Without SACK, tcp_clean_rtx_queue does not decrement sacked_out. Therefore calls to tcp_sync_left_out before sacked_out is again corrected by tcp_fastretrans_alert can trigger this trap as sacked_out still has couple of segments that are already out of window. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:30:34 -07:00
Patrick McHardy	4e9cac2ba4	[NET]: Add __dev_getfirstbyhwtype Add __dev_getfirstbyhwtype for callers that don't want a reference but some data from the device and thus need to take the rtnl anyway. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:28:13 -07:00
Randy Dunlap	be52178b9f	[NET] skbuff: fix kernel-doc Fix skbuff.h kernel-doc: linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:16:20 -07:00
David Howells	ef4533f8af	[AFS]: Make the match_() functions take const options. Make the match_() functions take a const pointer to the options table and make strings pointers in the options table const too. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:10:39 -07:00
Eric Dumazet	709525fad8	[IPV6]: Get rid of __HAVE_ARCH_ADDR_SET. __HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:08:43 -07:00
Avi Kivity	2ff81f70b5	KVM: Remove unused 'instruction_length' As we no longer emulate in userspace, this is meaningless. We don't compute it on SVM anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:32 +03:00
Avi Kivity	02c8320972	KVM: Don't require explicit indication of completion of mmio or pio It is illegal not to return from a pio or mmio request without completing it, as mmio or pio is an atomic operation. Therefore, we can simplify the userspace interface by avoiding the completion indication. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:32 +03:00
Avi Kivity	b8836737d9	KVM: Add fpu get/set operations These are really helpful when migrating an floating point app to another machine. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	e8207547d2	KVM: Add physical memory aliasing feature With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	039576c03c	KVM: Avoid guest virtual addresses in string pio userspace interface The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	07c45a366d	KVM: Allow kernel to select size of mmap() buffer This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	1961d276c8	KVM: Add guest mode signal mask Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	1b19f3e61d	KVM: Add a special exit reason when exiting due to an interrupt This is redundant, as we also return -EINTR from the ioctl, but it allows us to examine the exit_reason field on resume without seeing old data. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	8eb7d334bd	KVM: Fold kvm_run::exit_type into kvm_run::exit_reason Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	b4e63f560b	KVM: Allow userspace to process hypercalls which have no kernel handler This is useful for paravirtualized graphics devices, for example. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	5d308f4550	KVM: Add method to check for backwards-compatible API extensions Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	739872c56f	KVM: Renumber ioctls The recent changes have left the ioctl numbers in complete disarray. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	2a4dac3952	KVM: Remove minor wart from KVM_CREATE_VCPU ioctl That ioctl does not transfer any data, so it should be an _IO rather than an _IOW. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	106b552b43	KVM: Remove the 'emulated' field from the userspace interface We no longer emulate single instructions in userspace. Instead, we service mmio or pio requests. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	06465c5a3a	KVM: Handle cpuid in the kernel instead of punting to userspace KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	46fc147788	KVM: Do not communicate to userspace through cpu registers during PIO Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	9a2bb7f486	KVM: Use a shared page for kernel/user communication when runing a vcpu Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	ff42697436	KVM: Export <linux/kvm.h> This allows users to actually build prgrams that use kvm without the entire source tree. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:22 +03:00
Avi Kivity	bbe4432e66	KVM: Use own minor number Use the minor number (232) allocated to kvm by lanana. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:22 +03:00
Mike Frysinger	a830df367c	Input: pull input.h into uinpit.h uinput.h relies on structures found in input.h, so pull in the header Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-05-03 00:55:34 -04:00
Adrian Bunk	ecf36501bc	PCI: the overdue removal of pci_module_init() Unless we finally completely remove it, people will always add new users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Adrian Bunk	5adc55da4a	PCI: remove the broken PCI_MULTITHREAD_PROBE option This patch removes the PCI_MULTITHREAD_PROBE option that had already been marked as broken. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	032de8e2fe	MSI: Give archs the option to free all MSI/Xs at once. This patch introduces an optional function, arch_teardown_msi_irqs(), which gives an arch the opportunity to do per-device teardown for MSI/X. If that's not required, the default version simply calls arch_teardown_msi_irq() for each msi irq required. arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev is a list of msi_descs, it is up to the arch to free the irq associated with each of these as appropriate. For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs with irq == 0 are considered unallocated, and the arch teardown routine is not called on them. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	9c8313343c	MSI: Give archs the option to allocate all MSI/Xs at once. This patch introduces an optional function, arch_setup_msi_irqs(), (note the plural) which gives an arch the opportunity to do per-device setup for MSI/X and then allocate all the requested MSI/Xs at once. If that's not required by the arch, the default version simply calls arch_setup_msi_irq() for each MSI irq required. arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list of msi_descs with irq == 0, it is up to the arch to connect these up to an irq (via set_irq_msi()) or return an error. For convenience the number of vectors and the type are passed also. All msi_descs with irq != 0 are considered allocated, and the arch teardown routine will be called on them when necessary. The existing semantics of pci_enable_msix() are that if the requested number of irqs can not be allocated, the maximum number that _could_ be allocated is returned. To support that, we define that in case of an error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0 are considered allocated, and are counted toward the "max that could be allocated". Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	314e77b3ee	MSI: Remove dev->first_msi_irq Now that we keep a list of msi descriptors, we don't need first_msi_irq in the pci dev. If we somehow have zero MSIs configured list_entry() will give us weird oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also a check in pci_msi_check_device() to make sure nvec > 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	4aa9bc955d	MSI: Use a list instead of the custom link structure The msi descriptors are linked together with what looks a lot like a linked list, but isn't a struct list_head list. Make it one. The only complication is that previously we walked a list of irqs, and got the descriptor for each with get_irq_msi(). Now we have a list of descriptors and need to get the irq out of it, so it needs to be in the actual struct msi_desc. We use 0 to indicate no irq is setup. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	65891215e6	PCI: Create alloc_pci_dev(), the one true way to create a struct pci_dev There are currently several places in the kernel where we kmalloc() a struct pci_dev and start initialising it. It'd be preferable to have an allocator so we can ensure the pci_dev is correctly initialised in one place. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	c9953a73e9	MSI: Add an arch_msi_check_device() Add an arch_check_device(), which gives archs a chance to check the input to pci_enable_msi/x. The arch might be interested in the value of nvec so pass it in. Propagate the error value returned from the arch routine out to the caller. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Sergei Shtylyov	0da0ead901	PCI: define pci_request/release_regions() for CONFIG_PCI=n Balance declarations of pci_request_regions() and pci_release_regions() with empty inline definitions for the CONFIG_PCI=n case -- otherwise my patch to drivers/net/3c59x.c in the -mm tree doesn't compile. :-) Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:35 -07:00
Jean Delvare	6473d160b4	PCI: Cleanup the includes of <linux/pci.h> I noticed that many source files include <linux/pci.h> while they do not appear to need it. Here is an attempt to clean it all up. In order to find all possibly affected files, I searched for all files including <linux/pci.h> but without any other occurence of "pci" or "PCI". I removed the include statement from all of these, then I compiled an allmodconfig kernel on both i386 and x86_64 and fixed the false positives manually. My tests covered 66% of the affected files, so there could be false positives remaining. Untested files are: arch/alpha/kernel/err_common.c arch/alpha/kernel/err_ev6.c arch/alpha/kernel/err_ev7.c arch/ia64/sn/kernel/huberror.c arch/ia64/sn/kernel/xpnet.c arch/m68knommu/kernel/dma.c arch/mips/lib/iomap.c arch/powerpc/platforms/pseries/ras.c arch/ppc/8260_io/enet.c arch/ppc/8260_io/fcc_enet.c arch/ppc/8xx_io/enet.c arch/ppc/syslib/ppc4xx_sgdma.c arch/sh64/mach-cayman/iomap.c arch/xtensa/kernel/xtensa_ksyms.c arch/xtensa/platform-iss/setup.c drivers/i2c/busses/i2c-at91.c drivers/i2c/busses/i2c-mpc.c drivers/media/video/saa711x.c drivers/misc/hdpuftrs/hdpu_cpustate.c drivers/misc/hdpuftrs/hdpu_nexus.c drivers/net/au1000_eth.c drivers/net/fec_8xx/fec_main.c drivers/net/fec_8xx/fec_mii.c drivers/net/fs_enet/fs_enet-main.c drivers/net/fs_enet/mac-fcc.c drivers/net/fs_enet/mac-fec.c drivers/net/fs_enet/mac-scc.c drivers/net/fs_enet/mii-bitbang.c drivers/net/fs_enet/mii-fec.c drivers/net/ibm_emac/ibm_emac_core.c drivers/net/lasi_82596.c drivers/parisc/hppb.c drivers/sbus/sbus.c drivers/video/g364fb.c drivers/video/platinumfb.c drivers/video/stifb.c drivers/video/valkyriefb.c include/asm-arm/arch-ixp4xx/dma.h sound/oss/au1550_ac97.c I would welcome test reports for these files. I am fine with removing the untested files from the patch if the general opinion is that these changes aren't safe. The tested part would still be nice to have. Note that this patch depends on another header fixup patch I submitted to LKML yesterday: [PATCH] scatterlist.h needs types.h http://lkml.org/lkml/2007/3/01/141 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:35 -07:00
Jean Delvare	a9dfd281a7	PCI: scatterlist.h needs types.h Most architectures' scatterlist.h use the type dma_addr_t, but omit to include <asm/types.h> which defines it. This could lead to build failures, so let's add the missing includes. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:34 -07:00
Brian King	f7bdd12d23	pci: New PCI-E reset API Adds a new API which can be used to issue various types of PCI-E reset, including PCI-E warm reset and PCI-E hot reset. This is needed for an ipr PCI-E adapter which does not properly implement BIST. Running BIST on this adapter results in PCI-E errors. The only reliable reset mechanism that exists on this hardware is PCI Fundamental reset (warm reset). Since driving this type of reset is architecture unique, this provides the necessary hooks for architectures to add this support. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:34 -07:00
Greg Kroah-Hartman	823bccfc40	remove "struct subsystem" as it is no longer needed We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Sam Ravnborg	dc24f0e708	kbuild: remove dependency on input.h from file2alias Almost all definitions used by file2alias was already present in mod_devicetable.h. Added the last definition and killed the input.h usage. The errornous include was pointed out by: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Deepak Saxena <dsaxena@plexity.net>	2007-05-02 20:58:08 +02:00
Jan Kiszka	c41bf8fa5e	[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and none of them appear to require additional preempt_disable/enable here. Let's open-code save_init_fpu in __unlazy_fpu to save a few ops. Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Jan Kiszka	02b64dab56	[PATCH] i386: white space fixes in i387.h Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	c5bcb5635a	[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible RDTSCP is already synchronous and doesn't need an explicit CPUID. This is a little faster and more importantly avoids VMEXITs on Hypervisors. Original patch from Joerg Roedel, but reworked by AK Also includes miscompilation fix by Eric Biederman Cc: "Joerg Roedel" <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	9bccb23dc5	[PATCH] i386: Add X86_FEATURE_RDTSCP Following x86-64 Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	3aefbe0746	[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Syncs up with x86-64. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	e859dc553c	[PATCH] i386: Implement alternative_io for i386 Ported from x86-64. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	3671df8572	[PATCH] i386: Evaluate constant cpu features at runtime Redefine cpu_has() to evaluate cpu features already checked in early boot at compile time. This way the compiler might eliminate some dead code. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	c7f81c9453	[PATCH] i386: Verify important CPUID bits in real mode Check some CPUID bits that are needed for compiler generated early in boot. When the system is still in real mode before changing the VESA BIOS mode it is possible to still display an visible error message on the screen. Similar to x86-64. Includes cleanups from Eric Biederman Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	05cb007dac	[PATCH] x86-64: Use the 32bit wd_ops for 64bit too. This mainly removes a lot of code, replacing it with calls into the new 32bit perfctr-watchdog.c Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	09198e6850	[PATCH] i386: Clean up NMI watchdog code - Introduce a wd_ops structure - Convert the various nmi watchdogs over to it - This allows to split the perfctr reservation from the watchdog setup cleanly. - Do perfctr reservation globally as it should have always been - Remove dead code referenced only by unused EXPORT_SYMBOLs Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Zachary Amsden	9e5e3162b2	[PATCH] i386: pte simplify ops Add comment and condense code to make use of native_local_ptep_get_and_clear function. Also, it turns out the 2-level and 3-level paging definitions were identical, so move the common definition into pgtable.h Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Zachary Amsden	142dd97591	[PATCH] i386: pte xchg optimization In situations where page table updates need only be made locally, and there is no cross-processor A/D bit races involved, we need not use the heavyweight xchg instruction to atomically fetch and clear page table entries. Instead, we can just read and clear them directly. This introduces a neat optimization for non-SMP kernels; drop the atomic xchg operations from page table updates. Thanks to Michel Lespinasse for noting this potential optimization. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Zachary Amsden	c2c1accd4b	[PATCH] i386: pte clear optimization When exiting from an address space, no special hypervisor notification of page table updates needs to occur; direct page table hypervisors, such as Xen, switch to another address space first (init_mm) and unprotects the page tables to avoid the cost of trapping to the hypervisor for each pte_clear. Shadow mode hypervisors, such as VMI and lhype don't need to do the extra work of calling through paravirt-ops, and can just directly clear the page table entries without notifiying the hypervisor, since all the page tables are about to be freed. So introduce native_pte_clear functions which bypass any paravirt-ops notification. This results in a significant performance win for VMI and removes some indirect calls from zap_pte_range. Note the 3-level paging already had a native_pte_clear function, thus demanding argument conformance and extra args for the 2-level definition. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Andi Kleen	57a4f91ae5	[PATCH] x86-64: Auto compute __NR_syscall_max at compile time No need to maintain it anymore Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis [ ISO-8859-1 charset ] V�zquezCao	70ae77f497	[PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is NMI_VECTOR to avoid potential hangups in the event of crash when kdump tries to stop the other CPUs. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis [ ISO-8859-1 charset ] V�zquezCao	9062d888aa	[PATCH] x86-64: __send_IPI_dest_field - x86_64 Implement __send_IPI_dest_field which can be used to send IPIs when the "destination shorthand" field of the ICR is set to 00 (destination field). Use it whenever possible. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis VazquezCao	8339e9fba3	[PATCH] x86-64: safe_apic_wait_icr_idle - x86_64 apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: Added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao	f2b218dd61	[PATCH] i386: safe_apic_wait_icr_idle - i386 apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	de938c51d5	[PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync If our copy of the MTRRs of the BSP has RdMem or WrMem set, and we are running on an AMD64/K8 system, the boot CPU must have had MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would have copied these bits cleared), so we set them on this CPU as well. This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync across the CPUs of SMP systems in order to fullfill the duty of system software to "initialize and maintain MTRR consistency across all processors." as written in the AMD and Intel manuals. If an WRMSR instruction fails because MtrrFixDramModEn is not set, I expect that also the Intel-style MTRR bits are not updated. AK: minor cleanup, moved MSR defines around Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	2b1f6278d7	[PATCH] x86: Save the MTRRs of the BSP before booting an AP Applied fix by Andew Morton: http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'. AMD and Intel x86 CPU manuals state that it is the responsibility of system software to initialize and maintain MTRR consistency across all processors in Multi-Processing Environments. Quote from page 188 of the AMD64 System Programming manual (Volume 2): 7.6.5 MTRRs in Multi-Processing Environments "In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors." (short omission here) "Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors." Current Linux MTRR code already implements the above in the case that the BIOS does not properly initialize MTRRs on the secondary processors, but the case where the fixed-range MTRRs of the boot processor are changed after Linux started to boot, before the initialsation of a secondary processor, is not handled yet. In this case, secondary processors are currently initialized by Linux with MTRRs which the boot processor had very early, when mtrr_bp_init() did run, but not with the MTRRs which the boot processor uses at the time when that secondary processors is actually booted, causing differing MTRR contents on the secondary processors. Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs of the boot processor when it transitions the system into ACPI mode. The SMI handler of the BIOS does this in SMM, entered while Linux ACPI code runs acpi_enable(). Other occasions where the SMI handler of the BIOS may change bits in the MTRRs could occur as well. To initialize newly booted secodary processors with the fixed-range MTRRs which the boot processor uses at that time, this patch saves the fixed-range MTRRs of the boot processor before new secondary processors are started. When the secondary processors run their Linux initialisation code, their fixed-range MTRRs will be updated with the saved fixed-range MTRRs. If CONFIG_MTRR is not set, we define mtrr_save_state as an empty statement because there is nothing to do. Possible TODOs: ) CPU-hotplugging outside of SMP suspend/resume is not yet tested with this patch. ) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up, then the calls to mtrr_save_state() could be replaced by calls to mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be needed. That would need either verification of the CPU-hotplug code or at least a test on a >2 CPU machine. *) The MTRRs of other running processors are not yet checked at this time but it might be interesting to syncronize the MTTRs of all processors before booting. That would be an incremental patch, but of rather low priority since there is no machine known so far which would require this. AK: moved prototypes on x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	2b3b4835c9	[PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. In this current implementation which is used in other patches, mtrr_save_fixed_ranges() accepts a dummy void pointer because in the current implementation of one of these patches, this function may be called from smp_call_function_single() which requires that this function takes a void pointer argument. This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges which is the element of the static struct which stores our current backup of the fixed-range MTRR values which all CPUs shall be using. Because mtrr_save_fixed_ranges calls get_fixed_ranges after kernel initialisation time, __init needs to be removed from the declaration of get_fixed_ranges(). If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges as an empty statement because there is nothing to do. AK: Moved prototypes for x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Andi Kleen	856f44ff4a	[PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge	03df4f6ee9	[PATCH] i386: Clean up ELF note generation Three cleanups: 1: ELF notes are never mapped, so there's no need to have any access flags in their phdr. 2: When generating them from asm, tell the assembler to use a SHT_NOTE section type. There doesn't seem to be a way to do this from C. 3: Use ANSI rather than traditional cpp behaviour to stringify the macro argument. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric W. Biederman <ebiederm@xmission.com>	2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge	441d40dca0	[PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org> The other symbols used to delineate the alt-instructions sections have the form __foo/__foo_end. Rename parainstructions to match. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:16 +02:00
Zachary Amsden	e0bb864397	[PATCH] i386: Convert VMI timer to use clock events Convert VMI timer to use clock events, making it properly able to use the NO_HZ infrastructure. On UP systems, with no local APIC, we just continue to route these events through the PIT. On systems with a local APIC, or SMP, we provide a single source interrupt chip which creates the local timer IRQ. It actually gets delivered by the APIC hardware, but we don't want to use the same local APIC clocksource processing, so we create our own handler here. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> CC: Dan Hecht <dhecht@vmware.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	57decbda6a	[PATCH] x86: update for i386 and x86-64 check_bugs Remove spurious comments, headers and keywords from x86-64 bugs.[ch]. Use identify_boot_cpu() AK: merged with other patch Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	c5413fbe89	[PATCH] i386: Fix UP gdt bugs Fixes two problems with the GDT when compiling for uniprocessor: - There's no percpu segment, so trying to load its selector into %fs fails. Use a null selector instead. - The real gdt needs to be loaded at some point. Do it in cpu_init(). Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	1956c73bb5	[PATCH] i386: Define per_cpu_offset Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like asm-generic/percpu.h does for UP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	978c038ec9	[PATCH] i386: cleanups to help using per-cpu variables from asm This patch does a few small cleanups: - use PER_CPU_NAME to generate the names of per-cpu variables - use lea to add the per_cpu offset in PER_CPU(), because it doesn't affect condition flags - add PER_CPU_VAR which allows direct access to pre-cpu variables with the %fs: prefix on SMP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	7c3576d261	[PATCH] i386: Convert PDA into the percpu section Currently x86 (similar to x84-64) has a special per-cpu structure called "i386_pda" which can be easily and efficiently referenced via the %fs register. An ELF section is more flexible than a structure, allowing any piece of code to use this area. Indeed, such a section already exists: the per-cpu area. So this patch: (1) Removes the PDA and uses per-cpu variables for each current member. (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU. (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which can be used to calculate addresses for this CPU's variables. (4) Simplifies startup, because %fs doesn't need to be loaded with a special segment at early boot; it can be deferred until the first percpu area is allocated (or never for UP). The result is less code and one less x86-specific concept. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	7a61d35d4b	[PATCH] i386: Page-align the GDT Xen wants a dedicated page for the GDT. I believe VMI likes it too. lguest, KVM and native don't care. Simple transformation to page-aligned "struct gdt_page". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	4cdd9c8931	[PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear In shadow mode hypervisors, ptep_get_and_clear achieves the desired purpose of keeping the shadows in sync by issuing a native_get_and_clear, followed by a call to pte_update, which indicates the PTE has been modified. Direct mode hypervisors (Xen) have no need for this anyway, and will trap the update using writable pagetables. This means no hypervisor makes use of ptep_get_and_clear; there is no reason to have it in the paravirt-ops structure. Change confusing terminology about raw vs. native functions into consistent use of native_pte_xxx for operations which do not invoke paravirt-ops. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	1a45b7aaa5	[PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers Replace all the open-coded macros for generating calls with a pair of more general macros (__PVOP_CALL/VCALL), and redefine all the PVOP_V?CALL[0-4] in terms of them. [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h" (paravirt_ops-document-asm-i386-paravirth.patch) ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	4e0fa85602	[PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi Remove #defines, add enum for PARAVIRT_LAZY_FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	ce6234b529	[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	a27fe809b8	[PATCH] i386: PARAVIRT: revert map_pt_hook. Back out the map_pt_hook to clear the way for kmap_atomic_pte. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	d4c104771a	[PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op This patch adds a pv_op for flush_tlb_others. Linux running on native hardware uses cross-CPU IPIs to flush the TLB on any CPU which may have a particular mm's pagetable entries cached in its TLB. This is inefficient in a paravirtualized environment, since the hypervisor knows which real CPUs actually contain cached mappings, which may be a small subset of a guest's VCPUs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	63f70270cc	[PATCH] i386: PARAVIRT: add common patching machinery Implement the actual patching machinery. paravirt_patch_default() contains the logic to automatically patch a callsite based on a few simple rules: - if the paravirt_op function is paravirt_nop, then patch nops - if the paravirt_op function is a jmp target, then jmp to it - if the paravirt_op function is callable and doesn't clobber too much for the callsite, call it directly paravirt_patch_default is suitable as a default implementation of paravirt_ops.patch, will remove most of the expensive indirect calls in favour of either a direct call or a pile of nops. Backends may implement their own patcher, however. There are several helper functions to help with this: paravirt_patch_nop nop out a callsite paravirt_patch_ignore leave the callsite as-is paravirt_patch_call patch a call if the caller and callee have compatible clobbers paravirt_patch_jmp patch in a jmp paravirt_patch_insns patch some literal instructions over the callsite, if they fit This patch also implements more direct patches for the native case, so that when running on native hardware many common operations are implemented inline. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	294688c028	[PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h Clean things up, and broadly document: - the paravirt_ops functions themselves - the patching mechanism Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	f8822f4201	[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable Wrap a set of interesting paravirt_ops calls in a wrapper which makes the callsites available for patching. Unfortunately this is pretty ugly because there's no way to get gcc to generate a function call, but also wrap just the callsite itself with the necessary labels. This patch supports functions with 0-4 arguments, and either void or returning a value. 64-bit arguments must be split into a pair of 32-bit arguments (lower word first). Small structures are returned in registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	42c24fa22e	[PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register Fix a few clobbers to include the return register. The clobbers set is the set of all registers modified (or may be modified) by the code snippet, regardless of whether it was deliberate or accidental. Also, make sure that callsites which are used in contexts which don't allow clobbers actually save and restore all clobberable registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	d582203578	[PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure Use patch type identifiers derived from the offset of the operation in the paravirt_ops structure. This avoids having to maintain a separate enum for patch site types. Also, since the identifier is derived from the offset into paravirt_ops, the offset can be derived from the identifier. This is used to remove replicated information in the various callsite macros, which has been a source of bugs in the past. This patch also drops the fused save_fl+cli operation, which doesn't really add much and makes things more complex - specifically because it breaks the 1:1 relationship between identifiers and offsets. If this operation turns out to be particularly beneficial, then the right answer is to define a new entrypoint for it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	98de032b68	[PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity Rename struct paravirt_patch to paravirt_patch_site, so that it clearly refers to a callsite, and not the patch which may be applied to that callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	d6dd61c831	[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are: arch_dup_mmap, which is called when a new mmap is created at fork arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec. The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	5311ab62cd	[PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing Normally when running in PAE mode, the 4th PMD maps the kernel address space, which can be shared among all processes (since they all need the same kernel mappings). Xen, however, does not allow guests to have the kernel pmd shared between page tables, so parameterize pgtable.c to allow both modes of operation. There are several side-effects of this. One is that vmalloc will update the kernel address space mappings, and those updates need to be propagated into all processes if the kernel mappings are not intrinsically shared. In the non-PAE case, this is done by maintaining a pgd_list of all processes; this list is used when all process pagetables must be updated. pgd_list is threaded via otherwise unused entries in the page structure for the pgd, which means that the pgd must be page-sized for this to work. Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE pgd to page aligned anyway, so this patch forces the pgd to be page aligned+sized when the kernel pmd is unshared, to accomodate both these requirements. Also, since there may be several distinct kernel pmds (if the user/kernel split is below 3G), there's no point in allocating them from a slab cache; they're just allocated with get_free_page and initialized appropriately. (Of course the could be cached if there is just a single kernel pmd - which is the default with a 3G user/kernel split - but it doesn't seem worthwhile to add yet another case into this code). [ Many thanks to wli for review comments. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	90caccb975	[PATCH] i386: PARAVIRT: Allocate a fixmap slot Allocate a fixmap slot for use by a paravirt_ops implementation. This is intended for early-boot bootstrap mappings. Once the zones and allocator have been set up, it would be better to use get_vm_area() to allocate some virtual space. Xen uses this to map the hypervisor's shared info page, which doesn't have a pseudo-physical page number, and therefore can't be mapped ordinarily. It is needed early because it contains the vcpu state, including the interrupt mask. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	b239fb2501	[PATCH] i386: PARAVIRT: Hooks to set up initial pagetable This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up. In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc. When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the constructon of the kernel's pagetable depends on the hypervisor. In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only. In order to make this easier, kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones. A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables. And: +From: "Eric W. Biederman" <ebiederm@xmission.com> If we don't set the leaf page table entries it is quite possible that will inherit and incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like. Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings, however I discussed this with Jeremy he has modified the Xen early set_pte function to avoid problems in this area. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	3dc494e86d	[PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries Add a set of accessors to pack, unpack and modify page table entries (at all levels). This allows a paravirt implementation to control the contents of pgd/pmd/pte entries. For example, Xen uses this to convert the (pseudo-)physical address into a machine address when populating a pagetable entry, and converting back to pphys address when an entry is read. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	4587623360	[PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations Add a _paravirt_nop function for use as a stub for no-op operations, and paravirt_nop #defined void * version to make using it easier (since all its uses are as a void ). This is useful to allow the patcher to automatically identify noop operations so it can simply nop out the callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> [mingo] but only as a cleanup of the current open-coded (void ) casts. My problem with this is that it loses the types. Not that there is much to check for, but still, this adds some assumptions about how function calls look like	2007-05-02 19:27:13 +02:00
Rusty Russell	a75c54f933	[PATCH] i386: i386 separate hardware-defined TSS from Linux additions On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote: > Please clean it up properly with two structs. Not sure about this, now I've done it. Running it here. If you like it, I can do x86-64 as well. == lguest defines its own TSS struct because the "struct tss_struct" contains linux-specific additions. Andi asked me to split the struct in processor.h. Unfortunately it makes usage a little awkward. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	d0175ab644	[PATCH] i386: Remove smp_alt_instructions The .smp_altinstructions section and its corresponding symbols are completely unused, so remove them. Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:13 +02:00
H. Peter Anvin	4bc5aa91fb	[PATCH] x86: Clean up x86 control register and MSR macros (corrected) This patch is based on Rusty's recent cleanup of the EFLAGS-related macros; it extends the same kind of cleanup to control registers and MSRs. It also unifies these between i386 and x86-64; at least with regards to MSRs, the two had definitely gotten out of sync. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Andi Kleen	f039b75471	[PATCH] x86: Don't use MWAIT on AMD Family 10 It doesn't put the CPU into deeper sleep states, so it's better to use the standard idle loop to save power. But allow to reenable it anyways for benchmarking. I also removed the obsolete idle=halt on i386 Cc: andreas.herrmann@amd.com Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	c169859d6d	[PATCH] x86-64: Clean up asm-x86_64/bugs.h Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	1dbf527c51	[PATCH] i386: Make COMPAT_VDSO runtime selectable. Now that relocation of the VDSO for COMPAT_VDSO users is done at runtime rather than compile time, it is possible to enable/disable compat mode at runtime. This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the kernel command line, or via sysctl. (Switching on a running system shouldn't be done lightly; any process which was relying on the compat VDSO will be upset if it goes away.) The COMPAT_VDSO config option still exists, but if enabled it just makes vdso_enabled default to VDSO_COMPAT. +From: Hugh Dickins <hugh@veritas.com> Fix oops from i386-make-compat_vdso-runtime-selectable.patch. Even mingetty at system startup finds it easy to trigger an oops while reading /proc/PID/maps: though it has a good hold on the mm itself, that cannot stop exit_mm() from resetting tsk->mm to NULL. (It is usually show_map()'s call to get_gate_vma() which oopses, and I expect we could change that to check priv->tail_vma instead; but no matter, even m_start()'s call just after get_task_mm() is racy.) Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	d4f7a2c18e	[PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down. This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap. [ Patch has been through several hands: Jan Beulich wrote the orignal version; Zach reworked it, and Jeremy converted it to relocate phdrs as well as sections. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	a6c4e076ee	[PATCH] i386: clean up identify_cpu identify_cpu() is used to identify both the boot CPU and secondary CPUs, but it performs some actions which only apply to the boot CPU. Those functions are therefore really __init functions, but because they're called by identify_cpu(), they must be marked __cpuinit. This patch splits identify_cpu() into identify_boot_cpu() and identify_secondary_cpu(), and calls the appropriate init functions from each. Also, identify_boot_cpu() and all the functions it dominates are marked __init. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	1353ebb4b4	[PATCH] i386: Clean up asm-i386/bugs.h Most of asm-i386/bugs.h is code which should be in a C file, so put it there. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Avi Kivity	bbf30a1650	[PATCH] x86-64: fix arithmetic in comment The xmm space on x86_64 is 256 bytes. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Andi Kleen	5d02d7ae73	[PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h. As per i386 patch: move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we can include it from irqflags.h and use it in raw_irqs_disabled_flags(). As a side-effect, we could now use these flags in .S files. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jan Beulich	b92e9fac40	[PATCH] x86: fix amd64-agp aperture validation Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all subsequent pfn-s are also invalid is wrong. Thus replace this by explicitly checking against the E820 map. AK: make e820 on x86-64 not initdata Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	b00742d399	[PATCH] x86-64: Account for module percpu space separately from kernel percpu Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common to all architectures; if an architecture wants to set PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is the only one which does). Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	07f3331c6b	[PATCH] i386: Add machine_ops interface to abstract halting and rebooting machine_ops is an interface for the machine_* functions defined in <linux/reboot.h>. This is intended to allow hypervisors to intercept the reboot process, but it could be used to implement other x86 subarchtecture reboots. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	01a2f43556	[PATCH] i386: Add smp_ops interface Add a smp_ops interface. This abstracts the API defined by <linux/smp.h> for use within arch/i386. The primary intent is that it be used by a paravirtualizing hypervisor to implement SMP, but it could also be used by non-APIC-using sub-architectures. This is related to CONFIG_PARAVIRT, but is implemented unconditionally since it is simpler that way and not a highly performance-sensitive interface. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>	2007-05-02 19:27:11 +02:00
Rusty Russell	4fbb596881	[PATCH] i386: cleanup GDT Access Now we have an explicit per-cpu GDT variable, we don't need to keep the descriptors around to use them to find the GDT: expose cpu_gdt directly. We could go further and make load_gdt() pack the descriptor for us, or even assume it means "load the current cpu's GDT" which is what it always does. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:11 +02:00
Adrian Bunk	ca906e4231	[PATCH] x86: sys_ioperm() prototype cleanup - there's no reason for duplicating the prototype from include/linux/syscalls.h in include/asm-x86_64/unistd.h - every file should #include the headers containing the prototypes for it's global functions Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Christoph Lameter	2bff73830c	[PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management. x86_64 currently simulates a list using the index and private fields of the page struct. Seems that the code was inherited from i386. But x86_64 does not use the slab to allocate pgds and pmds etc. So the lru field is not used by the slab and therefore available. This patch uses standard list operations on page->lru to realize pgd tracking. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Andi Kleen	b4531e863d	[PATCH] i386: Use X86_EFLAGS_IF in irqflags.h. Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we can include it from irqflags.h and use it in raw_irqs_disabled_flags(). As a side-effect, we could now use these flags in .S files. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Jan Beulich	6fb14755a6	[PATCH] x86: tighten kernel image page access rights On x86-64, kernel memory freed after init can be entirely unmapped instead of just getting 'poisoned' by overwriting with a debug pattern. On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table can also be write-protected. Compared to the first version, this one prevents re-creating deleted mappings in the kernel image range on x86-64, if those got removed previously. This, together with the original changes, prevents temporarily having inconsistent mappings when cacheability attributes are being changed on such pages (e.g. from AGP code). While on i386 such duplicate mappings don't exist, the same change is done there, too, both for consistency and because checking pte_present() before using various other pte_XXX functions is a requirement anyway. At once, i386 code gets adjusted to use pte_huge() instead of open coding this. AK: split out cpa() changes Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Jan Beulich	d01ad8dd56	[PATCH] x86: Improve handling of kernel mappings in change_page_attr Fix various broken corner cases in i386 and x86-64 change_page_attr. AK: split off from tighten kernel image access rights Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Rusty Russell	90a0a06aa8	[PATCH] i386: rationalize paravirt wrappers paravirt.c used to implement native versions of all low-level functions. Far cleaner is to have the native versions exposed in the headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply #define XXX native_XXX. There are several nice side effects: 1) write_dt_entry() now takes the correct "struct Xgt_desc_struct " not "void ". 2) load_TLS is reintroduced to the for loop, not manually unrolled with a #error in case the bounds ever change. 3) Macros become inlines, with type checking. 4) Access to the native versions is trivial for KVM, lguest, Xen and others who might want it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	d2cbcc49e2	[PATCH] i386: clean up cpu_init() We now have cpu_init() and secondary_cpu_init() doing nothing but calling _cpu_init() with the same arguments. Rename _cpu_init() to cpu_init() and use it as a replcement for secondary_cpu_init(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	bf50467204	[PATCH] i386: Use per-cpu GDT immediately upon boot Now we are no longer dynamically allocating the GDT, we don't need the "cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the per-cpu GDT. This means initializing the cpu_gdt array in C. The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it switches to the per-cpu copy just allocated. For secondary CPUs, the early_gdt_descr is set to point directly to their per-cpu copy. For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP, but we never have to move. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	ae1ee11be7	[PATCH] i386: Use per-cpu variables for GDT, PDA Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds happiness (although we need the GDT page-aligned for Xen, which we do in a followup patch). [akpm@linux-foundation.org: build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Ian Campbell	79e030114a	[PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps The specific case I am encountering is kdump under Xen with a 64 bit hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to the hypervisor but the dump kernel is 32 bit for maximum compatibility. It's possibly less likely to be useful in a purely native scenario but I see no reason to disallow it. [akpm@linux-foundation.org: build fix] Signed-off-by: Ian Campbell <ian.campbell@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Horms <horms@verge.net.au> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Rusty Russell	eab0c72aec	[PATCH] x86-64: Introduce load_TLS to the "for" loop. GCC (4.1 at least) unrolls it anyway, but I can't believe this code was ever justifiable. (I've also submitted a patch which cleans up i386, which is even uglier). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Rusty Russell	692174b97d	[PATCH] i386: Initialize esp0 properly all the time Whenever we schedule, __switch_to calls load_esp0 which does: tss->esp0 = thread->esp0; This is never initialized for the initial thread (ie "swapper"), so when we're scheduling that, we end up setting esp0 to 0. This is fine: the swapper never leaves ring 0, so this field is never used. lguest, however, gets upset that we're trying to used an unmapped page as our kernel stack. Rather than work around it there, let's initialize it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
David Rientjes	8b8ca80e19	[PATCH] x86-64: configurable fake numa node sizes Extends the numa=fake x86_64 command-line option to allow for configurable node sizes. These nodes can be used in conjunction with cpusets for coarse memory resource management. The old command-line option is still supported: numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the actual machine. But now you may configure your system for the node sizes of your choice: numa=fake=2512,1024,2256 gives two 512M nodes, one 1024M node, two 256M nodes, and the rest of system memory to a sixth node. The existing hash function is maintained to support the various node sizes that are possible with this implementation. Each node of the same size receives roughly the same amount of available pages, regardless of any reserved memory with its address range. The total available pages on the system is calculated and divided by the number of equal nodes to allocate. These nodes are then dynamically allocated and their borders extended until such time as their number of available pages reaches the required size. Configurable node sizes are recommended when used in conjunction with cpusets for memory control because it eliminates the overhead associated with scanning the zonelists of many smaller full nodes on page_alloc(). Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
john stultz	5a90cf205c	[PATCH] x86: Log reason why TSC was marked unstable Change mark_tsc_unstable() so it takes a string argument, which holds the reason the TSC was marked unstable. This is then displayed the first time mark_tsc_unstable is called. This should help us better debug why the TSC was marked unstable on certain systems and allow us to make sure we're not being overly paranoid when throwing out this troublesome clocksource. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:08 +02:00
Vivek Goyal	1833d6bc72	[PATCH] i386: modpost apic related warning fixes o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check' WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present' o Some functions which are inline (acpi_madt_oem_check) are not inlined by compiler as these functions are accessed using function pointer. These functions are put in .text section and they in-turn access __init type functions hence modpost generates warnings. o Do not iniline acpi_madt_oem_check, instead make it __init. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Ravikiran G Thirumalai	e073ae1b34	[PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMA Enable system hashtable memory to be distributed among nodes on x86_64 NUMA Forcing the kernel to use node interleaved vmalloc instead of bootmem for the system hashtable memory (alloc_large_system_hash) reduces the memory imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box: Before the following patch, on bootup of a 8 node box: Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3206296 kB Node 0 MemUsed: 201192 kB Node 0 Active: 7012 kB Node 0 Inactive: 512 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 1912 kB Node 0 Mapped: 420 kB Node 0 AnonPages: 5612 kB Node 0 PageTables: 468 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 5408 kB Node 0 SReclaimable: 644 kB Node 0 SUnreclaim: 4764 kB After the patch (or using hashdist=1 on the kernel command line): Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3247608 kB Node 0 MemUsed: 159880 kB Node 0 Active: 3012 kB Node 0 Inactive: 616 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 2424 kB Node 0 Mapped: 380 kB Node 0 AnonPages: 1200 kB Node 0 PageTables: 396 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 6304 kB Node 0 SReclaimable: 1596 kB Node 0 SUnreclaim: 4708 kB I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA since x86_64 has no dearth of vmalloc space? Or maybe enable hash distribution for all 64bit NUMA arches? The following patch does it only for x86_64. I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of memory and uses up tlb entries. This was on a 4 way, 2 socket Tyan AMD box (non vsmp), with 8G total memory (4G pernode). The results with and without hash distribution are: 1. Vanilla - runtime of 1188.000s 2. With hashdist=1 runtime of 1154.000s Oprofile output for the duration of run is: 1. Vanilla: PU: AMD64 processors, speed 2411.16 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 163054 6.5513 libansys1.so MultiFront::decompose(int, int, Elemset , int , int, int, int) 162061 6.5114 libansys3.so blockSaxpy6L_fd 162042 6.5107 libansys3.so blockInnerProduct6L_fd 156286 6.2794 libansys3.so maxb33_ 87879 3.5309 libansys1.so elmatrixmultpcg_ 84857 3.4095 libansys4.so saxpy_pcg 58637 2.3560 libansys4.so .st4560 46612 1.8728 libansys4.so .st4282 43043 1.7294 vmlinux-t copy_user_generic_string 41326 1.6604 libansys3.so blockSaxpyBackSolve6L_fd 41288 1.6589 libansys3.so blockInnerProductBackSolve6L_fd 2. With hashdist=1 CPU: AMD64 processors, speed 2411.13 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 162993 6.9814 libansys1.so MultiFront::decompose(int, int, Elemset , int , int, int, int) 160799 6.8874 libansys3.so blockInnerProduct6L_fd 160459 6.8729 libansys3.so blockSaxpy6L_fd 156018 6.6826 libansys3.so maxb33_ 84700 3.6279 libansys4.so saxpy_pcg 83434 3.5737 libansys1.so elmatrixmultpcg_ 58074 2.4875 libansys4.so .st4560 46000 1.9703 libansys4.so .st4282 41166 1.7632 libansys3.so blockSaxpyBackSolve6L_fd 41033 1.7575 libansys3.so blockInnerProductBackSolve6L_fd 35762 1.5318 libansys1.so inner_product_sub 35591 1.5245 libansys1.so inner_product_sub2 28259 1.2104 libansys4.so addVectors Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Christoph Lameter <clameter@engr.sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Andrew Morton	184c44d204	[PATCH] x86-64: fix x86_64-mm-sched-clock-share Fix for the following patch. Provide dummy cpufreq functions when CPUFREQ is not compiled in. Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:08 +02:00
Vivek Goyal	6a50a664ca	[PATCH] x86-64: build-time checking o X86_64 kernel should run from 2MB aligned address for two reasons. - Performance. - For relocatable kernels, page tables are updated based on difference between compile time address and load time physical address. This difference should be multiple of 2MB as kernel text and data is mapped using 2MB pages and PMD should be pointing to a 2MB aligned address. Life is simpler if both compile time and load time kernel addresses are 2MB aligned. o Flag the error at compile time if one is trying to build a kernel which does not meet alignment restrictions. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Vivek Goyal	1ab60e0f72	[PATCH] x86-64: Relocatable Kernel Support This patch modifies the x86_64 kernel so that it can be loaded and run at any 2M aligned address, below 512G. The technique used is to compile the decompressor with -fPIC and modify it so the decompressor is fully relocatable. For the main kernel the page tables are modified so the kernel remains at the same virtual address. In addition a variable phys_base is kept that holds the physical address the kernel is loaded at. __pa_symbol is modified to add that when we take the address of a kernel symbol. When loaded with a normal bootloader the decompressor will decompress the kernel to 2M and it will run there. This both ensures the relocation code is always working, and makes it easier to use 2M pages for the kernel and the cpu. AK: changed to not make RELOCATABLE default in Kconfig Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	0dbf7028c0	[PATCH] x86: __pa and __pa_symbol address space separation Currently __pa_symbol is for use with symbols in the kernel address map and __pa is for use with pointers into the physical memory map. But the code is implemented so you can usually interchange the two. __pa which is much more common can be implemented much more cheaply if it is it doesn't have to worry about any other kernel address spaces. This is especially true with a relocatable kernel as __pa_symbol needs to peform an extra variable read to resolve the address. There is a third macro that is added for the vsyscall data __pa_vsymbol for finding the physical addesses of vsyscall pages. Most of this patch is simply sorting through the references to __pa or __pa_symbol and using the proper one. A little of it is continuing to use a physical address when we have it instead of recalculating it several times. swapper_pgd is now NULL. leave_mm now uses init_mm.pgd and init_mm.pgd is initialized at boot (instead of compile time) to the physmem virtual mapping of init_level4_pgd. The physical address changed. Except for the for EMPTY_ZERO page all of the remaining references to __pa_symbol appear to be during kernel initialization. So this should reduce the cost of __pa in the common case, even on a relocated kernel. As this is technically a semantic change we need to be on the lookout for anything I missed. But it works for me (tm). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	cfd243d4af	[PATCH] x86-64: Remove the identity mapping as early as possible With the rewrite of the SMP trampoline and the early page allocator there is nothing that needs identity mapped pages, once we start executing C code. So add zap_identity_mappings into head64.c and remove zap_low_mappings() from much later in the code. The functions are subtly different thus the name change. This also kills boot_level4_pgt which was from an earlier attempt to move the identity mappings as early as possible, and is now no longer needed. Essentially I have replaced boot_level4_pgt with trampoline_level4_pgt in trampoline.S Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	7db681d7e4	[PATCH] x86-64: wakeup.S rename registers to reflect right names o Use appropriate names for 64bit regsiters. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	3c321bceb4	[PATCH] x86-64: Add EFER to the register set saved by save_processor_state EFER varies like %cr4 depending on the cpu capabilities, and which cpu capabilities we want to make use of. So save/restore it make certain we have the same EFER value when we are done. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	30f4728954	[PATCH] x86-64: cleanup segments Move __KERNEL32_CS up into the unused gdt entry. __KERNEL32_CS is used when entering the kernel so putting it first is useful when trying to keep boot gdt sizes to a minimum. Set the accessed bit on all gdt entries. We don't care so there is no need for the cpu to burn the extra cycles, and it potentially allows the pages to be immutable. Plus it is confusing when debugging and your gdt entries mysteriously change. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	67dcbb6bc6	[PATCH] x86-64: Clean up the early boot page table - Merge physmem_pgt and ident_pgt, removing physmem_pgt. The merge is broken as soon as mm/init.c:init_memory_mapping is run. - As physmem_pgt is gone don't export it in pgtable.h. - Use defines from pgtable.h for page permissions. - Fix the physical memory identity mapping so it is at the correct address. - Remove the physical memory mapping from wakeup_level4_pgt it is at the wrong address so we can't possibly be usinging it. - Simply NEXT_PAGE the work to calculate the phys_ alias of the labels was very cool. Unfortuantely it was a brittle special purpose hack that makes maitenance more difficult. Instead just use label - __START_KERNEL_map like we do everywhere else in assembly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Vivek Goyal	9d291e787b	[PATCH] x86-64: Assembly safe page.h and pgtable.h This patch makes pgtable.h and page.h safe to include in assembly files like head.S. Allowing us to use symbolic constants instead of hard coded numbers when refering to the page tables. This patch copies asm-sparc64/const.h to asm-x86_64 to get a definition of _AC() a very convinient macro that allows us to force the type when we are compiling the code in C and to drop all of the type information when we are using the constant in assembly. Previously this was done with multiple definition of the same constant. const.h was modified slightly so that it works when given CONFIG options as arguments. This patch adds #ifndef __ASSEMBLY__ ... #endif and _AC(1,UL) where appropriate so the assembler won't choke on the header files. Otherwise nothing should have changed. AK: added const.h to exported headers to fix headers_check Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Stephen Hemminger	e658450455	[PATCH] x86-64: dma_ops as const The dma_ops structure can be const since it never changes after boot. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Joerg Roedel	6b37f5a20c	[PATCH] x86-64: fix cpu MHz reporting on constant_tsc cpus This patch fixes the reporting of cpu_mhz in /proc/cpuinfo on CPUs with a constant TSC rate and a kernel with disabled cpufreq. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> arch/x86_64/kernel/apic.c \| 2 - arch/x86_64/kernel/time.c \| 58 +++++++++++++++++++++++++++++++++++++++--- arch/x86_64/kernel/tsc.c \| 12 +++++--- arch/x86_64/kernel/tsc_sync.c \| 2 - include/asm-x86_64/proto.h \| 1 5 files changed, 65 insertions(+), 10 deletions(-)	2007-05-02 19:27:06 +02:00
Glauber de Oliveira Costa	fbc16f2c2a	[PATCH] x86-64: Remove duplicated code for reading control registers On Tue, Mar 13, 2007 at 05:33:09AM -0700, Randy.Dunlap wrote: > On Tue, 13 Mar 2007, Glauber de Oliveira Costa wrote: > > > Tiny cleanup: > > > > In x86_64, the same functions for reading cr3 and writing cr{3,4} are > > defined in tlbflush.h and system.h, whith just a name change. > > The only difference is the clobbering of memory, which seems a safe, and > > even needed change for the write_cr4. This patch removes the duplicate. > > write_cr3() is moved to system.h for consistency. > > missing patch..... > thanks. Attached now -- Glauber de Oliveira Costa Red Hat Inc. "Free as in Freedom" Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Rusty Russell	f9d09645d6	[PATCH] x86-64: Remove unused set_seg_base The set_seg_base function isn't used anywhere (2.6.21-rc3-git1) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Jeremy Fitzhardinge	973efae21b	[PATCH] i386: clean up mach_reboot_fixups The reboot_fixups stuff seems to be a bit of a mess, specifically the header is in linux/ when its a purely i386-specific piece of code. I'm not sure why it has its config option; its only currently needed for "geode-gx1/cs5530a", so perhaps whatever config option controls that hardware should enable this? Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Aneesh Kumar K.V	6d1c426158	[PATCH] i386: Update __copy_to_user_inatomic linuxdoc description Explicity specify that the caller should pin the user memory otherwise the function will sleep Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Prarit Bhargava	86c0baf123	[PATCH] i386: Change sysenter_setup to __cpuinit & improve __INIT, __INITDATA Change sysenter_setup to __cpuinit. Change __INIT & __INITDATA to be cpu hotplug aware. Resolve MODPOST warnings similar to: WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from .text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht' and WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset 0xc041a26e) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset 0xc041a275) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at offset 0xc041a27a) and 'enable_sep_cpu' Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Andi Kleen	803d80f650	[PATCH] x86-64: Some cleanup in time.c Move prototypes into header files Remove unneeded includes. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Simon Arlott	0949be3509	[PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cache The VIA C7 is a 686 (with TSC) that supports MMX, SSE and SSE2, it also has a cache line length of 64 according to http://www.digit-life.com/articles2/cpu/rmma-via-c7.html. This patch sets gcc to -march=686 and select s the correct cache shift. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:05 +02:00
takada	f5e8861583	[PATCH] i386: pit_latch_buggy has no effect Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead. pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed. Therefore nobody refer it. Until 2.6.17, MediaGX's TSC works correctly. after 2.6.18, warned "TSC appears to be running slowly. Marking it as unstable". So marked unstable TSC when CS55x0. Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jeremy Fitzhardinge	f76c392380	[PATCH] i386: No need to use -traditional for processing asm in i386/kernel/ No need to use -traditional for processing asm in i386/kernel/ Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jan Beulich	9964cf7d77	[PATCH] x86: consolidate smp_send_stop() Synchronize i386's smp_send_stop() with x86-64's in only try-locking the call lock to prevent deadlocks when called from panic(). In both version, disable interrupts before clearing the CPU off the online map to eliminate races with IRQ handlers inspecting this map. Also in both versions, save/restore interrupts rather than disabling/ enabling them. On x86-64, eliminate one function used here by folding it into its single caller, convert to static, and rename for consistency with i386 (lkcd may like this). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jan Beulich	b0354795c9	[PATCH] x86-64: adjust inclusion of asm/vsyscall32.h Avoid including asm/vsyscall32.h in virtually every source file. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Jan Beulich	00f1ea6967	[PATCH] x86: adjust inclusion of asm/fixmap.h Move inclusion of asm/fixmap.h to where it is really used rather than where it may have been used long ago (requires a few other adjustments to includes due to previous implicit dependencies). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Ingo Molnar	3c43f03908	[PATCH] x86: default to physical mode on hotplug CPU kernels Default to physical mode on hotplug CPU kernels. Furher simplify and clean up the APIC initialization code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-05-02 19:27:04 +02:00
Ingo Molnar	07c7c47444	[PATCH] x86-64: always use physical delivery mode on > 8 CPUs Remove clustered APIC mode. There's little point in the use of clustered APIC mode, broadcasting is limited to within the cluster only, and chipsets have bugs in this area as well. So default to physical APIC mode when the CPU count is large, and default to logical APIC mode when the CPU count is 8 or smaller. (this patch only removes the use of genapic_cluster and cleans up the resulting genapic.c file - removal of all remaining traces of clustered mode will be done by another patch.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-05-02 19:27:04 +02:00
Andrew Morton	a86f34b49f	[PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525 Obsoleted by Ingo's genapic stuff. Cc: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Andrew Morton	3dc68d9b58	[PATCH] x86-64: revert x86_64-mm-add-genapic_force This is obsoleted by new Ingo genapic patches. Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Christoph Hellwig	9f90b997de	[POWERPC] Minor fault path optimization Call the kprobes pagefault handler directly instead of going through the complex notifier chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:57:39 +10:00
Segher Boessenkool	d7e5a5462f	[RSLIB] Support non-canonical GF representations For the CAFÉ NAND controller, we need to support non-canonical representations of the Galois field. Allow the caller to provide its own function for generating the field, and CAFÉ can use rslib instead of its own implementation. Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>	2007-05-02 11:56:33 +01:00
Srinivasa Ds	eb609e52d1	[POWERPC] Transparently handle <.symbol> lookup for kprobes When data symbols are not present in kernel image, user needs to add dot(".") before function name explicitly, that he wants to probe in kprobe module on ppc64. for ex:- When data symbols are missing on ppc64, ==================== [root@llm27lp1 ~]# cat /proc/kallsyms \| grep do_fork c00000000006283c T .do_fork ============================== User needs add "." to "do_fork" kp.symbol_name = ".do_fork"; ============================ This makes kprobe modules unportable. This fixes the problem. Signed-off-by: Srinivasa Ds <srinivasa@in.ibm.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:32 +10:00
Geoff Levand	dc4f60c25a	[POWERPC] PS3: Interrupt routine fixups. Fixups for the ps3 interrupt routines to support all HV device in a generic way. Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:31 +10:00
Johannes Berg	be9c94dd77	[POWERPC] Fix suspend states again In commit `0fba3a1f39` (a very long time ago, May 2006), I fixed a bug that caused powermacs to crash when you tried entering standby/mem suspend states. As I'm now getting more familiar with the suspend code I notice a few more things: 1. we previously misunderstood what pm_ops is for, it isn't supposed to be for doing platform dependent suspend/resume stuff that needs to be done for suspend to disk (as we currently try to use it!), it is instead for entering platform dependent suspend states ("standby", "mem"). 2. due to the first point, we never properly save FPU and altivec states when suspending to disk. It probably hasn't hurt yet because the process that writes the "disk" to /sys/power/state uses neither and its context is used. This patch addresses these points as follows: 1. remove all pm_ops from powermac, powermac suspend to ram isn't currently usable via /sys/power/state but is done via the PMU instead. 2. move the code responsible for storing FPU/altivec state into save_processor_state and the set_context() call to restore_processor_state. 3. add a call to kernel_enable_spe() It may look like there is some code removal missing but that is actually because the new suspend.h file overrides the ppc/suspend.h one which was previously used. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:30 +10:00
David Gibson	f88df14b1f	[POWERPC] Remove arch/powerpc's dependence on asm-ppc/pg{alloc,table}.h Currently, all 32-bit powerpc platforms use asm-ppc/pgtable.h and asm-ppc/pgalloc.h, even when otherwise compiled with ARCH=powerpc. Those asm-ppc files are a fairly nasty tangle of #ifdefs including a bunch of things which shouldn't be necessary any more in arch/powerpc. Cleaning up that mess is going to take a while, but this patch is a first step. It separates the asm-powerpc/pg{alloc,table}.h into 64 bit and 32 bit versions in asm-powerpc, which the basic .h files in asm-powerpc select based on config. We make a few tiny tweaks to the innards of the files along the way, making the outermost ifdefs (double-inclusion protection and __KERNEL__) a little cleaner, and #including asm-generic/pgtable.h from the top-level asm-powerpc/pgtable.h (since both the old 32-bit and 64-bit versions ended with such an #include). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:30 +10:00
David Gibson	69d48b409c	[POWERPC] Fix STRICT_MM_TYPECHECKS Since we don't have it active by default, the STRICT_MM_TYPECHECKS option has bitrotted again. This patch fixes a couple of simple build fixes if the option is selected. First, pud_t mustn't be defined in page.h on 32-bit systems, because it conflicts with the version in the generic pud-folding code. Second, pci_32.c is missing a __pgprot() wrapper call. Third, a couple of PS3 files use constants of type pgprot_t when they need the raw values, we add pgprot_val() calls to fix this. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:30 +10:00
David Gibson	57d7909e0d	[POWERPC] Revise PPC44x MMU code for arch/powerpc This patch takes the definitions for the PPC44x MMU (a software loaded TLB) from asm-ppc/mmu.h, cleans them up of things no longer necessary in arch/powerpc and puts them in a new asm-powerpc/mmu_44x.h file. It also substantially simplifies arch/powerpc/mm/44x_mmu.c and makes a couple of small fixes necessary for the 44x MMU code to build and work properly in arch/powerpc. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:29 +10:00
Christian Krafft	c3e8011ad1	[POWERPC] Uninline of_iomap function There is no big reason to have that function inlined. Signed-off-by: Christian Krafft <krafft@de.ibm.com> Acked-by: Arnd Bergmann <arnd.bergmann@de.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 20:04:29 +10:00
Mathieu Desnoyers	c0b3ae14f1	[POWERPC] Move of_irq_to_resource from prom.h to prom_parse.c In the powerpc architecture, of_irq_to_resource, currently sitting in prom.h, needs irq_of_parse_and_map and NO_IRQ from asm-powerpc/irq.h. The solution suggested by Benjamin Herrenschmidt is to move it to arch/powerpc/kernel/prom_parse.c. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 16:40:55 +10:00
Benjamin Herrenschmidt	0eb2e6019a	[POWERPC] pmac_feature_call checks platform This patch makes sure that a caller of pmac_call_feature() won't try to call into ppc_md.feature_call of another platform, which might happen if some powermac drivers are loaded on non-powermac machines. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-05-02 16:35:01 +10:00
Herbert Xu	e196d62591	[CRYPTO] api: Add ablkcipher_request_set_tfm This patch adds ablkcipher_request_set_tfm for those users that need to manage the memory for ablkcipher requests directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:33 +10:00
Herbert Xu	124b53d020	[CRYPTO] cryptd: Add software async crypto daemon This patch adds the cryptd module which is a template that takes a synchronous software crypto algorithm and converts it to an asynchronous one by executing it in a kernel thread. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:32 +10:00
Herbert Xu	a73e69965f	[CRYPTO] api: Do not remove users unless new algorithm matches As it is whenever a new algorithm with the same name is registered users of the old algorithm will be removed so that they can take advantage of the new algorithm. This presents a problem when the new algorithm is not equivalent to the old algorithm. In particular, the new algorithm might only function on top of the existing one. Hence we should not remove users unless they can make use of the new algorithm. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:32 +10:00
Herbert Xu	b5b7f08869	[CRYPTO] api: Add async blkcipher type This patch adds the mid-level interface for asynchronous block ciphers. It also includes a generic queueing mechanism that can be used by other asynchronous crypto operations in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:31 +10:00
Herbert Xu	ebc610e5bc	[CRYPTO] templates: Pass type/mask when creating instances This patch passes the type/mask along when constructing instances of templates. This is in preparation for templates that may support multiple types of instances depending on what is requested. For example, the planned software async crypto driver will use this construct. For the moment this allows us to check whether the instance constructed is of the correct type and avoid returning success if the type does not match. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:31 +10:00
Herbert Xu	32e3983fe5	[CRYPTO] api: Add async block cipher interface This patch adds the frontend interface for asynchronous block ciphers. In addition to the usual block cipher parameters, there is a callback function pointer and a data pointer. The callback will be invoked only if the encrypt/decrypt handlers return -EINPROGRESS. In other words, if the return value of zero the completion handler (or the equivalent code) needs to be invoked by the caller. The request structure is allocated and freed by the caller. Its size is determined by calling crypto_ablkcipher_reqsize(). The helpers ablkcipher_request_alloc/ablkcipher_request_free can be used to manage the memory for a request. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:30 +10:00
Haavard Skinnemoen	1c23af90dc	i2c: Bitbanging I2C bus driver using the GPIO API This is a very simple bitbanging I2C bus driver utilizing the new arch-neutral GPIO API. Useful for chips that don't have a built-in I2C controller, additional I2C busses, or testing purposes. To use, include something similar to the following in the board-specific setup code: #include <linux/i2c-gpio.h> static struct i2c_gpio_platform_data i2c_gpio_data = { .sda_pin = GPIO_PIN_FOO, .scl_pin = GPIO_PIN_BAR, }; static struct platform_device i2c_gpio_device = { .name = "i2c-gpio", .id = 0, .dev = { .platform_data = &i2c_gpio_data, }, }; Register this platform_device, set up the I2C pins as GPIO if required and you're ready to go. This will use default values for udelay and timeout, and will work with GPIO hardware that does not support open drain mode, but allows sensing of the SDA and SCL lines even when they are being driven. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:34 +02:00
Jean Delvare	b86a1bc8e3	i2c: Restore i2c_smbus_read_block_data Add back the i2c_smbus_read_block_data helper function, it is needed by the upcoming lm93 hardware monitoring driver and possibly others. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:34 +02:00
Jean Delvare	424ed67c7d	i2c-algo-bit: Implement a 50/50 SCL duty cycle The original i2c-algo-bit implementation uses a 33/66 SCL duty cycle when bits are being written on the bus. While the I2C specification doesn't forbid it, this prevents us from driving the I2C bus to its max speed, limiting us to 66 kbps max on standard I2C busses. Implementing a 50/50 duty cycle instead lets us max out the bandwidth up to the theoretical max of 100 kbps on standard I2C busses. This is particularly important when large amounts of data need to be transfered over the bus, as is the case with some TV adapters when the firmware is being uploaded. In fact this change even allows, at least in theory, fast-mode I2C support at 125, 166 and 250 kbps. There's no way to reach the theoretical max of 400 kbps with this implementation. But I don't think we want to put efforts in that direction anyway: software-driven I2C is very CPU-intensive and bad for latency. Other timing changes: * Don't set SDA high explicitly on error, we're going to issue a stop condition before we leave anyway. * If an error occurs when sending the slave address, yield the CPU before retrying, and remove the additional delay after the new start condition. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:33 +02:00
Bryan Wu	d24ecfcc39	i2c: Blackfin Two Wire Interface driver The i2c linux driver for blackfin architecture which supports blackfin on-chip TWI controller i2c operation. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	b3e820968a	i2c: Make i2c_del_driver a void function Make i2c_del_driver a void function, like all other driver removal functions. It always returned 0 even when errors occured, and nobody ever actually checked the return value anyway. And we cannot fail a module removal anyway. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	a97f1ed090	i2c: Move i2c-isa-only exported symbol declarations Move the declaration of i2c-isa-only exported symbols to i2c-isa itself, that's the best way to ensure nobody will attempt to use them. Hopefully we'll get rid of the exports themselves soon anyway. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	12b5053ac5	i2c: Add i2c_new_probed_device() Add a new helper function to instantiate an i2c device. It is meant as a replacement for i2c_new_device() when you don't know for sure at which address your I2C/SMBus device lives. This happens frequently on TV adapters for example, you know there is a tuner chip on the bus, but depending on the exact board model and revision, it can live at different addresses. So, the new i2c_new_probed_device() function will probe the bus according to a list of addresses, and as soon as one of these addresses responds, it will call i2c_new_device() on that one address. This function will make it possible to port the old i2c drivers to the new model quickly. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
Jean Delvare	0f3b483852	i2c-algo-bit: Add i2c_bit_add_numbered_bus Add i2c_bit_add_numbered_bus(), which is equivalent to i2c_bit_add_bus except that it calls i2c_add_numbered_adapter() at the end instead of i2c_add_adapter(). Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	6e13e64184	i2c: Add i2c_add_numbered_adapter() This adds a call, i2c_add_numbered_adapter(), registering an I2C adapter with a specific bus number and then creating I2C device nodes for any pre-declared devices on that bus. It builds on previous patches adding I2C probe() and remove() support, and that pre-declaration of devices. This completes the core support for "new style" I2C device drivers. Those follow the standard driver model for binding devices to drivers (using probe and remove methods) rather than a legacy model (where the driver tries to autoconfigure each bus, and registers devices itself). Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	9c1600eda4	i2c: Add i2c_board_info and i2c_new_device() This provides partial support for new-style I2C driver binding. It builds on "struct i2c_board_info" declarations that identify I2C devices on a given board. This is needed on systems with I2C devices that can't be fully probed and/or autoconfigured, such as many embedded Linux configurations where the way a given I2C device is wired may affect how it must be used. There are two models for declaring such devices: * LATE -- using a public function i2c_new_device(). This lets modules declare I2C devices found AFTER a given I2C adapter becomes available. For example, a PCI card could create adapters giving access to utility chips on that card, and this would be used to associate those chips with those adapters. * EARLY -- from arch_initcall() level code, using a non-exported function i2c_register_board_info(). This copies the declarations BEFORE such an i2c_adapter becomes available, arranging that i2c_new_device() will be called later when i2c-core registers the relevant i2c_adapter. For example, arch/.../.../board-*.c files would declare the I2C devices along with their platform data, and I2C devices would behave much like PNPACPI devices. (That is, both enumerate from board-specific tables.) To match the exported i2c_new_device(), the previously-private function i2c_unregister_device() is now exported. Pending later patches using these new APIs, this is effectively a NOP. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	a1d9e6e49f	i2c: i2c stack can remove() More update for new style driver support: add a remove() method, and use it in the relevant code paths. Again, nothing will use this yet since there's nothing to create devices feeding this infrastructure. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:30 +02:00
David Brownell	7b4fbc50fa	i2c: i2c stack can probe() One of a series of I2C infrastructure updates to support enumeration using the standard Linux driver model. This patch updates probe() and associated hotplug/coldplug support, but not remove(). Nothing yet _uses_ it to create I2C devices, so those hotplug/coldplug mechanisms will be the only externally visible change. This patch will be an overall NOP since the I2C stack doesn't yet create clients/devices except as part of binding them to legacy drivers. Some code is moved earlier in the source code, helping group more of the per-device infrastructure in one place and simplifying handling per-device attributes. Terminology being adopted: "legacy drivers" create devices (i2c_client) themselves, while "new style" ones follow the driver model (the i2c_client is handed to the probe routine). It's an either/or thing; the two models don't mix, and drivers that try mixing them won't even be registered. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:30 +02:00
Jean Delvare	7c59b6615f	i2c: Cleanup the includes of <linux/i2c.h> Clean up the includes of <linux/i2c.h>. Only include this header file when we actually need it. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:29 +02:00
Jean Delvare	f75803de6a	i2c-nforce2: Add support for the MCP61 and MCP65 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Hans-Frieder Vogt <hfvogt@gmx.net>	2007-05-01 23:26:29 +02:00
Jean Delvare	209d27c3b1	i2c: Emulate SMBus block read over I2C Let the I2C bus drivers emulate the SMBus Block Read and Block Process Call transactions if they wish. This requires to define a new message flag, which i2c-core will use to let the underlying I2C bus driver know that the first received byte will specify the length of the read message. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:29 +02:00
David Brownell	ef2c8321f5	i2c: Rename dev_to_i2c_adapter() Rename dev_to_i2c_adapter() as to_i2c_adapter(), since the previous syntax was a surprising and needless difference from normal naming conventions in Linux. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
David Brownell	2096b956d2	i2c: Shrink struct i2c_client This shrinks the size of "struct i2c_client" by 40 bytes: - Substantially shrinks the string used to identify the chip type - The "flags" don't need to be so big - Removes some internal padding It also adds kerneldoc for that struct, explaining how "name" is really a chip type identifier; it's otherwise potentially confusing. Because the I2C_NAME_SIZE symbol was abused for both i2c_client.name and for i2c_adapter.name, this needed to affect i2c_adapter too. The adapters which used that symbol now use the more-obviously-correct idiom of taking the size of that field. JD: Shorten i2c_adapter.name from 50 to 48 bytes while we're here, to avoid wasting space in padding. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
Jean Delvare	b31366f439	i2c: i2c_adapter devices need no driver Kill i2c_adapter_driver as it doesn't make sense and it prevents further i2c-core cleanups. i2c_adapter devices are virtual devices (ex-class devices) and as such they don't need a driver. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
Jean Delvare	fccb56e4d8	i2c: Kill i2c_adapter.class_dev Kill i2c_adapter.class_dev. Instead, set the class of i2c_adapter.dev to i2c_adapter_class, so that a symlink will be created for every i2c_adapter in /sys/class/i2c-adapter. The same change must be mirrored to i2c-isa as it duplicates some of the i2c-core functionalities. User-space tools and libraries might need some adjustments. In particular, libsensors from lm_sensors 2.10.3 or later is required for proper discovery of i2c adapter names after this change. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:27 +02:00
Christoph Hellwig	f3402a4e52	[VOYAGER] Convert the monitor thread to use the kthread API full kthread conversion on the voyager power switch handling thread. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2007-05-01 10:09:29 -05:00
Pierre Ossman	bd76631261	mmc: remove old card states Remove card states that no longer make any sense. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 16:11:57 +02:00
Philip Langdale	55556da012	MMC: Fix handling of low-voltage cards Fix handling of low voltage MMC cards. The latest MMC and SD specs both agree that support for low-voltage operations is indicated by bit 7 in the OCR. The MMC spec states that the low voltage range is 1.65-1.95V while the SD spec leaves the actual voltage range undefined - meaning that there is still no such thing as a low voltage SD card. However, an old Sandisk spec implied that bits 7.0 represented voltages below 2.0V in 1V or 0.5V increments, and the code was accordingly written with that expectation. This confusion meant that host drivers attempting to support the typical low voltage (1.8V) would set the wrong bits in the host OCR mask (usually bits 5 and/or 6) resulting in the the low voltage mode never being used. This change corrects the low voltage range and adds sanity checks on the reserved bits (0-6) and for SD cards that claim to support low-voltage operations. Signed-off-by: Philip Langdale <philipl@overt.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 14:14:50 +02:00
Tejun Heo	31daabda16	libata: reimplement reset sequencing libata previously depended upon waits in prereset to get resets after hotplug right for both spin up and device ready wait. This was necessary both for reliablity and speed as reset was likely to fail if initiated too early and each try usually took more than 30secs to fail. Previous patches fixed the reliability part by fixing status and SCR handling in resets. This patch remedies the speed part by improving reset sequencing. Prereset waiting timeout is adjusted to 10s because spinup wait is replaced by reset sequencing and !BSY wait is not as important as before. During boot or module loading where the drive is already fully spun up, !BSY wait succeeds immediately, so 10s should be enough in most cases. It matters after hotplugging or other error conditions, but in those cases, !BSY wait in prereset simply can't be relied upon due to the varied and weird behaviors ATA controllers and devices show. Reset is now driven by ata_eh_reset_timeouts[] table which contains timeouts for each reset try. The first reset can be softreset but the following ones are always hardreset if available. Each timeout defines deadline for the reset try. If a reset try fails, reset is retried with the next timeout till the end of the timeout table is reached. If a reset try fails before the timeout with error, libata waits till the deadline of the failed try before retrying. IOW, the timeout table defines timetable of reset tries such that the n'th try always begins at least after the sum of all previous timeouts has passed. The current timetable defines 4 tries and takes around 1 minute. @0 : First try. This should succeed most of the time during boot. @10 : 10s is enough to spin up most consumer harddrives. Give it another shot. @20 : 20s should spin up > 99% of working drives. This has 30s timeout for retarded devices needing long idleness post reset. @55 : Final try with 5s timeout just in case. The above timetable is trade off between not annoying the device too much with frequent resets and taking reasonable amount of time in most cases. Some controllers may do better with shorter timeouts while others may fare better with longer but we just can't rely upon LLD writers to test each controller with wide variety of devices using various scenarios. We need default behavior which reasonably fits most cases. I've tested the above timetable on a dozen SATA controllers and a few PATA controllers with about a dozen different drives from all major vendors and 4 different ODDs from three different vendors for both boot and hotplug (if available) cases. Boot probing is not affected unless the device is broken in which cases new code gives up on the port after a minute rather than five or nine minutes. When hotplugging, most devices get detected on the first or second try. Multi-platter drives with long spin up time which sometimes took > 40 secs with the original code, now usually comes up during the second try and at least right after the third try @20. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-05-01 07:49:54 -04:00
Tejun Heo	d4b2bab4f2	libata: add deadline support to prereset and reset methods Add @deadline to prereset and reset methods and make them honor it. ata_wait_ready() which directly takes @deadline is implemented to be used as the wait function. This patch is in preparation for EH timing improvements. * ata_wait_ready() never does busy sleep. It's only used from EH and no wait in EH is that urgent. This function also prints 'be patient' message automatically after 5 secs of waiting if more than 3 secs is remaining till deadline. * ata_bus_post_reset() now fails with error code if any of its wait fails. This is important because earlier reset tries will have shorter timeout than the spec requires. If a device fails to respond before the short timeout, reset should be retried with longer timeout rather than silently ignoring the device. There are three behavior differences. 1. Timeout is applied to both devices at once, not separately. This is more consistent with what the spec says. 2. When a device passes devchk but fails to become ready before deadline. Previouly, post_reset would just succeed and let device classification remove the device. New code fails the reset thus causing reset retry. After a few times, EH will give up disabling the port. 3. When slave device passes devchk but fails to become accessible (TF-wise) after reset. Original code disables dev1 after 30s timeout and continues as if the device doesn't exist, while the patched code fails reset. When this happens, new code fails reset on whole port rather than proceeding with only the primary device. If the failing device is suffering transient problems, new code retries reset which is a better behavior. If the failing device is actually broken, the net effect is identical to it, but not to the other device sharing the channel. In the previous code, reset would have succeeded after 30s thus detecting the working one. In the new code, reset fails and whole port gets disabled. IMO, it's a pathological case anyway (broken device sharing bus with working one) and doesn't really matter. * ata_bus_softreset() is changed to return error code from ata_bus_post_reset(). It used to return 0 unconditionally. * Spin up waiting is to be removed and not converted to honor deadline. * To be on the safe side, deadline is set to 40s for the time being. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-05-01 07:49:53 -04:00
Philip Langdale	4be34c99a2	MMC: Consolidate voltage definitions Consolidate the list of available voltages. Up until now, a separate set of defines has been used for host->vdd than that used for the OCR voltage mask values. Having two sets of defines allows them to get out of sync and the current sets are already inconsistent with one claiming to describe ranges and the other specific voltages. Only the SDHCI driver uses the host->vdd defines and it is easily fixed to use the OCR defines. Signed-off-by: Philip Langdale <philipl@overt.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:42:28 +02:00
Pierre Ossman	7ea239d9e6	mmc: add bus handler Delegate protocol handling to "bus handlers". This allows the core to just handle the task of arbitrating the bus. Initialisation and pampering of cards is now done by the different bus handlers. This design also allows MMC and SD (and later SDIO) to be more cleanly separated, allowing easier maintenance. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:41:06 +02:00
Pierre Ossman	da7fbe58d2	mmc: Separate out protocol ops Move protocol operations and definitions into their own files in an effort to separate protocol handling and bus arbitration more clearly. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	aaac1b470b	mmc: Move core functions to subdir Create a "core" subdirectory to house the central bus handling functions. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	b855885e3b	mmc: deprecate mmc bus topology The classic MMC bus was defined as multi card bus system, which is reflected in the design in the MMC layer. When SD showed up, the bus topology was abandoned and a star topology (one card per host) was mandated. MMC version 4 has followed this, officially deprecating the bus topology. As we do not have any known users of the bus topology we can remove support for it. This will simplify the code and rectify some incorrect assumptions in the newer additions. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	3b91e5507c	mmc: Flush pending detects on host removal Make sure we kill of any pending detection runs when the host is removed instead of when it is freed. Also add some debugging to make sure the driver doesn't queue up more detection after it has removed the host. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:17 +02:00
Pierre Ossman	f74d132cec	mmc: Move OCR bit defines All host drivers were #include:ing mmc/protocol.h just to get access to the OCR bit defines. Move these to host.h instead. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:16 +02:00
Pierre Ossman	9c2c0af950	mmc: add type field to cards Split out the type of card into its own field as it hardly qualifies as a state. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:16 +02:00
Pierre Ossman	85a18ad93e	mmc: MMC sector based cards Support for MMC 4.2 sector based cards. This tweaks the init a bit and reads a new field out of the EXT_CSD. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	91f8d0118a	tifm: layout fixes, small changes to comments and printfs Cosmetic changes to the code. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	13cdf48ef1	tifm_sd: implement software scatter-gather It was found that delays associated with issue and completion of the commands severely limit performance of the new, fast SD cards. To alleviate this issue scatter-gather emulation in software is implemented for both dma and pio transfer modes. Non-block aligned and high memory sg entries are accounted for. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	72dc9d9619	tifm_sd: replace command completion state machine with full checking State machine used to to track mmc command state was found to be fragile and unreliable, making many cards unusable. The safer solution is to perform all needed checks at every card event. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:14 +02:00
Alex Dubov	2428a8fe22	tifm: move common device management tasks from tifm_7xx1 to tifm_core Some details of the device management (create, add, remove) are really belong to the tifm_core, as they are not hardware specific. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	6113ed73e6	tifm: move common adapter management tasks from tifm_7xx1 to tifm_core Some details of the adapter management (create, add, remove) are really belong to the tifm_core, as they are not hardware specific. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	3540af8ffd	tifm: replace per-adapter kthread with freezeable workqueue Freezeable workqueue makes sure that adapter work items (device insertions and removals) would be handled after the system is fully resumed. Previously this was achieved by explicit freezing of the kthread. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	e23f2b8a1a	tifm: simplify bus match and uevent handlers Remove code duplicating the kernel functionality and clean up data structures involved in driver matching. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	8dc4a61eca	tifm: use bus methods to handle probe/remove instead of driver ones. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:12 +02:00
Alex Dubov	4552f0cbd4	tifm: hide details of interrupt processing from socket drivers Instead of passing transformed value of adapter interrupt status to socket drivers, implement two separate callbacks - one for card events and another for dma events. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:12 +02:00
David Teigland	72c2be776b	[DLM] interface for purge (2/2) Add code to accept purge commands from userland. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:12 +01:00
Steve Dickson	74dd34e6e8	NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. READDIRPLUS can be a performance hindrance when the client is working with large directories. In addition, some servers still have bugs in their implementations (e.g. Tru64 returns wrong values for the fsid). Add a mount flag to enable users to turn it off at mount time following the implementation in Apple's NFS client. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	4c2eaf073f	SUNRPC: remove old portmapper net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:15 -07:00
Chuck Lever	a509050bd3	SUNRPC: introduce rpcbind: replacement for in-kernel portmapper Introduce a replacement for the in-kernel portmapper client that supports all 3 versions of the rpcbind protocol. This code is not used yet. Original code by Groupe Bull updated for the latest kernel, with multiple bug fixes. Note that rpcb_clnt.c does not yet support registering via versions 3 and 4 of the rpcbind protocol. That is planned for a later patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:12 -07:00
Chuck Lever	c5a4dd8b7c	SUNRPC: Eliminate side effects from rpc_malloc Currently rpc_malloc sets req->rq_buffer internally. Make this a more generic interface: return a pointer to the new buffer (or NULL) and make the caller set req->rq_buffer and req->rq_bufsize. This looks much more like kmalloc and eliminates the side effects. To fix a potential deadlock, this patch also replaces GFP_NOFS with GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside the RPC's task scheduler while allocating their buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:11 -07:00
Chuck Lever	2bea90d43a	SUNRPC: RPC buffer size estimates are too large The RPC buffer size estimation logic in net/sunrpc/clnt.c always significantly overestimates the requirements for the buffer size. A little instrumentation demonstrated that in fact rpc_malloc was never allocating the buffer from the mempool, but almost always called kmalloc. To compute the size of the RPC buffer more precisely, split p_bufsiz into two fields; one for the argument size, and one for the result size. Then, compute the sum of the exact call and reply header sizes, and split the RPC buffer precisely between the two. That should keep almost all RPC buffers within the 2KiB buffer mempool limit. And, we can finally be rid of RPC_SLACK_SPACE! Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:10 -07:00
Chuck Lever	511d2e8855	NLM: Shrink the maximum request size of NLM4 requests NLM version 4 requests estimate the call and reply header sizes rather conservatively, using the very maximum size allowed in the protocol even though Linux always uses only a small fraction of the allowable space. Reduce the size of caller and lock arguments to conserve RPC buffer space while XDR encoding NLM4 arguments. Add compile-time checks to ensure the hostname string won't overflow NLM protocol maximums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	ca52fec152	NFS: Use pgoff_t in structures and functions that pass page cache offsets Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	8d5658c949	NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:07 -07:00
Trond Myklebust	c63c7b0513	NFS: Fix a race when doing NFS write coalescing Currently we do write coalescing in a very inefficient manner: one pass in generic_writepages() in order to lock the pages for writing, then one pass in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather the locked pages for coalescing into RPC requests of size "wsize". In fact, it turns out there is actually a deadlock possible here since we only start I/O on the second pass. If the user signals the process while we're in nfs_sync_mapping_wait(), for instance, then we may exit before starting I/O on all the requests that have been queued up. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:06 -07:00
Trond Myklebust	8b09bee308	NFS: Cleanup for nfs_readpages() Do the coalescing of read requests into block sized requests at start of I/O as we scan through the pages instead of going through a second pass. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:05 -07:00
Trond Myklebust	bcb71bba7e	NFS: Another cleanup of the read/write request coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Trond Myklebust	d8a5ad75cc	NFS: Cleanup the coalescing code Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:04 -07:00
Roman Moravcik	84767d00a8	Input: gpio_keys - add support for switches (EV_SW) Signed-off-by: Roman Moravcik <roman.moravcik@gmail.com> Signed-off-by: Paul Sokolovsky <pmiscml@gmail.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-05-01 00:39:13 -04:00
Dmitry Torokhov	bc95f3669f	Merge master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/usb/input/Makefile drivers/usb/input/gtco.c	2007-05-01 00:24:54 -04:00
David Rientjes	14e38ac823	pm: include EIO from errno-base.h For backwards compatibility, call_platform_enable_wakeup() can return 0 instead of -EIO since we aren't guaranteed to have errno defined. Cc: David Brownell <david-b@pacbell.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:41 -07:00
Jeremy Fitzhardinge	11443ec7d9	Add kvasprintf() Add a kvasprintf() function to complement kasprintf(). No in-tree users yet, but I have some coming up. [akpm@linux-foundation.org: EXPORT it] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Keir Fraser <keir@xensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	9684e51cd1	power management: force pm_ops.valid callback to be assigned This patch changes the docs and behaviour from "all states valid" to "no states valid" if no .valid callback is assigned. Users of pm_ops that only need mem sleep can assign pm_valid_only_mem without any overhead, others will require more elaborate callbacks. Now that all users of pm_ops have a .valid callback this is a safe thing to do and prevents things from getting messy again as they were before. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Cc: <linux-pm@lists.linux-foundation.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	e8c9c50269	power management: implement pm_ops.valid for everybody Almost all users of pm_ops only support mem sleep, don't check in .valid and don't reject any others in .prepare so users can be confused if they check /sys/power/state, especially when new states are added (these would then result in s-t-r although they're supposed to be something different). This patch implements a generic pm_valid_only_mem function that is then exported for users and puts it to use in almost all existing pm_ops. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: linux-pm@lists.linux-foundation.org Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	11d77d0c01	power management: remove firmware disk mode This patch removes the firmware disk suspend mode which is the wrong approach, it is supposed to be used for implementing firmware-based disk suspend but cannot actually be used for that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: David Brownell <david-b@pacbell.net> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Johannes Berg	fe0c935a6c	rework pm_ops pm_disk_mode, kill misuse This patch series cleans up some misconceptions about pm_ops. Some users of the pm_ops structure attempt to use it to stop the user from entering suspend to disk, this, however, is not possible since the user can always use "shutdown" in /sys/power/disk and then the pm_ops are never invoked. Also, platforms that don't support suspend to disk simply should not allow configuring SOFTWARE_SUSPEND (read the help text on it, it only selects suspend to disk and nothing else, all the other stuff depends on PM). The pm_ops structure is actually intended to provide a way to enter platform-defined sleep states (currently supported states are "standby" and "mem" (suspend to ram)) and additionally (if SOFTWARE_SUSPEND is configured) allows a platform to support a platform specific way to enter low-power mode once everything has been saved to disk. This is currently only used by ACPI (S4). This patch: The pm_ops.pm_disk_mode is used in totally bogus ways since nobody really seems to understand what it actually does. This patch clarifies the pm_disk_mode description. It also removes all the arm and sh users that think they can veto suspend to disk via pm_ops; not so since the user can always do echo shutdown > /sys/power/disk, they need to find a better way involving Kconfig or such. ACPI is the only user left with a non-zero pm_disk_mode. The patch also sets the default mode to shutdown again, but when a new pm_ops is registered its pm_disk_mode is selected as default, that way the default stays for ACPI where it is apparently required. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: David Brownell <david-b@pacbell.net> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <linux-pm@lists.linux-foundation.org> Cc: Len Brown <lenb@kernel.org> Acked-by: Russell King <rmk@arm.linux.org.uk> Cc: Greg KH <greg@kroah.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:40 -07:00
Robert Peterson	42e380832a	Extend print_symbol capability Today's print_symbol function dumps a kernel symbol with printk. This patch extends the functionality of kallsyms.c so that the symbol lookup function may be used without the printk. This is useful for modules that want to dump symbols elsewhere, for example, to debugfs. I intend to use the new function call in the GFS2 file system (which will be a separate patch). [akpm@linux-foundation.org: build fix] [clameter@sgi.com: sprint_symbol should return length of string like sprintf] Signed-off-by: Robert Peterson <rpeterso@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-04-30 16:40:39 -07:00
Kristian Høgsberg	9640d3d775	firewire: Rename fw-device-cdev.c to fw-cdev.c and move header to include/linux. Signed-off-by: Kristian Høgsberg <krh@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>	2007-04-30 23:08:13 +02:00
Tony Luck	d29182534c	Pull mem-attribute into release branch	2007-04-30 13:56:17 -07:00
Tony Luck	b643b0fdbc	Pull percpu-dtc into release branch	2007-04-30 13:56:00 -07:00
Tony Luck	e0cc09e295	Pull error-inject into release branch	2007-04-30 13:55:43 -07:00
Linus Torvalds	d6454706c3	Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jikos/hid: (21 commits) USB HID: don't warn on idVendor == 0 USB HID: add 'quirks' module parameter USB HID: add support for dynamically-created quirks USB HID: clarify static quirk handling as squirks USB HID: encapsulate quirk handling into hid-quirks.c USB HID: EMS USBII device needs HID_QUIRK_MULTI_INPUT HID: update copyright and authorship macro HID: introduce proper zeroing of unused bits in output reports USB HID: add support for WiseGroup MP-8800 Quad Joypad USB HID: add FF support for Logitech Force 3D Pro Joystick USB HID: numlock quirk for dell W7658 keyboard USB HID: Logitech MX3000 keyboard needs report descriptor quirk USB HID: extend quirk for Logitech S510 keyboard USB HID: usbkbd/usbmouse - handle errors when registering devices USB HID: add QUIRK_HIDDEV for Belkin Flip KVM HID: enable dead keys on a belkin wireless keyboard USB HID: Thustmaster firestorm dual power v1 support USB HID: specify explicit size for hid_blacklist.quirks USB HID: fix retry & reset logic USB HID: consolidate vendor/product ids ...	2007-04-30 08:58:21 -07:00
Linus Torvalds	152a6a9da1	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) [IPV4] SNMP: Support OutMcastPkts and OutBcastPkts [IPV4] SNMP: Support InMcastPkts and InBcastPkts [IPV4] SNMP: Support InTruncatedPkts [IPV4] SNMP: Support InNoRoutes [SNMP]: Add definitions for {In,Out}BcastPkts [TCP] FRTO: RFC4138 allows Nagle override when new data must be sent [TCP] FRTO: Delay skb available check until it's mandatory [XFRM]: Restrict upper layer information by bundle. [TCP]: Catch skb with S+L bugs earlier [PATCH] INET : IPV4 UDP lookups converted to a 2 pass algo [L2TP]: Add the ability to autoload a pppox protocol module. [SKB]: Introduce skb_queue_walk_safe() [AF_IUCV/IUCV]: smp_call_function deadlock [IPV6]: Fix slab corruption running ip6sic [TCP]: Update references in two old comments [XFRM]: Export SPD info [IPV6]: Track device renames in snmp6. [SCTP]: Fix sctp_getsockopt_local_addrs_old() to use local storage. [NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. [NETPOLL]: Remove CONFIG_NETPOLL_RX ...	2007-04-30 08:14:42 -07:00
Linus Torvalds	cd9bb7e736	Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: [PATCH] elevator: elv_list_lock does not need irq disabling [BLOCK] Don't pin lots of memory in mempools cfq-iosched: speedup cic rb lookup ll_rw_blk: add io_context private pointer cfq-iosched: get rid of cfqq hash cfq-iosched: tighten queue request overlap condition cfq-iosched: improve sync vs async workloads cfq-iosched: never allow an async queue idling cfq-iosched: get rid of ->dispatch_slice cfq-iosched: don't pass unused preemption variable around cfq-iosched: get rid of ->cur_rr and ->cfq_list cfq-iosched: slice offset should take ioprio into account [PATCH] cfq-iosched: style cleanups and comments cfq-iosched: sort IDLE queues into the rbtree cfq-iosched: sort RT queues into the rbtree [PATCH] cfq-iosched: speed up rbtree handling cfq-iosched: rework the whole round-robin list concept cfq-iosched: minor updates cfq-iosched: development update cfq-iosched: improve preemption for cooperating tasks	2007-04-30 08:12:39 -07:00
Linus Torvalds	24a77daf3d	Merge branch 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'for-2.6.22' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (255 commits) [POWERPC] Remove dev_dbg redefinition in drivers/ps3/vuart.c [POWERPC] remove kernel module option for booke wdt [POWERPC] Avoid putting cpu node twice [POWERPC] Spinlock initializer cleanup [POWERPC] ppc4xx_sgdma needs dma-mapping.h [POWERPC] arch/powerpc/sysdev/timer.c build fix [POWERPC] get_property cleanups [POWERPC] Remove the unused HTDMSOUND driver [POWERPC] cell: cbe_cpufreq cleanup and crash fix [POWERPC] Declare enable_kernel_spe in a header [POWERPC] Add dt_xlate_addr() to bootwrapper [POWERPC] bootwrapper: CONFIG_ -> CONFIG_DEVICE_TREE [POWERPC] Don't define a custom bd_t for Xilixn Virtex based boards. [POWERPC] Add sane defaults for Xilinx EDK generated xparameters files [POWERPC] Add uartlite boot console driver for the zImage wrapper [POWERPC] Stop using ppc_sys for Xilinx Virtex boards [POWERPC] New registration for common Xilinx Virtex ppc405 platform devices [POWERPC] Merge common virtex header files [POWERPC] Rework Kconfig dependancies for Xilinx Virtex ppc405 platform [POWERPC] Clean up cpufreq Kconfig dependencies ...	2007-04-30 08:10:12 -07:00
Dan Williams	84c981ffb3	[ARM] 4343/1: iop13xx: automatically detect the internal bus frequency Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-04-30 15:24:54 +01:00
Dan Williams	7dcad376e8	[ARM] 4341/1: iop13xx: fix i/o address translation PCI devices were being programmed with an incorrect base address value. This patch moves I/O space into a 16-bit addressable region and corrects the i/o offset. Much thanks to Martin Michlmayr for tracking this issue and testing debug patches. Cc: Martin Michlmayr <tbm@cyrius.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2007-04-30 15:24:50 +01:00
Mitsuru Chinen	71ff6c0a85	[SNMP]: Add definitions for {In,Out}BcastPkts The updated IP-MIB RFC (RFC4293) specifys new objects, InBcastPkts and OutBcastPkts. This adds definitions for them. Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:19 -07:00
Ilpo Järvinen	d551e4541d	[TCP] FRTO: RFC4138 allows Nagle override when new data must be sent This is a corner case where less than MSS sized new data thingie is awaiting in the send queue. For F-RTO to work correctly, a new data segment must be sent at certain point or F-RTO cannot be used at all. RFC4138 allows overriding of Nagle at that point. Implementation uses frto_counter states 2 and 3 to distinguish when Nagle override is needed. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:16 -07:00
Masahide NAKAMURA	157bfc2502	[XFRM]: Restrict upper layer information by bundle. On MIPv6 usage, XFRM sub policy is enabled. When main (IPsec) and sub (MIPv6) policy selectors have the same address set but different upper layer information (i.e. protocol number and its ports or type/code), multiple bundle should be created. However, currently we have issue to use the same bundle created for the first time with all flows covered by the case. It is useful for the bundle to have the upper layer information to be restructured correctly if it does not match with the flow. 1. Bundle was created by two policies Selector from another policy is added to xfrm_dst. If the flow does not match the selector, it goes to slow path to restructure new bundle by single policy. 2. Bundle was created by one policy Flow cache is added to xfrm_dst as originated one. If the flow does not match the cache, it goes to slow path to try searching another policy. Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:58:09 -07:00
Ilpo Järvinen	34588b4c04	[TCP]: Catch skb with S+L bugs earlier SACKED_ACKED and LOST are mutually exclusive with SACK, thus having their sum larger than packets_out is bug with SACK. Eventually these bugs trigger traps in the tcp_clean_rtx_queue with SACK but it's much more informative to do this here. Non-SACK TCP, however, could get more than packets_out duplicate ACKs which each increment sacked_out, so it makes sense to do this kind of limitting for non-SACK TCP but not for SACK enabled one. Perhaps the author had the opposite in mind but did the logic accidently wrong way around? Anyway, the sacked_out incrementer code for non-SACK already deals this issue before calling sync_left_out so this trapping can be done unconditionally. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:57:33 -07:00
Jens Axboe	07e4470805	Merge branch 'cfq' into for-linus	2007-04-30 09:09:27 +02:00
Jens Axboe	5972511b77	[BLOCK] Don't pin lots of memory in mempools Currently we scale the mempool sizes depending on memory installed in the machine, except for the bio pool itself which sits at a fixed 256 entry pre-allocation. There's really no point in "optimizing" this OOM path, we just need enough preallocated to make progress. A single unit is enough, lets scale it down to 2 just to be on the safe side. This patch saves ~150kb of pinned kernel memory on a 32-bit box. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:08:17 +02:00
James Chapman	46f8914e53	[SKB]: Introduce skb_queue_walk_safe() This patch provides a method for walking skb lists while inserting or removing skbs from the list. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-30 00:07:31 -07:00
Jens Axboe	4e521c27ee	ll_rw_blk: add io_context private pointer To be used by as/cfq as they see fit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-04-30 09:01:23 +02:00
Dmitry Torokhov	0dcd807367	Input: add skeleton for simple polled devices input-polldev provides a skeleton for supporting simple input devices that need to be periodically scanned or polled to detect changes in their state. Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-04-29 23:42:45 -04:00
Paul Mackerras	49e1900d4c	Merge branch 'linux-2.6' into for-2.6.22	2007-04-30 12:38:01 +10:00
Johannes Berg	d169d14094	[POWERPC] Declare enable_kernel_spe in a header This patch puts enable_kernel_spe into <asm-powerpc/system.h> along with enable_kernel_altivec etc. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-30 11:02:05 +10:00
Grant Likely	8c38fc2b74	[POWERPC] Stop using ppc_sys for Xilinx Virtex boards The arch/ppc/syslib/ppc_sys.c infrastructure does not work well for the virtex ports. Move the ml300 and ml403 board ports over to use the new virtex_devices infrastructure. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-30 11:02:04 +10:00
Grant Likely	5ff084f21d	[POWERPC] Merge common virtex header files The header files for the ml403 and ml300 are virtually identical, merge them into a single file. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Paul Mackerras <paulus@samba.org>	2007-04-30 11:02:04 +10:00
Linus Torvalds	e389f9aec6	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (107 commits) smc911x: fix compilation breakage wjen debug is on [netdrvr] eexpress: minor corrections add NAPI support to sb1250-mac.c ixgb: ROUND_UP macro cleanup in drivers/net/ixgb e1000: ROUND_UP macro cleanup in drivers/net/e1000 Generic HDLC sparse annotations e100: Optionally use I/O mode only to access register space e100: allow bad MAC address when running with invalid eeprom csum ehea: fix for dlpar support ehea: fix for sysfs entries 3C509: Remove unnecessary include of <linux/pm_legacy.h> NetXen: Fix for vmalloc issues NetXen: Fixes for Power PC architecture NetXen: Port swap feature for multi port cards NetXen: Removal of redundant macros NetXen: Multi PCI support for Quad cards NetXen: Removal of redundant argument passing NetXen: Use multiple PCI functions [netdrvr e100] experiment with doing RX in a similar manner to eepro100 [PATCH] ieee80211: add missing global needed by IEEE80211_DEBUG_XXXX ...	2007-04-29 10:48:48 -07:00
Linus Torvalds	f73b0a08ea	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: (86 commits) SPIN_LOCK_UNLOCKED cleanup in drivers/ata/pata_winbond.c drivers/ata/pata_cmd640.c: fix build with CONFIG_PM=n pata_hpt37x: Further small fixes pata_hpt3x2n: Add HPT371N support and other bits ata: printk warning fixes libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESET ahci: consolidate common port flags ata_timing: ensure t->cycle is always correct libata: add missing call to ->cable_detect() in new EH path pata_amd: remove contamination added during cable_detect conversion libata: Handle drives that require a spin-up command before first access libata: HPA support libata: kill probe_ent and related helpers libata: convert the remaining PATA drivers to new init model libata: convert the remaining SATA drivers to new init model libata: convert ata_pci_init_native_mode() users to new init model libata: convert drivers with combined SATA/PATA ports to new init model libata: add init helpers including ata_pci_prepare_native_host() libata: convert native PCI host handling to new init model libata: convert legacy PCI host handling to new init model ...	2007-04-29 10:48:21 -07:00
Linus Torvalds	6b06d2cc6d	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (105 commits) sonypi: use mutex instead of semaphore sony-laptop: remove user visible camera controls as platform attributes meye: make meye use sony-laptop instead of sonypi sony-laptop: add a meye-usable include file for camera ops sony-laptop: complete the motion eye camera support in sony-laptop sonypi: try to detect if sony-laptop has already taken one of the known ioports sonypi: suggest sonypi users to try sony-laptop instead sony-laptop: add edge modem support (also called WWAN) sony-laptop: add locking on accesses to the ioport and global vars sony-laptop: add camera enable/disable parameter, better handle possible infinite loop thinkpad-acpi: make drivers/misc/thinkpad_acpi:fan_mutex static ACPI: thinkpad-acpi: add sysfs support to wan and bluetooth subdrivers ACPI: thinkpad-acpi: add sysfs support to hotkey subdriver ACPI: thinkpad-acpi: improve dock subdriver initialization ACPI: thinkpad-acpi: improve debugging for acpi helpers ACPI: thinkpad-acpi: improve fan control documentation ACPI: thinkpad-acpi: map ENXIO to EINVAL for fan sysfs ACPI: thinkpad-acpi: fix a fan watchdog invocation ACPI: thinkpad-acpi: do not arm fan watchdog if it would not work ACPI: thinkpad-acpi: add a fan-control feature master toggle ...	2007-04-29 10:47:25 -07:00
Martin Schwidefsky	04b090d50c	[AF_IUCV/IUCV]: smp_call_function deadlock Calling smp_call_function can lead to a deadlock if it is called from tasklet context. Fixing this deadlock requires to move the smp_call_function from the tasklet context to a work queue. To do that queue the path pending interrupts to a separate list and move the path cleanup out of iucv_path_sever to iucv_path_connect and iucv_path_pending. This creates a new requirement for iucv_path_connect: it may not be called from tasklet context anymore. Also fixed compile problem for CONFIG_HOTPLUG_CPU=n and another one when walking the cpu_online mask. When doing this, we must disable cpu hotplug. Signed-off-by: Frank Pavlic <fpavlic@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 23:03:59 -07:00
Jamal Hadi Salim	ecfd6b1837	[XFRM]: Export SPD info With this patch you can use iproute2 in user space to efficiently see how many policies exist in different directions. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:20:32 -07:00
Rusty Russell	5a1b5898ee	[NET]: Remove NETIF_F_INTERNAL_STATS, default to internal stats. Herbert Xu conviced me that a new flag was overkill; every driver currently overrides get_stats, so we might as well make the internal one the default. If someone did fail to set get_stats, they would now get all 0 stats instead of "No statistics available". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 21:04:03 -07:00
Sergei Shtylyov	5f286e113f	[NETPOLL]: Fix TX queue overflow in trapped mode. CONFIG_NETPOLL_TRAP causes the TX queue controls to be completely bypassed in the netpoll's "trapped" mode which easily causes overflows in the drivers with short TX queues (most notably, in 8139too with its 4-deep queue). So, make this option more sensible by making it only bypass the TX softirq wakeup. Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Acked-by: Jeff Garzik <jgarzik@pobox.com> Acked-by: Tom Rini <trini@kernel.crashing.org> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-04-28 20:57:37 -07:00
Len Brown	fb16596997	Pull misc-for-upstream into release branch	2007-04-28 23:12:03 -04:00
Len Brown	cfaae3ee4a	Pull sony into release branch	2007-04-28 23:09:57 -04:00
malattia@linux.it	1ce82c14d0	sony-laptop: add a meye-usable include file for camera ops Copy and rename (for easier co-existence) the MEYE-wise exported interface. Signed-off-by: Mattia Dongili <malattia@linux.it> Signed-off-by: Len Brown <len.brown@intel.com>	2007-04-28 22:06:01 -04:00
Tejun Heo	0d64a233fe	libata: separate ATA_EHI_DID_RESET into DID_SOFTRESET and DID_HARDRESET Separate ATA_EHI_DID_RESET into ATA_EHI_DID_SOFTRESET and ATA_EHI_DID_HARDRESET. ATA_EHI_DID_RESET is redefined as OR of the two flags. This patch doesn't introduce any behavior change. This will be used later to determine whether _SDD is necessary or not. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-04-28 14:51:33 -04:00
Mark Lord	169439c2e3	libata: Handle drives that require a spin-up command before first access (S)ATA drives can be configured for "power-up in standby", a mode whereby a specific "spin up now!" command is required before the first media access. Currently, a drive with this feature enabled can not be used at all with libata, and once in this mode, the drive becomes a doorstop. The older drivers/ide subsystem at least enumerates the drive, so that it can be woken up after the fact from a userspace HDIO_* command, but not libata. This patch adds support to libata for the "power-up in standby" mode where a "spin up now!" command (SET_FEATURES) is needed. With this, libata will recognize such drives, spin them up, and then re-IDENTIFY them if necessary to get a full/complete set of drive features data. Drives in this state are determined by looking for special values in id[2], as documented in the current ATA specs. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-04-28 14:40:40 -04:00
Alan Cox	1e999736ca	libata: HPA support Signed-off-by: Alan Cox <alan@redhat.com> Add support for ignoring the BIOS HPA result (off by default) and setting the disk to the full available size unless already frozen. Tested with various platforms/disks and confirmed to work with the Macintosh (which broke earlier) and ata_piix (breakage due to the LBA48 readback that Tejun fixed). For normal users this brings us, I believe, to feature parity with old IDE (and of course more featured in some areas too). Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-04-28 14:16:06 -04:00

... 6 7 8 9 10 ...

13715 Commits