linux

Author	SHA1	Message	Date
David S. Miller	f0af97070a	sparc64: Fix missing put_cpu_var() in tlb_batch_add_one() when not batching. Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-04-24 16:52:18 -07:00
David S. Miller	f36391d279	sparc64: Fix race in TLB batch processing. As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize on the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchonize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to() which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously. 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations. 2) Do TLB batch cross calls instead via: smp_call_function_many() tlb_pending_func() __flush_tlb_pending() 3) Batch only in lazy mmu sequences: a) Add 'active' member to struct tlb_batch b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE c) Set 'active' in arch_enter_lazy_mmu_mode() d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode() e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear. 4) Add infrastructure for synchronous TLB page flushes. a) Implement __flush_tlb_page and per-cpu variants, patch as needed. b) Likewise for xcall_flush_tlb_page. c) Implement smp_flush_tlb_page() to invoke the cross-call. d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP 5) It turns out that singleton batches are very common, 2 out of every 3 batch flushes have only a single entry in them. The batch flush waiting is very expensive, both because of the poll on sibling cpu completeion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference. Therefore, in flush_tlb_pending(), if there is only one entry in the batch perform a completely asynchronous global_flush_tlb_page() instead. Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2013-04-19 17:26:26 -04:00
Akinobu Mita	9a0ac1b6af	sparc/iommu: fix typo s/265KB/256KB/ IOMMU_NPTES is 64K PTEs, so the size is 256KB (= 64K * sizeof(iopte_t)) Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-31 19:29:12 -04:00
Akinobu Mita	54df2db36c	sparc/srmmu: clear trailing edge of bitmap properly srmmu_nocache_bitmap is cleared by bit_map_init(). But bit_map_init() attempts to clear by memset(), so it can't clear the trailing edge of bitmap properly on big-endian architecture if the number of bits is not a multiple of BITS_PER_LONG. Actually, the number of bits in srmmu_nocache_bitmap is not always a multiple of BITS_PER_LONG. It is calculated as below: bitmap_bits = srmmu_nocache_size >> SRMMU_NOCACHE_BITMAP_SHIFT; srmmu_nocache_size is decided proportionally by the amount of system RAM and it is rounded to a multiple of PAGE_SIZE. SRMMU_NOCACHE_BITMAP_SHIFT is defined as (PAGE_SHIFT - 4). So it can only be said that bitmap_bits is a multiple of 16. This fixes the problem by using bitmap_clear() instead of memset() in bit_map_init() and this also uses BITS_TO_LONGS() to calculate correct size at bitmap allocation time. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2013-03-31 19:29:12 -04:00
Shaohua Li	ec8acf20af	swap: add per-partition lock for swapfile swap_lock is heavily contended when I test swap to 3 fast SSD (even slightly slower than swap to 2 such SSD). The main contention comes from swap_info_get(). This patch tries to fix the gap with adding a new per-partition lock. Global data like nr_swapfiles, total_swap_pages, least_priority and swap_list are still protected by swap_lock. nr_swap_pages is an atomic now, it can be changed without swap_lock. In theory, it's possible get_swap_page() finds no swap pages but actually there are free swap pages. But sounds not a big problem. Accessing partition specific data (like scan_swap_map and so on) is only protected by swap_info_struct.lock. Changing swap_info_struct.flags need hold swap_lock and swap_info_struct.lock, because scan_scan_map() will check it. read the flags is ok with either the locks hold. If both swap_lock and swap_info_struct.lock must be hold, we always hold the former first to avoid deadlock. swap_entry_free() can change swap_list. To delete that code, we add a new highest_priority_index. Whenever get_swap_page() is called, we check it. If it's valid, we use it. It's a pity get_swap_page() still holds swap_lock(). But in practice, swap_lock() isn't heavily contended in my test with this patch (or I can say there are other much more heavier bottlenecks like TLB flush). And BTW, looks get_swap_page() doesn't really need the lock. We never free swap_info[] and we check SWAP_WRITEOK flag. The only risk without the lock is we could swapout to some low priority swap, but we can quickly recover after several rounds of swap, so sounds not a big deal to me. But I'd prefer to fix this if it's a real problem. "swap: make each swap partition have one address_space" improved the swapout speed from 1.7G/s to 2G/s. This patch further improves the speed to 2.3G/s, so around 15% improvement. It's a multi-process test, so TLB flush isn't the biggest bottleneck before the patches. [arnd@arndb.de: fix it for nommu] [hughd@google.com: add missing unlock] [minchan@kernel.org: get rid of lockdep whinge on sys_swapon] Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Seth Jennings <sjenning@linux.vnet.ibm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:17 -08:00
Tang Chen	0197518cd3	memory-hotplug: remove memmap of sparse-vmemmap Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Yasuaki Ishimatsu	46723bfa54	memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2013-02-23 17:50:12 -08:00
Linus Torvalds	2ef14f465b	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The really big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...	2013-02-21 18:06:55 -08:00
David S. Miller	0fbebed682	sparc64: Fix tsb_grow() in atomic context. If our first THP installation for an MM is via the set_pmd_at() done during khugepaged's collapsing we'll end up in tsb_grow() trying to do a GFP_KERNEL allocation with several locks held. Simply using GFP_ATOMIC in this situation is not the best option because we really can't have this fail, so we'd really like to keep this an order 0 GFP_KERNEL allocation if possible. Also, doing the TSB allocation from khugepaged is a really bad idea because we'll allocate it potentially from the wrong NUMA node in that context. So what we do is defer the hugepage TSB allocation until the first TLB miss we take on a hugepage. This is slightly tricky because we have to handle two unusual cases: 1) Taking the first hugepage TLB miss in the window trap handler. We'll call the winfix_trampoline when that is detected. 2) An initial TSB allocation via TLB miss races with a hugetlb fault on another cpu running the same MM. We handle this by unconditionally loading the TSB we see into the current cpu even if it's non-NULL at hugetlb_setup time. Reported-by: Meelis Roos <mroos@ut.ee> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 09:46:08 -08:00
David S. Miller	bcd896bae0	sparc64: Handle hugepage TSB being NULL. Accomodate the possibility that the TSB might be NULL at the point that update_mmu_cache() is invoked. This is necessary because we will sometimes need to defer the TSB allocation to the first fault that happens in the 'mm'. Seperate out the hugepage PTE test into a seperate function so that the logic is clearer. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 09:46:08 -08:00
David S. Miller	a55ee1ff75	sparc64: Fix gfp_flags setting in tsb_grow(). We should "\|= more_flags" rather than "= more_flags". Reported-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-20 09:45:53 -08:00
David S. Miller	89a77915e0	sparc64: Fix get_user_pages_fast() wrt. THP. Mostly mirrors the s390 logic, as unlike x86 we don't need the SetPageReferenced() bits. On sparc64 we also lack a user/privileged bit in the huge PMDs. In order to make this work for THP and non-THP builds, some header file adjustments were necessary. Namely, provide the PMD_HUGE_* bit defines and the pmd_large() inline unconditionally rather than protected by TRANSPARENT_HUGEPAGE. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-02-13 12:22:14 -08:00
H. Peter Anvin	de65d816aa	Merge remote-tracking branch 'origin/x86/boot' into x86/mm2 Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:10:15 -08:00
Greg Kroah-Hartman	7c9503b838	SPARC: drivers: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitdata, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:04 -08:00
Linus Torvalds	9977d9b379	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/.c - struct pt_regs is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild	2012-12-12 12:22:13 -08:00
Michel Lespinasse	2aea28b975	mm: use vm_unmapped_area() in hugetlbfs on sparc64 architecture Update the sparc64 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:26 -08:00
Yinghai Lu	961f8fa04b	sparc, mm: Remove calling of free_all_bootmem_node() Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove calling of free_all_bootmem_node() We add register_page_bootmem_info() to call register_page_bootmem_info_node directly. Also could use free_all_bootmem() for numa case, and it is just the same as free_low_memory_core_early(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-45-git-send-email-yinghai@kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: sparclinux@vger.kernel.org Acked-by: "David S. Miller" <davem@davemloft.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:48 -08:00
Al Viro	d05f06e60d	Merge branch 'arch-frv' into no-rebases	2012-11-16 22:27:58 -05:00
David S. Miller	916ca14aaf	sparc64: Add global PMU register dumping via sysrq. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-16 09:34:01 -07:00
Al Viro	dff933da76	sparc64: clear syscall_noerror on the entry to syscall, not on the exit Move that sucker to just before TI_FPDEPTH and replace stb with sth in etrap_save(). Take current_ds to its old place, so that we don't push wsaved into TI_... flags. That allows to lose clearing syscall_noerror on return from syscall. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-10-14 19:26:52 -04:00
David S. Miller	f88620b9c5	sparc64: Fix deficiencies in sun4v error reporting. Missing error types, attributes, and report fields. Pad out to 64-bytes. Make string reporting cleaner and easier to extend in the future using "const char *" arrays that index by either bit position, or absolute field value. Report the raw 64-byte error report as a sequence of u64s before the annotated version. Only report fields which are valid, given the context and the attribute bits which are set. For shutdown requests, use the local copy of the error report not the one we just freed up back to the queue. Also, use orderly_poweroff() just like the Domain Services shutdown request code does. If the real-address reported is "-1" (unknown) try to disassemble the instruction to report the effective address of the access. Only do this in privileged mode. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-10 17:19:32 -07:00
David Miller	9e695d2ecc	sparc64: Support transparent huge pages. This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:06 +09:00
David Miller	c460bec78d	sparc64: Eliminate PTE table memory wastage. We've split up the PTE tables so that they take up half a page instead of a full page. This is in order to facilitate transparent huge page support, which works much better if our PMDs cover 4MB instead of 8MB. What we do is have a one-behind cache for PTE table allocations in the mm struct. This logic triggers only on allocations. For example, we don't try to keep track of free'd up page table blocks in the style that the s390 port does. There were only two slightly annoying aspects to this change: 1) Changing pgtable_t to be a "pte_t *". There's all of this special logic in the TLB free paths that needed adjustments, as did the PMD populate interfaces. 2) init_new_context() needs to zap the pointer, since the mm struct just gets copied from the parent on fork. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:05 +09:00
David Miller	15b9350a17	sparc64: Only support 4MB huge pages and 8KB base pages. Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictactes. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:23:04 +09:00
Shaohua Li	45cac65b0f	readahead: fault retry breaks mmap file read random detection .fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <shaohua.li@fusionio.com> Cc: Rik van Riel <riel@redhat.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-10-09 16:22:47 +09:00
Akinobu Mita	5da444aae5	sparc: fix format string argument for prom_printf() prom_printf() takes printf style arguments. Specifing GCC's format attribute reveals that there are several wrong usages of prom_printf(). This fixes those wrong format strings and arguments, and also leaves format attributes in order to detect similar mistakes at compile time. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2012-10-02 23:20:34 -04:00
David S. Miller	c69ad0a3f7	sparc64: Use cpu_pgsz_mask for linear kernel mapping config. This required a little bit of reordering of how we set up the memory management early on. We now only know the final values of kern_linear_pte_xor[] after we take over the trap table and start processing TLB misses ourselves. So once we fill those values in we re-clear the kernel's 4M TSB and flush the TLBs. That way if we find we support larger than 4M pages we won't have any stale smaller page size entries in the TSB. SUN4U Panther support for larger page sizes should now be extremely trivial but I have no hardware on which to test it and I believe that some of the sun4u TLB miss assembler needs to be audited first to make sure it really can handle larger than 4M PTEs properly. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 20:35:36 -07:00
David S. Miller	ce33fdc52a	sparc64: Probe cpu page size support more portably. On sun4v, interrogate the machine description. This code is extremely defensive in nature, and a lot of the checks can probably be removed. On sun4u things are a lot simpler. There are the page sizes all chips support, and then Panther adds 32MB and 256MB pages. Report the probed value in /proc/cpuinfo Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 19:01:25 -07:00
David S. Miller	4f93d21d25	sparc64: Support 2GB and 16GB page sizes for kernel linear mappings. SPARC-T4 supports 2GB pages. So convert kpte_linear_bitmap into an array of 2-bit values which index into kern_linear_pte_xor. Now kern_linear_pte_xor is used for 4 page size aligned regions, 4MB, 256MB, 2GB, and 16GB respectively. Enabling 2GB pages is currently hardcoded using a check against sun4v_chip_type. In the future this will be done more cleanly by interrogating the machine description which is the correct way to determine this kind of thing. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-09-06 18:13:58 -07:00
David S. Miller	2856cc2e4d	sparc64: Be less verbose during vmemmap population. On a 2-node machine with 256GB of ram we get 512 lines of console output, which is just too much. This mimicks Yinghai Lu's x86 commit `c2b91e2eec` (x86_64/mm: check and print vmemmap allocation continuous) except that we aren't ever going to get contiguous block pointers in between calls so just print when the virtual address or node changes. This decreases the output by an order of 16. Also demote this to KERN_DEBUG. Signed-off-by: David S. Miller <davem@davemloft.net>	2012-08-15 00:37:29 -07:00
Sam Ravnborg	a0ce3ba03f	sparc32: delete dead code in show_mem() Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:17 -07:00
Sam Ravnborg	9a4d5b93cb	sparc32: move kmap_init() to highmem.c Try to keep highmem support in a more central place. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:17 -07:00
Sam Ravnborg	d8a1b2b94c	sparc32: move probe_memory() to srmmu.c Only one user so move it to the file using it. It had nothing to do in fault_32. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:17 -07:00
Sam Ravnborg	b585e8551b	sparc32: centralize all mmu context handling in srmmu.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	59b00c792f	sparc32: drop quicklist The quicklist stuff is not used anymore - drop it. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	cc52aea9dc	sparc32: drop sparc model check in paging_init We already check the model in head_32.S so no need to repeat the check here Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	c966a337fa	sparc32: drop sparc_unmapped_base The base is always the same so no need to use a variable for this. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	d884297aca	sparc32,leon: drop leon_init() This function was only used to set of_pdt_build_more to leon_node_init(). But the leon_node_init() was a nop as prom_amba_init was never assigned. Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	1b6d06d820	sparc32: drop fixmap.h sparc32 does not support fixmaps - so do not pretend so by having the fixmap.h file. Move relevant parts to vaddrs.h. I looked at simplifying this even more but failed to understand the reasoning behind the extra guard page involved and due to missing testing possibilities only the trivial conversion was done. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:16 -07:00
Sam Ravnborg	5bbeed12bd	sparc32: drop unused kmap_atomic_to_page No users left of this function - drop it. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	7cdfbc74c8	sparc32: beautify srmmu_inherit_prom_mappings() Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	f71a2aacc6	sparc32: use void * in nocache get/free This allowed to us to kill a lot of casts, with no loss of readability in any places Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	605ae96240	sparc32: fix coding-style in srmmu.c Fix the most annoying issues that distracts me: - whitespace - missing space after "if" and "while" - spaces around operators and similar simple things. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	4a049b0341	sparc32: sort includes in srmmu.c Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	32442467ed	sparc32: define a few srmmu functions __init They are only used during early init so lets get rid of them after init to save some RAM. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-07-26 16:46:15 -07:00
Sam Ravnborg	805918f80f	sparc32: srmmu_probe now knows about leon too Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>	2012-05-27 23:52:51 -07:00
Sam Ravnborg	6729cf7967	sparc32: introduce run-time patching of srmmu access functions LEON uses a different ASI than SUN for MMUREGS Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>	2012-05-27 23:52:49 -07:00
Sam Ravnborg	3107948848	sparc32,leon: always include leon_smp + leon_mm in build Fix-up leon specific assembler to use ASI_LEON_MMUREGS Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Hellstrom <daniel@gaisler.com> Cc: Konrad Eisele <konrad@gaisler.com>	2012-05-27 23:52:47 -07:00
Sam Ravnborg	29af0ebaa2	sparc32: use the common implementation of alloc_thread_info_node() With sun4c removed we can fall-back to the common implementation. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-05-22 12:02:56 -07:00
Linus Torvalds	9daeaa3705	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next Pull sparc updates from David Miller: 1) Kill off support for sun4c and Cypress sun4m chips. And as a result we were able to also kill off that ugly btfixup thing that required multi-stage links of the final vmlinux image in the Kbuild system. This should make the kbuild maintainers really happy. Thanks a lot to Sam Ravnborg for his tireless efforts to get this going. 2) Convert sparc64 to nobootmem. I suspect now with sparc32 being a lot cleaner, it should be able to fall in line and modernize in this area too. 3) Make sparc32 use generic clockevents, from Tkhai Kirill. [ I fixed up the BPF rules, and tried to clean up the build rules too. But I don't have - or want - a sparc cross-build environment, so the BPF rule bug and the related build cleanup was all done with just a bare "make -n" pseudo-test. - Linus ] * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (110 commits) sparc32: use flushi when run-time patching in per_cpu_patch sparc32: fix cpuid_patch run-time patching sparc32: drop unused inline functions in srmmu.c sparc32: drop unused functions in pgtsrmmu.h sparc32,leon: move leon mmu functions to leon_mm.c sparc32,leon: remove duplicate definitions in leon.h sparc32,leon: remove duplicate UART register definitions sparc32,leon: move leon ASI definitions to asi.h sparc32: move trap table to a separate file sparc64: renamed ttable.S to ttable_64.S sparc32: Remove asm/sysen.h header. sparc32: Delete asm/smpprim.h sparc32: Remove unused empty_bad_page{,_table} declarations. sparc32: Kill boot_cpu_id4 sparc32: Move GET_PROCESSOR*_ID() out of asm/asmmacro.h sparc32: Remove completely unused code from asm/cache.h sparc32: Add ucmpdi2.o to obj-y instead of lib-y. sparc32: add ucmpdi2 sparc: introduce arch/sparc/Kbuild sparc: remove obsolete documentation ...	2012-05-21 10:32:01 -07:00

1 2 3 4 5 ...

275 Commits