linux

Author	SHA1	Message	Date
Anton Blanchard	c40dd2f766	powerpc: Add System RAM to /proc/iomem We've resisted adding System RAM to /proc/iomem because it is the wrong place for it. Unfortunately we continue to find tools that rely on this behaviour so give up and add it in. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-11-08 14:51:46 +11:00
Linus Torvalds	32aaeffbd4	Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h	2011-11-06 19:44:47 -08:00
Linus Torvalds	1197ab2942	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/p3060qds: Add support for P3060QDS board powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX powerpc/85xx: Make kexec to interate over online cpus powerpc/fsl_booke: Fix comment in head_fsl_booke.S powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver powerpc/86xx: Correct Gianfar support for GE boards powerpc/cpm: Clear muram before it is in use. drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager powerpc/fsl_msi: add support for "msi-address-64" property powerpc/85xx: Setup secondary cores PIR with hard SMP id powerpc/fsl-booke: Fix settlbcam for 64-bit powerpc/85xx: Adding DCSR node to dtsi device trees powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards powerpc/85xx: fix PHYS_64BIT selection for P1022DS powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map powerpc: respect mem= setting for early memory limit setup powerpc: Update corenet64_smp_defconfig powerpc: Update mpc85xx/corenet 32-bit defconfigs ... Fix up trivial conflicts in: - arch/powerpc/configs/40x/hcu4_defconfig removed stale file, edited elsewhere - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c: added opal and gelic drivers vs added ePAPR driver - drivers/tty/serial/8250.c moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support	2011-11-06 17:12:03 -08:00
Andrea Arcangeli	b35a35b556	thp: share get_huge_page_tail() This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:58 -07:00
Andrea Arcangeli	cf592bf768	powerpc: gup_huge_pmd() return 0 if pte changes powerpc didn't return 0 in that case, if it's rolling back the *nr pointer it should also return zero to avoid adding pages to the array at the wrong offset. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	3526741f09	powerpc: gup_hugepte() support THP based tail recounting Up to this point the code assumed old refcounting for hugepages (pre-thp). This updates the code directly to the thp mapcount tail page refcounting. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	8596468487	powerpc: gup_hugepte() avoid freeing the head page too many times We only taken "refs" pins on the head page not "*nr" pins. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	405e44f2e3	powerpc: get_hugepte() don't put_page() the wrong page "page" may have changed to point to the next hugepage after the loop completed, The references have been taken on the head page, so the put_page must happen there too. This is a longstanding issue pre-thp inclusion. It's totally unclear how these page_cache_add_speculative and pte_val(pte) != pte_val(ptep) checks are necessary across all the powerpc gup_fast code, when x86 doesn't need any of that: there's no way the page can be freed with irq disabled so we're guaranteed the atomic_inc will happen on a page with page_count > 0 (so not needing the speculative check). The pte check is also meaningless on x86: no need to rollback on x86 if the pte changed, because the pte can still change a CPU tick after the check succeeded and it won't be rolled back in that case. The important thing is we got a reference on a valid page that was mapped there a CPU tick ago. So not knowing the soft tlb refill code of ppc64 in great detail I'm not removing the "speculative" page_count increase and the pte checks across all the code, but unless there's a strong reason for it they should be later cleaned up too. If a pte can change from huge to non-huge (like it could happen with THP) passing a pte_t ptep to gup_hugepte() would also require to repeat the is_hugepd in gup_hugepte(), but that shouldn't happen with hugetlbfs only so I'm not altering that. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	2839bdc1bf	powerpc: remove superfluous PageTail checks on the pte gup_fast This part of gup_fast doesn't seem capable of handling hugetlbfs ptes, those should be handled by gup_hugepd only, so these checks are superfluous. Plus if this wasn't a noop, it would have oopsed because, the insistence of using the speculative refcounting would trigger a VM_BUG_ON if a tail page was encountered in the page_cache_get_speculative(). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Andrea Arcangeli	70b50f94f1	mm: thp: tail page refcounting fix Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Paul Gortmaker	4b16f8e2d6	powerpc: various straight conversions from module.h --> export.h All these files were including module.h just for the basic EXPORT_SYMBOL infrastructure. We can shift them off to the export.h header which is a way smaller footprint and thus realize some compile time gains. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:44 -04:00
Paul Gortmaker	9308794884	powerpc: include export.h for files using EXPORT_SYMBOL/THIS_MODULE Fix failures in powerpc associated with the previously allowed implicit module.h presence that now lead to things like this: arch/powerpc/mm/mmu_context_hash32.c:76:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' arch/powerpc/mm/tlb_hash32.c:48:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/kernel/pci_32.c:51:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' arch/powerpc/kernel/iomap.c:36:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/platforms/44x/canyonlands.c:126:1: error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL' arch/powerpc/kvm/44x.c:168:59: error: 'THIS_MODULE' undeclared (first use in this function) [with several contibutions from Stephen Rothwell <sfr@canb.auug.org.au>] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:38 -04:00
Paul Gortmaker	66b15db69c	powerpc: add export.h to files making use of EXPORT_SYMBOL With module.h being implicitly everywhere via device.h, the absence of explicitly including something for EXPORT_SYMBOL went unnoticed. Since we are heading to fix things up and clean module.h from the device.h file, we need to explicitly include these files now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:30:37 -04:00
Becky Bruce	4559424a0c	powerpc/fsl-booke: Fix settlbcam for 64-bit Currently, it does a cntlzd on the size and then subtracts it from 21.... this doesn't take into account the varying size of a "long". Just use __ilog instead (and subtract the 10 we have to subtract to get to the tsize encoding). Also correct the comment about page sizes supported. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-10-12 23:39:10 -05:00
Kumar Gala	1dc91c3eb3	powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map On FSL Book-E devices we support multiple large TLB sizes and so we can get into situations in which the initial 1G TLB size is too big and we're asked for a size that is not mappable by a single entry (like 512M). The single entry is important because when we bring up secondary cores they need to ensure any data structure they need to access (eg PACA or stack) is always mapped. So we really need to determine what size will actually be mapped by the first TLB entry to ensure we limit early memory references to that region. We refactor the map_mem_in_cams() code to provider a helper function that we can utilize to determine the size of the first TLB entry while taking into account size and alignment constraints. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-10-11 23:30:41 -05:00
Paul Mackerras	25c29f9e32	powerpc: Fix hugetlb with CONFIG_PPC_MM_SLICES=y Commit 41151e77a4 ("powerpc: Hugetlb for BookE") added some #ifdef CONFIG_MM_SLICES conditionals to hugetlb_get_unmapped_area() and vma_mmu_pagesize(). Unfortunately this is not the correct config symbol; it should be CONFIG_PPC_MM_SLICES. The result is that attempting to use hugetlbfs on 64-bit Power server processors results in an infinite stack recursion between get_unmapped_area() and hugetlb_get_unmapped_area(). This fixes it by changing the #ifdef to use CONFIG_PPC_MM_SLICES in those functions and also in book3e_hugetlb_preload(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-23 10:21:33 +10:00
Anton Blanchard	8bdafa39a4	powerpc: Fix deadlock in icswx code The icswx code introduced an A-B B-A deadlock: CPU0 CPU1 ---- ---- lock(&anon_vma->mutex); lock(&mm->mmap_sem); lock(&anon_vma->mutex); lock(&mm->mmap_sem); Instead of using the mmap_sem to keep mm_users constant, take the page table spinlock. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	a11940978b	powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe If we echo an address the hypervisor doesn't like to /sys/devices/system/memory/probe we oops the box: # echo 0x10000000000 > /sys/devices/system/memory/probe kernel BUG at arch/powerpc/mm/hash_utils_64.c:541! The backtrace is: create_section_mapping arch_add_memory add_memory memory_probe_store sysdev_class_store sysfs_write_file vfs_write SyS_write In create_section_mapping we BUG if htab_bolt_mapping returned an error. A better approach is to return an error which will propagate back to userspace. Rerunning the test with this patch applied: # echo 0x10000000000 > /sys/devices/system/memory/probe -bash: echo: write error: Invalid argument Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	dfbe93a222	powerpc: Coding style cleanups While converting code to use for_each_node_by_type I noticed a number of coding style issues. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	94db7c5e14	powerpc: Use for_each_node_by_type instead of open coding it Use for_each_node_by_type instead of open coding it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:23 +10:00
Anton Blanchard	6083184269	powerpc/numa: Remove double of_node_put in hot_add_node_scn_to_nid During memory hotplug testing, I got the following warning: ERROR: Bad of_node_put() on /memory@0 of_node_release kref_put of_node_put of_find_node_by_type hot_add_node_scn_to_nid hot_add_scn_to_nid memory_add_physaddr_to_nid ... of_find_node_by_type() loop does the of_node_put for us so we only need the handle the case where we terminate the loop early. As suggested by Stephen Rothwell we can do the of_node_put unconditionally outside of the loop since of_node_put handles a NULL argument fine. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 15:53:22 +10:00
Becky Bruce	41151e77a4	powerpc: Hugetlb for BookE Enable hugepages on Freescale BookE processors. This allows the kernel to use huge TLB entries to map pages, which can greatly reduce the number of TLB misses and the amount of TLB thrashing experienced by applications with large memory footprints. Care should be taken when using this on FSL processors, as the number of large TLB entries supported by the core is low (16-64) on current processors. The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g. Page sizes larger than the max zone size are called "gigantic" pages and must be allocated on the command line (and cannot be deallocated). This is currently only fully implemented for Freescale 32-bit BookE processors, but there is some infrastructure in the code for 64-bit BooKE. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-09-20 09:19:40 +10:00
Linus Torvalds	184475029a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (99 commits) drivers/virt: add missing linux/interrupt.h to fsl_hypervisor.c powerpc/85xx: fix mpic configuration in CAMP mode powerpc: Copy back TIF flags on return from softirq stack powerpc/64: Make server perfmon only built on ppc64 server devices powerpc/pseries: Fix hvc_vio.c build due to recent changes powerpc: Exporting boot_cpuid_phys powerpc: Add CFAR to oops output hvc_console: Add kdb support powerpc/pseries: Fix hvterm_raw_get_chars to accept < 16 chars, fixing xmon powerpc/irq: Quieten irq mapping printks powerpc: Enable lockup and hung task detectors in pseries and ppc64 defeconfigs powerpc: Add mpt2sas driver to pseries and ppc64 defconfig powerpc: Disable IRQs off tracer in ppc64 defconfig powerpc: Sync pseries and ppc64 defconfigs powerpc/pseries/hvconsole: Fix dropped console output hvc_console: Improve tty/console put_chars handling powerpc/kdump: Fix timeout in crash_kexec_wait_realmode powerpc/mm: Fix output of total_ram. powerpc/cpufreq: Add cpufreq driver for Momentum Maple boards powerpc: Correct annotations of pmu registration functions ... Fix up trivial Kconfig/Makefile conflicts in arch/powerpc, drivers, and drivers/cpufreq	2011-07-25 22:59:39 -07:00
Linus Torvalds	5fabc487c9	Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits) KVM: IOMMU: Disable device assignment without interrupt remapping KVM: MMU: trace mmio page fault KVM: MMU: mmio page fault support KVM: MMU: reorganize struct kvm_shadow_walk_iterator KVM: MMU: lockless walking shadow page table KVM: MMU: do not need atomicly to set/clear spte KVM: MMU: introduce the rules to modify shadow page table KVM: MMU: abstract some functions to handle fault pfn KVM: MMU: filter out the mmio pfn from the fault pfn KVM: MMU: remove bypass_guest_pf KVM: MMU: split kvm_mmu_free_page KVM: MMU: count used shadow pages on prepareing path KVM: MMU: rename 'pt_write' to 'emulate' KVM: MMU: cleanup for FNAME(fetch) KVM: MMU: optimize to handle dirty bit KVM: MMU: cache mmio info on page fault path KVM: x86: introduce vcpu_mmio_gva_to_gpa to cleanup the code KVM: MMU: do not update slot bitmap if spte is nonpresent KVM: MMU: fix walking shadow page table KVM guest: KVM Steal time registration ...	2011-07-24 09:07:03 -07:00
Linus Torvalds	4d4abdcb1d	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits) perf: Remove the nmi parameter from the oprofile_perf backend x86, perf: Make copy_from_user_nmi() a library function perf: Remove perf_event_attr::type check x86, perf: P4 PMU - Fix typos in comments and style cleanup perf tools: Make test use the preset debugfs path perf tools: Add automated tests for events parsing perf tools: De-opt the parse_events function perf script: Fix display of IP address for non-callchain path perf tools: Fix endian conversion reading event attr from file header perf tools: Add missing 'node' alias to the hw_cache[] array perf probe: Support adding probes on offline kernel modules perf probe: Add probed module in front of function perf probe: Introduce debuginfo to encapsulate dwarf information perf-probe: Move dwarf library routines to dwarf-aux.{c, h} perf probe: Remove redundant dwarf functions perf probe: Move strtailcmp to string.c perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END tracing/kprobe: Update symbol reference when loading module tracing/kprobes: Support module init function probing kprobes: Return -ENOENT if probe point doesn't exist ...	2011-07-22 16:44:39 -07:00
Benjamin Herrenschmidt	4b575f3e8a	Merge remote-tracking branch 'jwb/next' into next	2011-07-22 13:16:41 +10:00
Tony Breeds	f7ba2991e9	powerpc/mm: Fix output of total_ram. On 32bit platforms that support >= 4GB memory total_ram was truncated. This creates a confusing printk: Top of RAM: 0x100000000, Total RAM: 0x0 Fix that: Top of RAM: 0x100000000, Total RAM: 0x100000000 Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-07-19 15:13:04 +10:00
Tejun Heo	5dfe8660a3	bootmem: Replace work_with_active_regions() with for_each_mem_pfn_range() Callback based iteration is cumbersome and much less useful than for_each_*() iterator. This patch implements for_each_mem_pfn_range() which replaces work_with_active_regions(). All the current users of work_with_active_regions() are converted. This simplifies walking over early_node_map and will allow converting internal logics in page_alloc to use iterator instead of walking early_node_map directly, which in turn will enable moving node information to memblock. powerpc change is only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:29 -07:00
Dave Kleikamp	9661534d6a	powerpc/47x: allow kernel to be loaded in higher physical memory The 44x code (which is shared by 47x) assumes the available physical memory begins at 0x00000000. This is not necessarily the case in an AMP environment. Support CONFIG_RELOCATABLE for 476 in order to allow the kernel to be loaded into a higher memory range. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 10:34:24 -04:00
Dave Kleikamp	91b191c71e	powerpc/44x: don't use tlbivax on AMP systems Since other OS's may be running on the other cores don't use tlbivax Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-07-12 09:21:55 -04:00
Paul Mackerras	9e368f2915	KVM: PPC: book3s_hv: Add support for PPC970-family processors This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:59 +03:00
Paul Mackerras	969391c58a	powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to indicate that we have a usable hypervisor mode, and another to indicate that the processor conforms to PowerISA version 2.06. We also add another bit to indicate that the processor conforms to ISA version 2.01 and set that for PPC970 and derivatives. Some PPC970 chips (specifically those in Apple machines) have a hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode is not useful in the sense that there is no way to run any code in supervisor mode (HV=0 PR=0). On these processors, the LPES0 and LPES1 bits in HID4 are always 0, and we use that as a way of detecting that hypervisor mode is not useful. Where we have a feature section in assembly code around code that only applies on POWER7 in hypervisor mode, we use a construct like END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206) The definition of END_FTR_SECTION_IFSET is such that the code will be enabled (not overwritten with nops) only if all bits in the provided mask are set. Note that the CPU feature check in __tlbie() only needs to check the ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called if we are running bare-metal, i.e. in hypervisor mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>	2011-07-12 13:16:58 +03:00
Becky Bruce	3160b09796	powerpc: Create next_tlbcam_idx percpu variable for FSL_BOOKE This is used to round-robin TLBCAM entries. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2011-07-08 00:21:34 -05:00
Peter Zijlstra	a8b0ca17b8	perf: Remove the nmi parameter from the swevent and overflow interface The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:35 +02:00
Dave Carroll	a9c0f41b3a	powerpc: Add printk companion for ppc_md.progress This patch adds a printk companion to replace the udbg progress function when initmem is freed. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-30 15:28:05 +10:00
Dave Carroll	2773fcc8c4	powerpc: Move free_initmem to common code The free_initmem function is basically duplicated in mm/init_32, and init_64, and is moved to the common 32/64-bit mm/mem.c. All other sections except init were removed in v2.6.15 by `6c45ab992e` (powerpc: Remove section free() and linker script bits), and therefore the bulk of the executed code is identical. This patch also removes updating ppc_md.progress to NULL in the powermac late_initcall. Suggested-by: Milton Miller <miltonm@bga.com> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-30 15:28:05 +10:00
Benjamin Herrenschmidt	6da49a2925	Merge remote branch 'origin/master' into next	2011-06-30 15:23:59 +10:00
Becky Bruce	3d41e0f6d9	powerpc: mem_init should call memblock_is_reserved with phys_addr_t This has been broken for a while but hasn't been an issue until now because nobody was reserving regions at high addresses. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 17:48:18 +10:00
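Why the type matters: on a 32-bit kernel unsigned long is 32 bits, while phys_addr_t can be 64 bits on parts with more than 4 GiB of physical address space. A page-frame shift done in 32 bits silently drops the high bits. In this illustration uint32_t and uint64_t stand in for those kernel types, and PAGE_SHIFT 12 assumes 4 KiB pages; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Buggy form: the shift is performed in 32 bits, so addresses at or
 * above 4 GiB wrap around. */
static uint32_t pfn_to_addr_32(uint32_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Fixed form: widen to the physical-address type before shifting. */
static uint64_t pfn_to_addr_phys(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

For pfn 0x100000 the 32-bit form yields 0, so a reservation check at 4 GiB would look up the wrong address; low addresses give identical results, which is why the bug stayed hidden.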
Scott Wood	f67f4ef5fc	powerpc/book3e-64: use a separate TLB handler when linear map is bolted On MMUs such as FSL where we can guarantee the entire linear mapping is bolted, we don't need to worry about linear TLB misses. If on top of that we do a full table walk, we get rid of all recursive TLB faults, and can dispense with some state saving. This gains a few percent on TLB-miss-heavy workloads, and around 50% on a benchmark that had a high rate of virtual page table faults under the normal handler. While touching the EX_TLB layout, remove EX_TLB_MMUCR0, EX_TLB_SRR0, and EX_TLB_SRR1 as they're not used. [BenH: Fixed build with 64K pages (wsp config)] Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 17:47:48 +10:00
Christian Dietrich	76462232c2	arch/powerpc: use printk_ratelimited instead of printk_ratelimit Since printk_ratelimit() shouldn't be used anymore (see comment in include/linux/printk.h), replace it with printk_ratelimited. Signed-off-by: Christian Dietrich <christian.dietrich@informatik.uni-erlangen.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-29 15:31:01 +10:00
Kumar Gala	32d206eb56	powerpc/book3e: Clarify HW table walk enable/disable message Before, if we didn't support or enable HW table walk, we'd get a message like: MMU: Book3E Page Tables Disabled Which is a bit misleading. Now it will say: MMU: Book3E HW tablewalk not supported Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-17 16:19:51 +10:00
Benjamin Herrenschmidt	307cfe7153	powerpc: Force page alignment for initrd reserved memory When using 64K pages with a separate cpio rootfs, U-Boot will align the rootfs on a 4K page boundary. When the memory is reserved, and subsequent early memblock_alloc is called, it will allocate memory between the 64K page alignment and reserved memory. When the reserved memory is subsequently freed, it is done so by pages, causing the early memblock_alloc requests to be re-used, which in my case, caused the device-tree to be clobbered. This patch forces the reserved memory for initrd to be kernel page aligned, and will move the device tree if it overlaps with the range extension of initrd. This patch will also consolidate the identical function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c, and adds the same range extension when freeing initrd. free_initrd_mem() is also moved to the __init section. Many thanks to Milton Miller for his input on this patch. [BenH: Fixed build without CONFIG_BLK_DEV_INITRD] Signed-off-by: Dave Carroll <dcarroll@astekcorp.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-06-09 16:52:38 +10:00
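The alignment fix boils down to rounding the reserved range outward to kernel-page boundaries: start rounded down, end rounded up. That way the reservation covers every 64K page the initrd touches and no early allocation can land in the partial pages. A sketch assuming 64 KiB pages; PAGE_SIZE, struct range and the helpers are local stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE ((uint64_t)64 * 1024) /* assumes 64 KiB kernel pages */

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a)   { return (x + a - 1) & ~(a - 1); }

struct range { uint64_t start, end; }; /* half-open: [start, end) */

/* Widen a range so that it begins and ends on page boundaries. */
static struct range page_align_range(struct range r)
{
    assert((PAGE_SIZE & (PAGE_SIZE - 1)) == 0); /* power of two */
    struct range out = { align_down(r.start, PAGE_SIZE),
                         align_up(r.end, PAGE_SIZE) };
    return out;
}
```

For an initrd that a boot loader placed at 0x01001000 (4K-aligned only), the reserved range starts at 0x01000000; anything else overlapping the widened range, such as the device tree, would have to be moved out of it.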
Peter Zijlstra	2672391169	mm, powerpc: move the RCU page-table freeing into generic code In case other architectures require RCU freed page-tables to implement gup_fast() and software filled hashes and similar things, provide the means to do so by moving the logic into generic code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Requested-by: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:16 -07:00
Peter Zijlstra	d6bf29b44d	powerpc: mmu_gather rework Fix up powerpc to the new mmu_gather stuff. PPC has an extra batching queue to RCU free the actual pagetable allocations; use the ARCH extensions for that for now. For the ppc64_tlb_batch, which tracks the vaddrs to unhash from the hardware hash-table, keep using per-cpu arrays but flush on context switch and use a TLF bit to track the lazy_mmu state. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:13 -07:00
Anton Blanchard	40f1ce7fb7	powerpc: Remove ioremap_flags We have a confusing number of ioremap functions. Make things just a bit simpler by merging ioremap_flags and ioremap_prot. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:43 +10:00
Anton Blanchard	be135f4089	powerpc: Add ioremap_wc Add ioremap_wc so drivers can request write combining on kernel mappings. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-19 14:30:42 +10:00
Stephen Rothwell	79af2187fa	powerpc: Fix compile with icwsx support Due to a collision between NO_CONTEXT->MMU_NO_CONTEXT change and Anton's patch. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-06 13:18:34 +10:00
KOSAKI Motohiro	104699c0ab	powerpc: Convert old cpumask API into new one Adapt to the new API. Almost every change is trivial. The most important change is the line below, because we plan to change the task->cpus_allowed implementation. - ctx->cpus_allowed = current->cpus_allowed; Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:22:59 +10:00
Tseng-Hui (Frank) Lin	851d2e2fe8	powerpc: Add Initiate Coprocessor Store Word (icswx) support Icswx is a PowerPC instruction to send data to a co-processor. On Book-S processors the LPAR_ID and process ID (PID) of the owning process are registered in the window context of the co-processor at initialization time. When the icswx instruction is executed the L2 generates a cop-reg transaction on PowerBus. The transaction has no address and the processor does not perform an MMU access to authenticate the transaction. The co-processor compares the LPAR_ID and the PID included in the transaction and the LPAR_ID and PID held in the window context to determine if the process is authorized to generate the transaction. The OS needs to assign a 16-bit PID for the process. This cop-PID needs to be updated during context switch. The cop-PID needs to be destroyed when the context is destroyed. Signed-off-by: Sonny Rao <sonnyrao@linux.vnet.ibm.com> Signed-off-by: Tseng-Hui (Frank) Lin <thlin@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:19:26 +10:00
Michael Neuling	a32e252f7c	powerpc: Use new CPU feature bit to select 2.06 tlbie This removes MMU_FTR_TLBIE_206 as we can now use CPU_FTR_HVMODE_206. It also changes the logic to select which tlbie to use to be based on this new CPU feature bit. This also duplicates the ASM_FTR_IF/SET/CLR defines for CPU features (copied from MMU features). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-05-04 15:19:26 +10:00
Matt Evans	44ae3ab335	powerpc: Free up some CPU feature bits by moving out MMU-related features Some of the 64bit PPC CPU features are MMU-related, so this patch moves them to MMU_FTR_ bits. All cpu_has_feature()-style tests are moved to mmu_has_feature(), and seven feature bits are freed as a result. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:52 +10:00
Michael Ellerman	e70606eb9b	powerpc/numa: Look for ibm,associativity-reference-points at the root If we don't find ibm,associativity-reference-points as a child of /rtas, look for it at the root of the tree instead. We use this on Book3E where we have no RTAS but still use the sPAPR conventions for NUMA. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 14:18:35 +10:00
Benjamin Herrenschmidt	bd49178109	powerpc: Add TLB size detection for TYPE_3E MMUs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-27 13:02:10 +10:00
Anton Blanchard	b68a70c496	powerpc: Replace open coded instruction patching with patch_instruction/patch_branch There are a few places we patch instructions without using patch_instruction and patch_branch, probably because they predated it. Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 17:01:18 +10:00
Michael Ellerman	f5be2dc0bd	powerpc/nohash: Allocate stale_map[cpu] on CPU_UP_PREPARE not CPU_ONLINE Currently we allocate the stale_map for a cpu when it comes online, this leaves open a small window where a process can be scheduled on the cpu before the stale_map is allocated. Instead allocate the stale_map at CPU_UP_PREPARE time, that way it will be always available before tasks start running. It is possible the cpu fails to come up, in which case we should free the stale_map, so add a CPU_UP_CANCELED case to do that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 17:01:18 +10:00
Michael Ellerman	5e8e7b404a	powerpc/mm: Standardise on MMU_NO_CONTEXT Use MMU_NO_CONTEXT as the initialiser for mm_context.id on nohash and hash64. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-04-20 16:59:20 +10:00
Linus Torvalds	42933bac11	Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6 * 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings	2011-04-07 11:14:49 -07:00
Sylvestre Ledru	f65e51d740	Documentation: fix minor typos/spelling Fix some minor typos: * informations => information * there own => their own * these => this Signed-off-by: Sylvestre Ledru <sylvestre.ledru@scilab.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-04 17:51:47 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Benjamin Herrenschmidt	6090912c4a	powerpc: Implement dma_mmap_coherent() This is used by Alsa to mmap buffers allocated with dma_alloc_coherent() into userspace. We need a special variant to handle machines with non-coherent DMAs as those buffers have "special" virt addresses and require non-cacheable mappings. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-30 10:44:00 +11:00
Linus Torvalds	0a95d92c00	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (62 commits) powerpc/85xx: Fix signedness bug in cache-sram powerpc/fsl: 85xx: document cache sram bindings powerpc/fsl: define binding for fsl mpic interrupt controllers powerpc/fsl_msi: Handle msi-available-ranges better drivers/serial/ucc_uart.c: Add of_node_put to avoid memory leak powerpc/85xx: Fix SPE float to integer conversion failure powerpc/85xx: Update sata controller compatible for p1022ds board ATA: Add FSL sata v2 controller support powerpc/mpc8xxx_gpio: simplify searching for 'fsl,qoriq-gpio' compatiable powerpc/8xx: remove obsolete mgsuvd board powerpc/82xx: rename and update mgcoge board support powerpc/83xx: rename and update kmeter1 powerpc/85xx: Workaroudn e500 CPU erratum A005 powerpc/fsl_pci: Add support for FSL PCIe controllers v2.x powerpc/85xx: Fix writing to spin table 'cpu-release-addr' on ppc64e powerpc/pseries: Disable MSI using new interface if possible powerpc: Enable GENERIC_HARDIRQS_NO_DEPRECATED. powerpc: core irq_data conversion. powerpc: sysdev/xilinx_intc irq_data conversion. powerpc: sysdev/uic irq_data conversion. ... Fix up conflicts in arch/powerpc/sysdev/fsl_msi.c (due to getting rid of of_platform_driver in arch/powerpc)	2011-03-18 06:31:43 -07:00
Benjamin Herrenschmidt	831532035b	Merge remote branch 'jwb/next' into next	2011-03-17 17:59:01 +11:00
Benjamin Herrenschmidt	36e8695ca5	powerpc/pseries: Disable VPNH feature This feature triggers nasty races in the scheduler between the rebuilding of the topology and the load balancing code, causing the machine to hang. Disable it for now until the races are fixed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-10 10:06:41 +11:00
Scott Wood	6dd2270029	powerpc: Fix memory limits when starting at a non-zero address memblock_enforce_memory_limit() takes the desired maximum quantity of memory to end up with, not an address above which memory will not be used. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-02 16:56:15 +11:00
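The distinction between "amount to keep" and "address to stop at" only shows up when RAM does not start at 0. The simplified model below assumes a single contiguous bank starting at base; enforce_limit() is a hypothetical stand-in that follows the documented contract of memblock_enforce_memory_limit() (its argument is a quantity of memory) and returns the resulting end of usable RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-bank model: keep at most `amount` bytes of the
 * bank [base, base + size) and return the new end of usable RAM. */
static uint64_t enforce_limit(uint64_t base, uint64_t size, uint64_t amount)
{
    assert(size > 0);
    return base + (amount < size ? amount : size);
}
```

With 1 GiB of RAM at 0x40000000 and an intended end of 0x50000000, passing the end address as if it were an amount keeps all of RAM (end 0x80000000); passing 0x50000000 - base keeps exactly 256 MiB.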
Peter Zijlstra	f342552b91	powerpc/mm: Make hpte_need_flush() safe for preemption hpte_need_flush() might be called outside of a preempt section when manipulating the kernel page tables, so we need to use the appropriate variants of per-cpu variable accesses. There should be no risk of being in the middle of a batch and a context switch will flush any pending batch. [Patch extracted from a larger patch in Peter's preemptible mmu_gather series] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-03-02 14:56:48 +11:00
Anton Blanchard	429f4d8d20	powerpc/numa: Fix bug in unmap_cpu_from_node When converting to the new cpumask code I screwed up: - if (cpu_isset(cpu, numa_cpumask_lookup_table[node])) { - cpu_clear(cpu, numa_cpumask_lookup_table[node]); + if (cpumask_test_cpu(cpu, node_to_cpumask_map[node])) { + cpumask_set_cpu(cpu, node_to_cpumask_map[node]); This was introduced in commit `25863de07a` (powerpc/cpumask: Convert NUMA code to new cpumask API) Fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:06 +11:00
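The quoted conversion bug tests a CPU's bit and then sets it again, so the CPU is never removed from the node. A single-word bitmask makes the intended test-then-clear pattern easy to see. The mask_* helpers and unmap_cpu() are hypothetical stand-ins, not the kernel's cpumask API or its unmap_cpu_from_node().

```c
#include <assert.h>

typedef unsigned long mask_t; /* one-word stand-in for a cpumask */

static int  mask_test(int cpu, const mask_t *m) { return (int)((*m >> cpu) & 1UL); }
static void mask_set(int cpu, mask_t *m)        { *m |= 1UL << cpu; }
static void mask_clear(int cpu, mask_t *m)      { *m &= ~(1UL << cpu); }

/* Corrected unmap: if the CPU is in the node's mask, CLEAR its bit. */
static void unmap_cpu(int cpu, mask_t *node_mask)
{
    assert(cpu >= 0 && cpu < (int)(8 * sizeof(mask_t)));
    if (mask_test(cpu, node_mask))
        mask_clear(cpu, node_mask);
}
```

Replacing mask_clear() with mask_set() in unmap_cpu() reproduces the bug: the mask is left unchanged.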
Anton Blanchard	fe5cfd6355	powerpc/numa: Disable VPHN on dedicated processor partitions There is no need to start up the timer and monitor topology changes on a dedicated processor partition, so disable it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:04 +11:00
Anton Blanchard	c0e5e46f39	powerpc/numa: Add length when creating OF properties via VPHN The rest of the NUMA code expects an OF associativity property with the first cell containing the length. Without this fix all topology changes cause us to misparse the property and put the cpu into node 0. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:03 +11:00
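An associativity-style property in the format the NUMA code expects starts with a cell holding the number of cells that follow. A minimal sketch of building one from a list of domain IDs; build_assoc_prop() is hypothetical, and it ignores that real device-tree cells are stored big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a length cell followed by n domain cells into out[].
 * Returns the total number of cells written. */
static size_t build_assoc_prop(uint32_t *out, size_t out_cells,
                               const uint32_t *domains, size_t n)
{
    assert(out_cells >= n + 1);
    out[0] = (uint32_t)n;         /* length cell */
    for (size_t i = 0; i < n; i++)
        out[i + 1] = domains[i];  /* associativity domains */
    return n + 1;
}
```

Without the leading length cell, a parser that reads cell 0 as a count misinterprets the first domain ID as the length, which matches the misparsing the commit describes.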
Anton Blanchard	d69043e806	powerpc/numa: Check for all VPHN changes The hypervisor uses unsigned 1 byte counters to signal topology changes to the OS. Since they can wrap we need to check for any difference, not just if the hypervisor count is greater than the previous count. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:06:01 +11:00
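The difference between the two checks is easy to show with 8-bit counters. After 255 the counter wraps to 0, so "new > old" misses that change, while "new != old" catches it. The function names below are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Misses changes that cross the 255 -> 0 wrap. */
static int changed_buggy(uint8_t prev, uint8_t now) { return now > prev; }

/* Detects any change, including across the wrap. */
static int changed(uint8_t prev, uint8_t now) { return now != prev; }
```

The inequality check still cannot see a change of exactly 256 increments between two polls, a limit shared by any 8-bit counter.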
Anton Blanchard	5de1669910	powerpc/numa: Only use active VPHN count fields VPHN supports up to 8 distance fields but the number of entries in ibm,associativity-reference-points signifies how many are in use. Don't look at all the VPHN counts, only distance_ref_points_depth worth. Since we already cap our distance metrics at MAX_DISTANCE_REF_POINTS, use that to size the VPHN arrays and add a BUILD_BUG_ON to avoid it growing larger than the VPHN maximum of 8. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:05:59 +11:00
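Both points can be sketched together. A compile-time assertion plays the role of BUILD_BUG_ON, and only the first depth counters are compared, with depth standing in for distance_ref_points_depth. The value 4 for MAX_DISTANCE_REF_POINTS is illustrative, and any_active_counter_changed() is a hypothetical name.

```c
#include <assert.h>

#define MAX_DISTANCE_REF_POINTS 4 /* illustrative value */
#define VPHN_MAX_COUNTERS       8 /* VPHN maximum, per the commit */

/* Compile-time check in the spirit of BUILD_BUG_ON. */
_Static_assert(MAX_DISTANCE_REF_POINTS <= VPHN_MAX_COUNTERS,
               "distance ref points exceed VPHN counter fields");

/* Compare only the counters that are actually in use. */
static int any_active_counter_changed(const unsigned char *prev,
                                      const unsigned char *now, int depth)
{
    assert(depth >= 0 && depth <= MAX_DISTANCE_REF_POINTS);
    for (int i = 0; i < depth; i++)
        if (prev[i] != now[i])
            return 1;
    return 0;
}
```

A change in an unused counter beyond depth is ignored, so it cannot trigger a spurious topology update.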
Jesse Larrew	cd9d6cc726	powerpc/pseries: Remove unnecessary variable initializations in numa.c Remove unnecessary variable initializations in VPHN functions. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 13:05:36 +11:00
Jesse Larrew	7639adaafb	powerpc/pseries: Fix brace placement in numa.c Fix brace placement in VPHN code. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 12:58:23 +11:00
Jesse Larrew	bd03403ad5	powerpc/pseries: Fix typo in VPHN comments Correct a spelling error in VPHN comments in numa.c. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-02-07 12:58:21 +11:00
Dave Kleikamp	21a06b0459	powerpc/476: Workaround for PLB6 hang The 476FP core may hang if an instruction fetch happens during an msync following a tlbsync. This workaround makes sure that enough instruction cache lines are pre-fetched before executing the msync. (sync and msync are the same to the compiler.) Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2011-02-02 06:59:02 -05:00
Andrea Arcangeli	9180706344	thp: alter compound get_page/put_page Alter compound get_page/put_page to keep references on subpages too, in order to allow __split_huge_page_refcount to split an hugepage even while subpages have been pinned by one of the get_user_pages() variants. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:39 -08:00
Benjamin Herrenschmidt	5d7d8072ed	powerpc/pseries: Fix build of topology stuff without CONFIG_NUMA Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-01-12 10:56:29 +11:00
Jesse Larrew	39bf990ead	powerpc/pseries: Fix VPHN build errors on non-SMP systems The header asm/hvcall.h was previously included indirectly via smp.h. On non-SMP systems, however, these declarations are excluded and the build breaks. This is easily fixed by including asm/hvcall.h directly. The VPHN feature is only meaningful on NUMA systems that implement the SPLPAR option, so exclude the VPHN code on systems without SPLPAR enabled. Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled, even if CONFIG_HOTPLUG_CPU is disabled. Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the node masks after boot time, so remove the __cpuinit annotation to fix a section mismatch. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>	2011-01-11 16:06:16 +11:00
Jesper Juhl	ae9fd31a36	powerpc: Remove unnecessary casts of void ptr Hi, The [vk][cmz]alloc(_node) family of functions return void pointers which it's completely unnecessary/pointless to cast to other pointer types since that happens implicitly. This patch removes such casts from arch/powerpc/ Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:36:30 +11:00
Jesse Larrew	9eff1a3840	powerpc/pseries: Poll VPA for topology changes and update NUMA maps This patch sets a timer during boot that will periodically poll the associativity change counters in the VPA. When a change in associativity is detected, it retrieves the new associativity domain information via the H_HOME_NODE_ASSOCIATIVITY hcall and updates the NUMA node maps and sysfs entries accordingly. Note that since the ibm,associativity device tree property does not exist on configurations with both NUMA and SPLPAR enabled, no device tree updates are necessary. Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:36:29 +11:00
Michael Ellerman	7a9d12568e	powerpc: Record vma->phys_addr in ioremap() The vmalloc code can track the physical address of a vma when the vma is used for ioremap; if set, it is displayed in /proc/vmallocinfo. Because get_vm_area_caller() doesn't know it's being called for ioremap(), it's up to the arch code to set the phys_addr. A bunch of other arches do this; I'm not sure why powerpc doesn't. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-12-09 15:35:32 +11:00
Benjamin Herrenschmidt	f4b9841595	Merge branch 'nvram' into next	2010-12-09 14:36:38 +11:00
Peter Zijlstra	f2e785ed5f	powerpc: Use call_rcu_sched() for pagetables PowerPC relies on IRQ-disable to guard against RCU quiescent states; use the appropriate RCU call version. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-30 10:42:20 +11:00
Michael Neuling	0b97fee0ef	powerpc/mm: Avoid avoidable void* pointer Change pgdir from a void to real type. Having this as a void is stupid and has already caused 1 bug. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:23 +11:00
Nishanth Aravamudan	cd34206e94	powerpc: Add memory_hotplug_max() Add a function to get the maximum address that can be hotplug added. This is needed to calculate the size of the tce table needed to cover all memory in 1:1 mode. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:21 +11:00
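A back-of-the-envelope version of the sizing calculation: a 1:1 TCE table needs one entry per IOMMU page up to the highest address that may ever be populated, including hot-added memory. This sketch assumes 4 KiB IOMMU pages and 8-byte TCE entries (both assumptions, not stated in the commit); max_addr would come from something like memory_hotplug_max().

```c
#include <assert.h>
#include <stdint.h>

#define IOMMU_PAGE_SHIFT 12 /* assumed 4 KiB IOMMU pages */
#define TCE_ENTRY_BYTES  8  /* assumed 8-byte TCE entries */

/* Bytes needed for a direct-mapped TCE table covering [0, max_addr). */
static uint64_t tce_table_bytes(uint64_t max_addr)
{
    uint64_t page = UINT64_C(1) << IOMMU_PAGE_SHIFT;
    uint64_t pages = (max_addr + page - 1) >> IOMMU_PAGE_SHIFT;
    return pages * TCE_ENTRY_BYTES;
}
```

Under these assumptions, 16 GiB of addressable memory needs 4M entries, i.e. a 32 MiB table; sizing from boot-time RAM alone would leave hot-added memory uncovered.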
Vaidyanathan Srinivasan	99d8670525	powerpc: Cleanup APIs for cpu/thread/core mappings These APIs take a logical cpu number as input: change cpu_first_thread_in_core() to cpu_first_thread_sibling(); change cpu_last_thread_in_core() to cpu_last_thread_sibling(). These APIs convert a core number (index) to logical cpu/thread numbers: add cpu_first_thread_of_core(int core); change cpu_thread_to_core() to cpu_core_index_of_thread(int cpu). The goal is to make 'threads_per_core' accessible to the pseries_energy module. Instead of making an API to read threads_per_core, this is a higher level wrapper function to convert from logical cpu number to core number. The current APIs cpu_first_thread_in_core() and cpu_last_thread_in_core() return logical CPU numbers, while cpu_thread_to_core() returns a core number (index), which is not a logical CPU number. The new APIs are now clearly named to distinguish 'core number' versus first and last 'logical cpu number' in that core. The new APIs cpu_{first,last}_thread_sibling() work on logical cpu numbers, while cpu_first_thread_of_core() and cpu_core_index_of_thread() work with core indices. Example usage (4 threads per core system): cpu_first_thread_sibling(5) = 4, cpu_last_thread_sibling(5) = 7, cpu_core_index_of_thread(5) = 1, cpu_first_thread_of_core(1) = 4. cpu_core_index_of_thread() is used in cpu_to_drc_index() in the module and cpu_first_thread_of_core() is used in drc_index_to_cpu() in the module. Update the few existing callers. Export symbols for use in modules. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-29 15:48:19 +11:00
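The four mappings reduce to simple arithmetic when threads_per_core is a power of two. This sketch reuses the names from the commit for readability, but it is a self-contained model with threads_per_core hard-coded to 4 (matching the example above), not the kernel's cputhreads.h implementation.

```c
#include <assert.h>

static int threads_per_core = 4; /* assumed power of two */

/* Logical cpu -> first/last logical cpu in the same core. */
static int cpu_first_thread_sibling(int cpu) { return cpu & ~(threads_per_core - 1); }
static int cpu_last_thread_sibling(int cpu)  { return cpu | (threads_per_core - 1); }

/* Core index <-> logical cpu. */
static int cpu_first_thread_of_core(int core) { return core * threads_per_core; }
static int cpu_core_index_of_thread(int cpu)  { return cpu / threads_per_core; }
```

The sibling helpers return logical cpu numbers, while cpu_core_index_of_thread() returns a core index; that is exactly the distinction the renaming makes explicit.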
Kumar Gala	82ae5eaffa	powerpc/mm: Fix module instruction tlb fault handling on Book-E 64 We were seeing oops like the following when we did an rmmod on a module: Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0x8000000000008010 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2 P5020 DS last sysfs file: /sys/devices/qman-portals.2/qman-pool.9/uevent Modules linked in: qman_tester(-) NIP: 8000000000008010 LR: c000000000074858 CTR: 8000000000008010 REGS: c00000002e29bab0 TRAP: 0400 Not tainted (2.6.34.6-00744-g2d21f14) MSR: 0000000080029000 <EE,ME,CE> CR: 24000448 XER: 00000000 TASK = c00000007a8be600[4987] 'rmmod' THREAD: c00000002e298000 CPU: 1 GPR00: 8000000000008010 c00000002e29bd30 8000000000012798 c00000000035fb28 GPR04: 0000000000000002 0000000000000002 0000000024022428 c000000000009108 GPR08: fffffffffffffffe 800000000000a618 c0000000003c13c8 0000000000000000 GPR12: 0000000022000444 c00000000fffed00 0000000000000000 0000000000000000 GPR16: 00000000100c0000 0000000000000000 00000000100dabc8 0000000010099688 GPR20: 0000000000000000 00000000100cfc28 0000000000000000 0000000010011a44 GPR24: 00000000100017b2 0000000000000000 0000000000000000 0000000000000880 GPR28: c00000000035fb28 800000000000a7b8 c000000000376d80 c0000000003cce50 NIP [8000000000008010] .test_exit+0x0/0x10 [qman_tester] LR [c000000000074858] .SyS_delete_module+0x1f8/0x2f0 Call Trace: [c00000002e29bd30] [c0000000000748b4] .SyS_delete_module+0x254/0x2f0 (unreliable) [c00000002e29be30] [c000000000000580] syscall_exit+0x0/0x2c Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 38600000 4e800020 60000000 60000000 <4e800020> 60000000 60000000 60000000 ---[ end trace 4f57124939a84dc8 ]--- This appears to be due to checking the wrong permission bits in the instruction_tlb_miss handling if the address that faulted was in vmalloc space. 
We need to look at the supervisor execute (_PAGE_BAP_SX) bit and not the user bit (_PAGE_BAP_UX/_PAGE_EXEC). Also removed a branch level since it did not appear to be used. Reported-by: Jeffrey Ladouceur <Jeffrey.Ladouceur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:23 +11:00
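The permission fix reduces to choosing which execute bit to test based on who owns the faulting address. In this sketch the PTE_BAP_* bit values and fetch_allowed() are hypothetical; the real _PAGE_BAP_SX/_PAGE_BAP_UX definitions and the assembly TLB-miss handler differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE bit values, for illustration only. */
#define PTE_BAP_SX (1u << 0) /* supervisor execute */
#define PTE_BAP_UX (1u << 1) /* user execute */

/* Kernel-space fetches (e.g. module text in vmalloc space) must be
 * checked against the supervisor-execute bit, user fetches against
 * the user-execute bit. */
static int fetch_allowed(uint32_t pte, int kernel_address)
{
    uint32_t need = kernel_address ? PTE_BAP_SX : PTE_BAP_UX;
    return (pte & need) == need;
}
```

Module text is mapped supervisor-executable but not user-executable, so testing the user bit for a vmalloc address rejects a legitimate fetch, producing an oops like the one above.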
Michael Neuling	1c2c25c787	powerpc: Fix call to subpage_protection() In: powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT commit `d28513bc7f` Author: David Gibson <david@gibson.dropbear.id.au> subpage_protection() was changed to take an mm rather than a pgdir, but it didn't change the calling site in hashpage_preload(). The change wasn't noticed at compile time since hashpage_preload() used a void* as the parameter to subpage_protection(). This is obviously wrong and can trigger the following crash when CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES and CONFIG_PPC_SUBPAGE_PROT are enabled. Freeing unused kernel memory: 704k freed Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7 Faulting instruction address: 0xc0000000000410f4 cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590] pc: c0000000000410f4: .hash_preload+0x258/0x338 lr: c000000000041054: .hash_preload+0x1b8/0x338 sp: c00000004233f810 msr: 8000000000009032 dar: 6b6b6b6b6b6c49b7 dsisr: 40000000 current = 0xc00000007e2c0070 paca = 0xc000000007fe0500 pid = 1, comm = init enter ? for help [c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable) [c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0 [c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc [c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c [c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac [c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:23 +11:00
Kumar Gala	4a89261b02	powerpc/mm: Fix build error in setup_initial_memory_limit arch/powerpc/mm/tlb_nohash.c: In function 'setup_initial_memory_limit': arch/powerpc/mm/tlb_nohash.c:588:29: error: 'ppc64_memblock_base' undeclared (first use in this function) arch/powerpc/mm/tlb_nohash.c:588:29: note: each undeclared identifier is reported only once for each function it appears in Due to a copy/paste typo with the following commit: commit `cd3db0c4ca` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Tue Jul 6 15:39:02 2010 -0700 memblock: Remove rmo_size, burry it in arch/powerpc where it belongs Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-11-18 14:54:22 +11:00
Peter Zijlstra	20273941f2	mm: fix race in kunmap_atomic() Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:05 -07:00
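The fix is about ordering: find the top slot without releasing it, clear its state, and only then pop it. The user-space model below cannot reproduce the interrupt race itself; it only shows the corrected order. The struct and function names are illustrative, with a plain struct standing in for the per-cpu slot stack and an unsigned long standing in for the PTE.

```c
#include <assert.h>

#define NR_SLOTS 4

struct kmap_stack {
    int top;                     /* number of slots in use */
    unsigned long pte[NR_SLOTS]; /* per-slot mapping state */
};

static int kmap_push(struct kmap_stack *s, unsigned long pte)
{
    assert(s->top < NR_SLOTS);
    s->pte[s->top] = pte;
    return s->top++;
}

/* Report the current top slot WITHOUT releasing it. */
static int kmap_idx_peek(const struct kmap_stack *s)
{
    assert(s->top > 0);
    return s->top - 1;
}

static void kunmap(struct kmap_stack *s)
{
    int idx = kmap_idx_peek(s); /* slot is still owned here */
    s->pte[idx] = 0;            /* clear its state first... */
    s->top--;                   /* ...then release it */
}
```

If top were decremented before the clear, an interrupt arriving in between could push into the same slot while its old PTE was still set, which is the dirty-slot reuse the commit describes.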
Peter Zijlstra	3e4d3af501	mm: stack based kmap_atomic() Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
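The CPP trick keeps existing two-argument call sites compiling while the KM_type argument is silently discarded. In this sketch struct page, the km_type values and __kmap_atomic() are stand-ins, and the named variadic form args... is a GCC extension, as in the quoted macro.

```c
#include <assert.h>

struct page { int id; };
enum km_type { KM_USER0, KM_USER1 };

static int last_mapped = -1; /* records which page was "mapped" */

/* Stand-in for the stack-based implementation. */
static void *__kmap_atomic(struct page *page)
{
    last_mapped = page->id;
    return page;
}

/* Old callers pass (page, KM_type); everything after page is dropped. */
#define kmap_atomic(page, args...) __kmap_atomic(page)
```

A call such as kmap_atomic(&p, KM_USER0) expands to __kmap_atomic(&p), so the slot choice moves from the caller to the stack without touching every user in one patch.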
Linus Torvalds	d4429f608a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (71 commits) powerpc/44x: Update ppc44x_defconfig powerpc/watchdog: Make default timeout for Book-E watchdog a Kconfig option fsl_rio: Add comments for sRIO registers. powerpc/fsl-booke: Add e55xx (64-bit) smp defconfig powerpc/fsl-booke: Add p5020 DS board support powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes powerpc/fsl-booke: Add support for FSL 64-bit e5500 core powerpc/85xx: add cache-sram support powerpc/85xx: add ngPIXIS FPGA device tree node to the P1022DS board powerpc: Fix compile error with paca code on ppc64e powerpc/fsl-booke: Add p3041 DS board support oprofile/fsl emb: Don't set MSR[PMM] until after clearing the interrupt. powerpc/fsl-booke: Add PCI device ids for P2040/P3041/P5010/P5020 QoirQ chips powerpc/mpc8xxx_gpio: Add support for 'qoriq-gpio' controllers powerpc/fsl_booke: Add support to boot from core other than 0 powerpc/p1022: Add probing for individual DMA channels powerpc/fsl_soc: Search all global-utilities nodes for rstccr powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT powerpc/mpc83xx: Support for MPC8308 P1M board ... Fix up conflict with the generic irq_work changes in arch/powerpc/kernel/time.c	2010-10-21 21:19:54 -07:00
Kumar Gala	55fd766b5f	powerpc/fsl-booke64: Use TLB CAMs to cover linear mapping on FSL 64-bit chips Freescale parts typically have a TLB array for large mappings that we can bolt the linear mapping into. On the 64-bit side we reuse the code that already exists on PPC32 to set up the linear mapping so that it is covered by bolted TLB entries. We use a quarter of the variable-size TLB array for this purpose. Additionally, we limit the amount of memory to what we can cover via bolted entries so we don't get secondary faults in the TLB miss handlers. We should fix this limitation in the future. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:55:14 -05:00
Kumar Gala	988cf86d4f	powerpc/fsl-booke: Add support for FSL Arch v1.0 MMU in setup_page_sizes Update setup_page_sizes() to support an FSL-style MMU v1.0 implementation. Such a processor has no TLB0PS or EPTCFG registers (and accessing these registers may cause exceptions), so we need to parse the older TLBnCFG format for page size support. Additionally, since this is an FSL implementation, assume that we have 2 TLB arrays and that the second array contains the variable-size pages. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:55:09 -05:00
Paul Gortmaker	92437d4137	powerpc: Fix invalid page flags in create TLB CAM path for PTE_64BIT There exists a four-line chunk of code which, when configured for a 64-bit address space, can incorrectly set certain page flags during TLB creation. It turns out that this code isn't used, but might still serve a purpose. Since it isn't obvious why it exists or why it causes problems, the description below covers both in detail. For powerpc bootstrap, the physical memory (at most 768M) is mapped into the kernel space via the following path: MMU_init() | + adjust_total_lowmem() | + map_mem_in_cams() | + settlbcam(i, virt, phys, cam_sz, PAGE_KERNEL_X, 0); On settlbcam(), the kernel will create TLB entries according to the flag, PAGE_KERNEL_X. settlbcam() { ... TLBCAM[index].MAS1 = MAS1_VALID | MAS1_IPROT | MAS1_TSIZE(tsize) | MAS1_TID(pid); ^ These entries cannot be invalidated by the kernel since MAS1_IPROT is set on TLB property. ... if (flags & _PAGE_USER) { TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR; TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0); } For classic BookE (flags & _PAGE_USER) is 'zero' so it's fine. But on boards like the Freescale P4080, we want to support 36-bit physical addresses. So the following options may be set: CONFIG_FSL_BOOKE=y CONFIG_PTE_64BIT=y CONFIG_PHYS_64BIT=y As a result, boards like the P4080 will use the Book3E PTE format. 
As per the file arch/powerpc/include/asm/pgtable-ppc32.h: * #elif defined(CONFIG_FSL_BOOKE) && defined(CONFIG_PTE_64BIT) * #include <asm/pte-book3e.h> So PAGE_KERNEL_X is __pgprot(_PAGE_BASE | _PAGE_KERNEL_RWX) and the Book3E version of _PAGE_KERNEL_RWX is defined as: (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX) Note the _PAGE_BAP_SR, which is also part of the Book3E _PAGE_USER: #define _PAGE_USER (_PAGE_BAP_UR | _PAGE_BAP_SR) /* Can be read */ So the user MAS3_U<RWX> bits can wrongly be assigned to kernel (PAGE_KERNEL_X) address space via the following code fragment: if (flags & _PAGE_USER) { TLBCAM[index].MAS3 |= MAS3_UX | MAS3_UR; TLBCAM[index].MAS3 |= ((flags & _PAGE_RW) ? MAS3_UW : 0); } Here is a dump of the TLB info from Simics with the above code present: ------ L2 TLB1 GT SSS UUU V I Row Logical Physical SS TLPID TID WIMGE XWR XWR F P V ----- ----------------- ------------------- -- ----- ----- ----- --- --- - - - 0 c0000000-cfffffff 000000000-00fffffff 00 0 0 M XWR XWR 0 1 1 1 d0000000-dfffffff 010000000-01fffffff 00 0 0 M XWR XWR 0 1 1 2 e0000000-efffffff 020000000-02fffffff 00 0 0 M XWR XWR 0 1 1 Actually, this conditional code was used for two legacy functions: 1: supporting KGDB in setting breakpoints. KGDB has already dropped this and now uses its core write to set breakpoints. 2: io_block_mapping(), to create TLB entries in segment size (not PAGE_SIZE) for device IO space. This use case has also been removed from the latest PowerPC kernel. However, there may still be a use case for it in the future, like large user pages, so we can't remove it entirely. As an alternative, we match on all bits of _PAGE_USER instead of just any bits, so the case where only _PAGE_BAP_SR is set can't sneak through. 
With this done, the TLB appears without U having XWR as below: ------- L2 TLB1 GT SSS UUU V I Row Logical Physical SS TLPID TID WIMGE XWR XWR F P V ----- ----------------- ------------------- -- ----- ----- ----- --- --- - - - 0 c0000000-cfffffff 000000000-00fffffff 00 0 0 M XWR 0 1 1 1 d0000000-dfffffff 010000000-01fffffff 00 0 0 M XWR 0 1 1 2 e0000000-efffffff 020000000-02fffffff 00 0 0 M XWR 0 1 1 Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-10-14 00:52:55 -05:00
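The difference between matching any bit and matching all bits of `_PAGE_USER` can be shown in a few lines. The mask definitions follow the Book3E formulas quoted in the message, but the numeric bit values are hypothetical. Only the fact that `_PAGE_BAP_SR` appears in both masks matters here. `user_any()` and `user_all()` are illustrative names for the old and new conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the overlap between the masks matters. */
#define _PAGE_BAP_SR  0x0001UL
#define _PAGE_BAP_UR  0x0002UL
#define _PAGE_BAP_SW  0x0004UL
#define _PAGE_BAP_SX  0x0008UL
#define _PAGE_DIRTY   0x0010UL

/* Book3E-style compositions as quoted in the commit message. */
#define _PAGE_USER        (_PAGE_BAP_UR | _PAGE_BAP_SR)
#define _PAGE_KERNEL_RWX  (_PAGE_BAP_SW | _PAGE_BAP_SR | _PAGE_DIRTY | _PAGE_BAP_SX)

/* Old condition: true if ANY _PAGE_USER bit is set. */
static bool user_any(unsigned long flags)
{
	return (flags & _PAGE_USER) != 0;
}

/* Fixed condition: true only if ALL _PAGE_USER bits are set. */
static bool user_all(unsigned long flags)
{
	return (flags & _PAGE_USER) == _PAGE_USER;
}
```

With these definitions, `user_any(_PAGE_KERNEL_RWX)` is true because of the shared supervisor-read bit. This is why the kernel mapping picked up user XWR in the first Simics dump, whereas `user_all()` rejects it.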
matt mooney	4108d9ba90	powerpc/Makefiles: Change to new flag variables Replace EXTRA_CFLAGS with ccflags-y and EXTRA_AFLAGS with asflags-y. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-10-13 16:19:22 +11:00
Yinghai Lu	c7fc2de0c8	memblock, bootmem: Round pfn properly for memory and reserved regions We need to round memory regions correctly -- specifically, we need to round reserved regions in the more expansive direction (lower limit down, upper limit up) whereas usable memory regions need to be rounded in the more restrictive direction (lower limit up, upper limit down). This introduces two sets of inlines: memblock_region_memory_base_pfn() memblock_region_memory_end_pfn() memblock_region_reserved_base_pfn() memblock_region_reserved_end_pfn() Although they are antisymmetric (and therefore are technically duplicates), the use of the different inlines explicitly documents the programmer's intention. The lack of proper rounding caused a bug on ARM, which was then found to also affect other architectures. Reported-by: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB4CDFD.4020105@kernel.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-12 15:37:51 -07:00
Matthew McClintock	0d35e1620d	powerpc/mm: Assume first cpu is boot_cpuid not 0 arch/powerpc/mm/mmu_context_nohash.c assumes the boot cpu will always have smp_processor_id() == 0. This patch fixes that assumption. Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:34 +10:00
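Here is a sketch of the two rounding directions. It assumes 4 KiB pages and a simplified byte-granular `struct region` in place of `struct memblock_region`. The four function names are hypothetical shortenings of the inlines listed above. `PFN_UP()` and `PFN_DOWN()` are defined locally with the same meaning as the kernel macros of those names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Round an address down / up to a page frame number. */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

struct region { uint64_t base, size; };   /* hypothetical, in bytes */

/* Usable memory: shrink to whole pages (base rounded up, end rounded down). */
static uint64_t memory_base_pfn(const struct region *r) { return PFN_UP(r->base); }
static uint64_t memory_end_pfn(const struct region *r)  { return PFN_DOWN(r->base + r->size); }

/* Reserved ranges: grow to cover every touched page (base down, end up). */
static uint64_t reserved_base_pfn(const struct region *r) { return PFN_DOWN(r->base); }
static uint64_t reserved_end_pfn(const struct region *r)  { return PFN_UP(r->base + r->size); }
```

Consider a region spanning 0x1800 to 0x3800. The usable-memory view is PFN 2 up to (but not including) 3, while the reserved view covers PFN 1 up to 4. A partially covered page is therefore never handed out as free memory, but is always protected when reserved.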
Anton Blanchard	28b549905b	powerpc: Check end of stack canary at oops time Add a check for the stack canary when we oops, similar to x86. This should make it clear that we overran our stack: Unable to handle kernel paging request for data at address 0x24652f63700ac689 Faulting instruction address: 0xc000000000063d24 Thread overran stack, or stack corrupted Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-09-02 14:07:30 +10:00
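The check amounts to comparing a magic word at the far end of the task's stack. Below is a userspace sketch that assumes a hypothetical `struct fake_task` whose lowest word holds the canary, since that is the end a downward-growing stack reaches last. The constant is meant to mirror the generic kernel's `STACK_END_MAGIC`, but should be treated as illustrative. `set_canary()` and `stack_overran()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define STACK_END_MAGIC 0x57AC6E9DUL   /* illustrative canary value */
#define STACK_WORDS 64

/* Hypothetical task with an embedded stack; index 0 is the lowest address. */
struct fake_task {
	unsigned long stack[STACK_WORDS];
};

/* Stacks grow down, so the end that an overrun hits last is index 0. */
static void set_canary(struct fake_task *t)
{
	t->stack[0] = STACK_END_MAGIC;
}

/* What the oops path checks: has the canary been overwritten? */
static bool stack_overran(const struct fake_task *t)
{
	if (t->stack[0] != STACK_END_MAGIC) {
		printf("Thread overran stack, or stack corrupted\n");
		return true;
	}
	return false;
}
```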
Ingo Molnar	daab7fc734	Merge commit 'v2.6.36-rc3' into x86/memblock Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 09:45:46 +02:00
Sonny Rao	79c3095fb3	powerpc: Export memstart_addr and kernstart_addr on ppc64 Some modules (like eHCA) want to map all of kernel memory. For this to work with a relocated kernel, we need to export kernstart_addr so modules can use PHYSICAL_START, and memstart_addr so they can use MEMORY_START. Note that the 32-bit code already exports these symbols. Signed-off-By: Sonny Rao <sonnyrao@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-24 15:26:26 +10:00
Benjamin Herrenschmidt	b1515af291	Merge remote branch 'jwb/merge' into merge	2010-08-24 14:36:45 +10:00
Dave Kleikamp	32412aa214	powerpc/47x: Add an isync before the tlbivax instruction Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-08-23 07:38:31 -04:00
Cesar Eduardo Barros	597781f3e5	kmap_atomic: make kunmap_atomic() harder to misuse kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse" list[1] ("Follow common convention and you'll get it wrong"), except in some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3]. kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes a pointer to within the page itself. This seems to trip people up once in a while (the convention they are following is the one from kunmap()). Make it much harder to misuse by moving it to level 9 on Rusty's list[4] ("The compiler/linker won't let you get it wrong"). This is done by refusing to build if the type of its first argument is a pointer to a struct page. The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck() (which is what you would call if, for some strange reason, calling it with a pointer to a struct page is not incorrect in your code). The previous version of this patch was compile-tested on x86-64. [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html [2] In these cases, it is at level 5, "Do it right or it will always break at runtime." [3] At least mips and powerpc look very similar, and sparc also seems to share a common ancestor with both; there seems to be quite some degree of copy-and-paste coding here. The include/asm/highmem.h file for these three archs mentions x86 CPUs at its top. [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html [5] As an aside, could someone tell me why mn10300 uses unsigned long as the first parameter of kunmap_atomic() instead of void *? 
Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Cc: Russell King <linux@arm.linux.org.uk> (arch/arm) Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips) Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300) Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300) Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc) Cc: Helge Deller <deller@gmx.de> (arch/parisc) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc) Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc) Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc) Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86) Cc: Ingo Molnar <mingo@redhat.com> (arch/x86) Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86) Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic) Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:44:54 -07:00
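The compile-time refusal can be sketched with the GCC/Clang builtin `__builtin_types_compatible_p` and a negative-array-size assertion. This is a userspace approximation under stated assumptions. `__same_type()` and `BUILD_BUG_ON()` are defined locally in the spirit of the kernel macros, `struct page` is a stub, and this `kunmap_atomic_notypecheck()` only counts calls.

```c
#include <assert.h>

struct page { int id; };   /* stub for illustration */

/* Local equivalent of a "same type" test using a GCC/Clang builtin. */
#define __same_type(a, b) __builtin_types_compatible_p(__typeof__(a), __typeof__(b))

/* Compile-time assertion: a negative array size stops the build. */
#define BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

static int unmapped;   /* counts calls, for demonstration only */

/* The unchecked implementation, renamed as described in the commit. */
static void kunmap_atomic_notypecheck(void *kvaddr, int idx)
{
	(void)kvaddr;
	(void)idx;
	unmapped++;
}

/* Refuses to compile if handed a struct page * instead of a mapped address. */
#define kunmap_atomic(addr, idx) do {					\
		BUILD_BUG_ON(__same_type((addr), struct page *));	\
		kunmap_atomic_notypecheck((addr), (idx));		\
	} while (0)
```

Consider passing a `struct page *`, for example `kunmap_atomic(&pg, 0)` with `struct page pg;`. That makes the array size negative, so the mistake becomes a build failure instead of runtime misbehavior.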
Linus Torvalds	cdd854bc42	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (79 commits) powerpc/8xx: Add support for the MPC8xx based boards from TQC powerpc/85xx: Introduce support for the Freescale P1022DS reference board powerpc/85xx: Adding DTS for the STx GP3-SSA MPC8555 board powerpc/85xx: Change deprecated binding for 85xx-based boards powerpc/tqm85xx: add a quirk for ti1520 PCMCIA bridge powerpc/tqm85xx: update PCI interrupt-map attribute powerpc/mpc8308rdb: support for MPC8308RDB board from Freescale powerpc/fsl_pci: add quirk for mpc8308 pcie bridge powerpc/85xx: Cleanup QE initialization for MPC85xxMDS boards powerpc/85xx: Fix booting for P1021MDS boards powerpc/85xx: Fix SWIOTLB initalization for MPC85xxMDS boards powerpc/85xx: kexec for SMP 85xx BookE systems powerpc/5200/i2c: improve i2c bus error recovery of/xilinxfb: update tft compatible versions powerpc/fsl-diu-fb: Support setting display mode using EDID powerpc/5121: doc/dts-bindings: update doc of FSL DIU bindings powerpc/5121: shared DIU framebuffer support powerpc/5121: move fsl-diu-fb.h to include/linux powerpc/5121: fsl-diu-fb: fix issue with re-enabling DIU area descriptor powerpc/512x: add clock structure for Video-IN (VIU) unit ...	2010-08-05 09:03:46 -07:00
Benjamin Herrenschmidt	4734b594c6	memblock: Remove memblock_type.size and add memblock.memory_size instead Right now, both the "memory" and "reserved" memblock_type structures have a "size" member. It represents the calculated memory size in the former case and is unused in the latter. This moves it out to the main memblock structure instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:11 +10:00
Benjamin Herrenschmidt	cd3db0c4ca	memblock: Remove rmo_size, burry it in arch/powerpc where it belongs The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact, server ppc64, though I hijack it on embedded ppc64 for similar purposes) and represents the area of memory that can be accessed in real mode (aka with MMU off), or on embedded, from the exception vectors (which are bolted in the TLB), which pretty much boils down to the same thing. We take that out of the generic MEMBLOCK data structure and move it into arch/powerpc where it belongs, renaming it to "RMA" while at it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:08 +10:00
Benjamin Herrenschmidt	e63075a3c9	memblock: Introduce default allocation limit and use it to replace explicit ones This introduces memblock.current_limit, which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT, which disappears. Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you set an appropriate limit during boot in order to guarantee that a memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:07 +10:00
Benjamin Herrenschmidt	27f574c223	memblock: Expose MEMBLOCK_ALLOC_ANYWHERE Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-05 12:56:06 +10:00
Benjamin Herrenschmidt	28be7072ce	memblock/powerpc: Use new accessors Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:39:01 +10:00
Benjamin Herrenschmidt	e3239ff92a	memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-08-04 14:21:49 +10:00
Benjamin Herrenschmidt	412a4ac5e9	Merge commit 'gcl/next' into next	2010-08-04 10:26:03 +10:00
Benjamin Herrenschmidt	3fdfd99051	powerpc: Fix erroneous lmb->memblock conversions Oooops... we missed these. We incorrectly converted strings used when parsing the device-tree on pseries, thus breaking access to drconf memory and hotplug memory. While at it, also revert some variable names that represent something the FW calls "lmb" and thus don't need to be converted to "memblock". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:56:57 +10:00
Benjamin Herrenschmidt	4b8692c022	powerpc/mm: Add some debug output when hash insertion fails This adds some debug output to our MMU hash code to print out some useful debug data if the hypervisor refuses the insertion (which should normally never happen). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:56:56 +10:00
Benjamin Herrenschmidt	171aa2caaa	powerpc/mm: Fix bugs in huge page hashing There are a couple of nasty bugs lurking in our huge page hashing code. First, we don't check the access permission atomically with setting the _PAGE_BUSY bit, which means that the PTE value we end up using for the hashing might be different from the one we have checked the access permissions for. We've seen cases where that leads us to try to use an invalidated PTE for hashing, causing all sorts of "interesting" issues. Then, we also failed to set _PAGE_DIRTY on a write access. Finally, a minor tweak: we should return 0 when we find the PTE busy, in order to just re-execute the access, rather than 1, which means going to do_page_fault(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> ---	2010-07-23 12:55:21 +10:00
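The fix amounts to doing the permission check and taking the `_PAGE_BUSY` lock in the same compare-and-swap loop. That way, the PTE that gets hashed is exactly the one that was checked. Below is a sketch using C11 `<stdatomic.h>` with hypothetical bit values and a hypothetical `HASH_LOCKED` result for the success case. The 0/1 return convention follows the message: 0 re-executes the access, and 1 goes to `do_page_fault()`. `lock_pte_for_hash()` is an illustrative name.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical PTE bit values for illustration only. */
#define _PAGE_PRESENT  0x001UL
#define _PAGE_USER     0x002UL
#define _PAGE_RW       0x004UL
#define _PAGE_ACCESSED 0x008UL
#define _PAGE_DIRTY    0x010UL
#define _PAGE_BUSY     0x020UL

/* 0 and 1 follow the commit message; HASH_LOCKED is a hypothetical extra. */
enum { HASH_RETRY = 0, HASH_FAULT = 1, HASH_LOCKED = 2 };

static int lock_pte_for_hash(_Atomic unsigned long *ptep, unsigned long access,
			     unsigned long *locked)
{
	unsigned long old_pte = atomic_load(ptep);
	unsigned long new_pte;

	do {
		if (old_pte & _PAGE_BUSY)
			return HASH_RETRY;        /* busy: just re-execute the access */
		if (access & ~old_pte)
			return HASH_FAULT;        /* permissions checked on THIS value */
		new_pte = old_pte | _PAGE_BUSY | _PAGE_ACCESSED;
		if (access & _PAGE_RW)
			new_pte |= _PAGE_DIRTY;   /* mark dirty on a write access */
		/* On failure, old_pte is reloaded and every check is redone. */
	} while (!atomic_compare_exchange_weak(ptep, &old_pte, new_pte));

	*locked = new_pte;
	return HASH_LOCKED;
}
```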
Benjamin Herrenschmidt	ca91e6c09d	powerpc/mm: Move around testing of _PAGE_PRESENT in hash code Instead of adding _PAGE_PRESENT to the access permission mask in each low level routine independently, we add it once from hash_page(). We also move the preliminary access check (the racy one before the PTE is locked) up so it applies to the huge page case. This duplicates code in __hash_page_huge() which we'll remove in a subsequent patch to fix a race in there. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-23 08:53:23 +10:00
Anton Blanchard	b1623e7eb2	powerpc/mm: Handle hypervisor pte insert failure in __hash_page_huge If the hypervisor gives us an error on a hugepage insert we panic. The normal page code already handles this by returning an error instead, and we end up calling low_hash_fault, which will just kill the task if possible. The patch below does a similar thing for the hugepage case. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-23 08:44:51 +10:00
Yinghai Lu	95f72d1ed4	lmb: rename to memblock via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong changes like lmbench and dlmb etc. Also move memblock.c from lib/ to mm/. Suggested-by: Ingo Molnar <mingo@elte.hu> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 17:14:00 +10:00
Benjamin Herrenschmidt	f2b26c9235	powerpc/book3e: Adjust the page sizes list based on MMU config Use the MMU config registers to scan for available direct and indirect page sizes and print out the result. Will be needed for future hugetlbfs implementation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 14:13:53 +10:00
Benjamin Herrenschmidt	ff82c319e6	powerpc/book3e: Fix single step when using HW page tables We patch the TLB miss exception vectors to point to alternate functions when using HW page tables on BookE. However, we were patching in a new branch in the first instruction of the exception handler instead of the second one, thus overwriting the nop that is in the first instruction. This causes problems when single stepping, as we rely on that nop for the single step to stop properly within the exception vector range rather than on the target of the branch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-14 14:13:51 +10:00
Christoph Egger	cccd234283	powerpc: Removing dead CONFIG_SMP_750 CONFIG_SMP_750 doesn't exist in Kconfig, therefore remove all references to it from the source code. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:38 +10:00
Anton Blanchard	41eab6f88f	powerpc/numa: Use form 1 affinity to setup node distance Form 1 affinity allows multiple entries in ibm,associativity-reference-points which represent affinity domains in decreasing order of importance. The Linux concept of a node is always the first entry, but using the other values as an input to node_distance() allows the memory allocator to make better decisions on which node to go to first when local memory has been exhausted. We keep things simple and create an array indexed by NUMA node, capped at 4 entries. Each time we look up an associativity property we initialise the array, which is overkill, but since we should only hit this path during boot it didn't seem worth adding a per node valid bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:35 +10:00
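One way to turn the per-level associativity values into `node_distance()` is to start from the local distance and double it for every leading level at which two nodes differ. The sketch below makes several assumptions: that weighting, a small fixed-size `distance_lookup_table` indexed by node (most important level first), and the usual `LOCAL_DISTANCE`/`REMOTE_DISTANCE` values of 10 and 20. Treat it as an illustration of the idea rather than the commit's exact code. `node_distance_sketch()` is a hypothetical name.

```c
#include <assert.h>

#define LOCAL_DISTANCE          10
#define REMOTE_DISTANCE         20
#define MAX_NUMNODES            4   /* hypothetical, small for the example */
#define MAX_DISTANCE_REF_POINTS 4   /* "capped at 4 entries" */

static int form1_affinity = 1;
static int distance_ref_points_depth = 2;
/* Associativity domain IDs per node, most important level first. */
static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];

static int node_distance_sketch(int a, int b)
{
	int distance = LOCAL_DISTANCE;

	if (!form1_affinity)   /* old behavior: local or remote only */
		return a == b ? LOCAL_DISTANCE : REMOTE_DISTANCE;

	for (int i = 0; i < distance_ref_points_depth; i++) {
		if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
			break;          /* nodes share this domain */
		distance *= 2;          /* one more level apart */
	}
	return distance;
}
```

Under this scheme, two nodes that share only the second-level domain come out at 20, and nodes that differ at both levels come out at 40. The allocator then prefers the closer fallback node.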
Paul E. McKenney	a591f6b56d	powerpc: Remove all rcu head initializations Remove all rcu head inits. We don't care about the RCU head state before passing it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can keep track of objects on stack. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:34 +10:00
Becky Bruce	d10ac3734d	powerpc/fsl-booke: Fix comments in mmu code that mention BATS There are no BATS on BookE - we have the TLBCAM instead. Also correct the page size information to include extended sizes. We don't actually allow a 4G page size to be used, so comment on that as well. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-07-09 11:28:26 +10:00
Christoph Egger	8054a3428f	powerpc: Remove dead CONFIG_HIGHPTE CONFIG_HIGHPTE doesn't exist in Kconfig, therefore remove all references to it from the source code. Signed-off-by: Christoph Egger <siccegge@cs.fau.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-06-15 15:02:32 +10:00
Linus Torvalds	98edb6ca41	Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...	2010-05-21 17:16:21 -07:00
Anton Blanchard	bc8449cc57	powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity I've been told that the architected way to determine we are in form 1 affinity mode is by reading the ibm,architecture-vec-5 property which mirrors the layout of the fifth vector of the ibm,client-architecture structure. Eventually we may want to parse the ibm,architecture-vec-5 and create FW_FEATURE_* bits. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-21 17:31:12 +10:00
Kumar Gala	78f622377f	powerpc/fsl-booke: Move loadcam_entry back to asm code to fix SMP ftrace When we build with ftrace enabled, it's possible that loadcam_entry would have used the stack pointer (even though the code doesn't need it). We call loadcam_entry in __secondary_start before the stack is set up. To ensure that loadcam_entry doesn't use the stack pointer, the easiest solution is to just have it in asm code. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-05-17 10:56:20 -05:00
Alexander Graf	c83ec269e6	PPC: Split context init/destroy functions We need to reserve a context from KVM to make sure we have our own segment space. While we did that split for Book3S_64 already, 32 bit is still outstanding. So let's split it now. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-05-17 12:18:20 +03:00
Benjamin Herrenschmidt	1ed31d6db9	Merge commit 'origin/master' into next	2010-05-07 11:29:25 +10:00
Anton Blanchard	25863de07a	powerpc/cpumask: Convert NUMA code to new cpumask API Convert NUMA code to new cpumask API. We shift the node to cpumask setup code until after we complete bootmem allocation so we can dynamically allocate the cpumasks. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 17:41:58 +10:00
Benjamin Herrenschmidt	e460c2c91a	powerpc: Invoke oom-killer from page fault As explained in commit `1c0fe6e3bd`, we want to call the architecture independent oom killer when getting an unexplained OOM from handle_mm_fault, rather than simply killing current. Cc: linuxppc-dev@ozlabs.org Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 17:15:58 +10:00
Mark Nelson	91eea67c6d	powerpc/mm: Track backing pages allocated by vmemmap_populate() We need to keep track of the backing pages that get allocated by vmemmap_populate() so that when we use kdump, the dump-capture kernel knows where these pages are. We use a simple linked list of structures that contain the physical address of the backing page and corresponding virtual address to track the backing pages. To save space, we just use a pointer to the next struct vmemmap_backing. We can also do this because we never remove nodes. We call the pointer "list" to be compatible with changes made to the crash utility. vmemmap_populate() is called either at boot-time or on a memory hotplug operation. We don't have to worry about the boot-time calls because they will be inherently single-threaded, and for a memory hotplug operation vmemmap_populate() is called through: sparse_add_one_section() | V kmalloc_section_memmap() | V sparse_mem_map_populate() | V vmemmap_populate() and in sparse_add_one_section() we're protected by pgdat_resize_lock(). So, we don't need a spinlock to protect the vmemmap_list. We allocate space for the vmemmap_backing structs by allocating whole pages in vmemmap_list_alloc() and then handing out chunks of this to vmemmap_list_populate(). This means that we waste at most just under one page, but this keeps the code simple. Signed-off-by: Mark Nelson <markn@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 16:49:27 +10:00
Benjamin Herrenschmidt	75c1d539ea	powerpc: Fix CONFIG_DEBUG_PAGEALLOC on 603/e300 So we tried to speed things up a bit by using flush_hash_pages() directly, but that of course falls over on 603, meaning we fail to flush the TLB properly and we may even end up having it corrupt memory randomly by accessing a hash table that doesn't exist. This removes the "optimization" by always going through flush_tlb_page() for now at least. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-05-06 16:49:26 +10:00
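Here is a userspace sketch of the scheme described above. Entries are carved out of whole pages and pushed onto a singly linked list through the `list` pointer. `calloc()` stands in for the kernel's page allocator, the NUMA node argument is dropped, and `pages_allocated` exists only for demonstration. No locking is shown because, as the message explains, callers are already serialized.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL   /* assumed page size for the sketch */

struct vmemmap_backing {
	struct vmemmap_backing *list;   /* next entry; named "list" for the crash utility */
	unsigned long phys;
	unsigned long virt_addr;
};

static struct vmemmap_backing *vmemmap_list;
static unsigned long pages_allocated;   /* demonstration only */

/* Hand out entries from a whole page, grabbing a new page when it runs out. */
static struct vmemmap_backing *vmemmap_list_alloc(void)
{
	static struct vmemmap_backing *next;
	static unsigned long num_left;

	if (!next || !num_left) {
		next = calloc(1, PAGE_SIZE);   /* stand-in for a page allocation */
		if (!next)
			return NULL;
		pages_allocated++;
		num_left = PAGE_SIZE / sizeof(struct vmemmap_backing);
	}
	num_left--;
	return next++;
}

/* Push a (phys, virt) record onto the head of the list; nodes are never removed. */
static void vmemmap_list_populate(unsigned long phys, unsigned long start)
{
	struct vmemmap_backing *vb = vmemmap_list_alloc();

	if (!vb)
		return;
	vb->phys = phys;
	vb->virt_addr = start;
	vb->list = vmemmap_list;
	vmemmap_list = vb;
}
```

Since a fresh page is only taken when the current one is exhausted, at most the unused tail of the last page is wasted. This matches the "just under one page" bound in the message.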
Dave Kleikamp	e7f75ad01d	powerpc/47x: Base ppc476 support This patch adds the base support for the 476 processor. The code was primarily written by Ben Herrenschmidt and Torez Smith, but I've been maintaining it for a while. The goal is to have a single binary that will run on 44x and 47x, but we still have some details to work out. The biggest is that the L1 cache line size differs on the two platforms, but it's currently a compile-time option. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Torez Smith <lnxtorez@linux.vnet.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-05-05 09:11:10 -04:00
Linus Torvalds	b18262eda3	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: don't needlessly skip PAGE_USER test for Fsl booke	2010-04-29 20:01:42 -07:00
Wufei	56151e7534	kgdb: don't needlessly skip PAGE_USER test for Fsl booke The bypassing of this test is a leftover from 2.4 vintage kernels, and is no longer appropriate, or even used by KGDB. Currently KGDB uses probe_kernel_write() for all access to memory via the KGDB core, so it can simply be deleted. This fixes CVE-2010-1446. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: Paul Mackerras <paulus@samba.org> CC: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Wufei <fei.wu@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-04-29 21:41:44 -05:00
Anton Blanchard	4b83c330b4	powerpc/numa: Add form 1 NUMA affinity Firmware changed the way it represents memory and cpu affinity on POWER7. Unfortunately the old method now caps the topology to work around issues with legacy operating systems. For Linux to get the correct topology we need to use the new form 1 affinity information. We set the form 1 field in the client architecture, and if we see "1" in the ibm,associativity-form property, firmware supports form 1 affinity and we should look at the first field in the ibm,associativity-reference-points array. If not, we use the second field as we always have. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-28 16:22:33 +10:00
Becky Bruce	e8137341b1	powerpc/fsl_booke: Correct test for MMU_FTR_BIG_PHYS The code was looking for this in cpu_features, not mmu_features. Fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2010-04-19 23:12:44 -05:00
K.Prasad	9c7cc234dc	powerpc: Disable interrupts for data breakpoint exceptions Data address breakpoint exceptions are currently handled along with page faults, which require interrupts to remain enabled. Since exception handling for data breakpoints isn't preempt-safe, we handle them separately. Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 14:44:38 +10:00
Benjamin Herrenschmidt	55052eeca6	powerpc: Fix ioremap_flags() with book3e pte definition We can't just clear the user read permission in book3e pte, because that will also clear supervisor read permission. This surely isn't desired. Fix the problem by adding the supervisor read back. BenH: Slightly simplified the ifdef and applied to ppc64 too Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-04-07 14:39:47 +10:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. 
This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
FUJITA Tomonori	a93272969c	powerpc: Fix swiotlb to respect the boot option powerpc initializes swiotlb before parsing the kernel boot options so swiotlb options (e.g. specifying the swiotlb buffer size) are ignored. Any time before freeing bootmem works for swiotlb so this patch moves powerpc's swiotlb initialization after parsing the kernel boot options, mem_init (as x86 does). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Becky Bruce <beckyb@kernel.crashing.org> Tested-by: Albert Herranz <albert_herranz@yahoo.es> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-03-19 16:38:16 +11:00
Jiri Kosina	318ae2edc3	Merge branch 'for-next' into for-linus Conflicts: Documentation/filesystems/proc.txt arch/arm/mach-u300/include/mach/debug-macro.S drivers/net/qlge/qlge_ethtool.c drivers/net/qlge/qlge_main.c drivers/net/typhoon.c	2010-03-08 16:55:37 +01:00
H Hartley Sweeten	72c3368856	nodemask.h: remove macro any_online_node The macro any_online_node() is prone to producing sparse warnings due to the local symbol 'node'. Since all the in-tree users are really requesting the first online node (the mask argument is either NODE_MASK_ALL or node_online_map) just use the first_online_node macro and remove the any_online_node macro since there are no users. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Milton Miller <miltonm@bga.com> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Geoff Levand <geoffrey.levand@am.sony.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David S. Miller <davem@davemloft.net> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-03-06 11:26:31 -08:00
Linus Torvalds	ac0f6f927d	Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits) ARM: Eliminate decompressor -Dstatic= PIC hack ARM: 5958/1: ARM: U300: fix inverted clk round rate ARM: 5956/1: misplaced parentheses ARM: 5955/1: ep93xx: move timer defines into core.c and document ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c ARM: 5953/1: ep93xx: fix broken build of clock.c ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig ARM: 5949/1: NUC900 add gpio virtual memory map ARM: 5948/1: Enable timer0 to time4 clock support for nuc910 ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk ARM: make_coherent(): fix problems with highpte, part 2 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself ARM: 5945/1: ep93xx: include correct irq.h in core.c ARM: 5933/1: amba-pl011: support hardware flow control ARM: 5930/1: Add PKMAP area description to memory.txt. ARM: 5929/1: Add checks to detect overlap of memory regions. ARM: 5928/1: Change type of VMALLOC_END to unsigned long. ARM: 5927/1: Make delimiters of DMA area globally visibly. ARM: 5926/1: Add "Virtual kernel memory..." printout. ARM: 5920/1: OMAP4: Enable L2 Cache ... Fix up trivial conflict in arch/arm/mach-mx25/clock.c	2010-03-01 09:15:15 -08:00
Russell King	4b3073e1c5	MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2010-02-20 16:41:46 +00:00
Thomas Gleixner	3eb93c558a	powerpc: Convert tlbivax_lock to raw_spinlock tlbivax_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:33 +11:00
Thomas Gleixner	6b9c9b8a66	powerpc: Convert native_tlbie_lock to raw_spinlock native_tlbie_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:30 +11:00
Thomas Gleixner	be833f3371	powerpc: Convert context_lock to raw_spinlock context_lock needs to be a real spinlock in RT. Convert it to raw_spinlock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-19 14:52:30 +11:00
Benjamin Herrenschmidt	efd0f0f385	Merge commit 'jwb/next' into next	2010-02-18 09:34:38 +11:00
Anton Blanchard	66d99b8834	powerpc: Convert open coded native hashtable bit lock Now we have real bit locks use them instead of open coding it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-17 14:03:15 +11:00
Benjamin Herrenschmidt	ec144a81ad	Merge commit 'origin/master' into next	2010-02-17 10:00:42 +11:00
Stefan Roese	c7b6669812	powerpc/40x: Add support for PPC40x boards with > 512MB SDRAM This patch adds support for boards with more that 512MByte RAM. Currently only 512MB of memory are enabled in the DCCR/ICCR real-mode cache control registers. This patch now enables caching in real-mode for 2GByte. Signed-off-by: Stefan Roese <sr@denx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2010-02-12 07:54:45 -05:00
David Gibson	77058e1adc	powerpc: Fix address masking bug in hpte_need_flush() Commit `f71dc176aa` 'Make hpte_need_flush() correctly mask for multiple page sizes' introduced bug, which is triggered when a kernel with a 64k base page size is run on a system whose hardware does not 64k hash PTEs. In this case, we emulate 64k pages with multiple 4k hash PTEs, however in hpte_need_flush() we incorrectly only mask the hardware page size from the address, instead of the logical page size. This causes things to go wrong when we later attempt to iterate through the hardware subpages of the logical page. This patch corrects the error. It has been tested on pSeries bare metal by Michael Neuling. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-10 13:58:06 +11:00
Anton Blanchard	7317ac8711	powerpc: Convert mmu context allocator from idr to ida We can use the much more lightweight ida allocator since we don't need the pointer storage idr provides. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-09 13:56:07 +11:00
Uwe Kleine-König	9ddc5b6f18	tree-wide: fix typos "ammount" -> "amount" Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:40 +01:00
Thadeu Lima de Souza Cascardo	2273130de8	fix comment typo leve -> level in powerpc Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-02-05 12:22:38 +01:00
Thadeu Lima de Souza Cascardo	6c504d4231	powerpc: Fix typo s/leve/level/ in TLB code Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-02-03 17:39:50 +11:00
Jiri Slaby	4bf936b9e4	powerpc: Use helpers for rlimits Make sure compiler won't do weird things with limits. E.g. fetching them twice may return 2 different values after writable limits are implemented. I.e. either use rlimit helpers added in `3e10e716ab` or ACCESS_ONCE if not applicable. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2010-01-15 13:20:08 +11:00
David Gibson	a1128f8f0f	powerpc/mm: Fix stupid bug in subpge protection handling Commit `d28513bc7f` ("Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for breakage caused by an earlier clean up patch of mine, contains a stupid bug. I changed the parameters of the subpage_protection() function, but failed to update one of the callers. This patch fixes it, and replaces a void * with a typed pointer so that the compiler will warn on such an error in future. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:55:44 +11:00
Yang Li	f04b10cddb	powerpc/mm: Fix typo of cpumask_clear_cpu() The function name of cpumask_clear_cpu was not correct. Fortunately nobody uses that code with hotplug yet :-) Reported-by: Jin Qing <b24347@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:27 +11:00
Sachin P. Sant	5c33991987	powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled. This time without the funny characters. Fix following build errors generated with DEBUG=1 cc1: warnings being treated as errors arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes': arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize': arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' ... SNIP ... Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:27 +11:00
Benjamin Herrenschmidt	50891457f1	powerpc/mm: Fix a WARN_ON() with CONFIG_DEBUG_PAGEALLOC and CONFIG_DEBUG_VM Set need to call __set_pte_at() and not set_pte_at() from __change_page_attr() since the later will perform checks with CONFIG_DEBUG_VM that aren't suitable to the way we override an existing PTE. (More specifically, it doesn't let you write over a present PTE). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-18 14:54:26 +11:00
Linus Torvalds	a73611b6aa	Merge branch 'next' of git://git.secretlab.ca/git/linux-2.6 * 'next' of git://git.secretlab.ca/git/linux-2.6: (23 commits) powerpc: fix up for mmu_mapin_ram api change powerpc: wii: allow ioremap within the memory hole powerpc: allow ioremap within reserved memory regions wii: use both mem1 and mem2 as ram wii: bootwrapper: add fixup to calc useable mem2 powerpc: gamecube/wii: early debugging using usbgecko powerpc: reserve fixmap entries for early debug powerpc: wii: default config powerpc: wii: platform support powerpc: wii: hollywood interrupt controller support powerpc: broadway processor support powerpc: wii: bootwrapper bits powerpc: wii: device tree powerpc: gamecube: default config powerpc: gamecube: platform support powerpc: gamecube/wii: flipper interrupt controller support powerpc: gamecube/wii: udbg support for usbgecko powerpc: gamecube/wii: do not include PCI support powerpc: gamecube/wii: declare as non-coherent platforms powerpc: gamecube/wii: introduce GAMECUBE_COMMON ... Fix up conflicts in arch/powerpc/mm/fsl_booke_mmu.c. Hopefully even close to correctly.	2009-12-16 13:26:53 -08:00
Stephen Rothwell	ae4cec4736	powerpc: fix up for mmu_mapin_ram api change Today's linux-next build (powerpc ppc44x_defconfig) failed like this: arch/powerpc/mm/pgtable_32.c: In function 'mapin_ram': arch/powerpc/mm/pgtable_32.c:318: error: too many arguments to function 'mmu_mapin_ram' Casued by commit `de32400dd2` ("wii: use both mem1 and mem2 as ram"). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-14 09:04:24 -07:00
Albert Herranz	c5df7f7751	powerpc: allow ioremap within reserved memory regions Add a flag to let a platform ioremap memory regions marked as reserved. This flag will be used later by the Nintendo Wii support code to allow ioremapping the I/O region sitting between MEM1 and MEM2 and marked as reserved RAM in the patch "wii: use both mem1 and mem2 as ram". This will no longer be needed when proper discontig memory support for 32-bit PowerPC is added to the kernel. Signed-off-by: Albert Herranz <albert_herranz@yahoo.es> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-12 22:24:32 -07:00
Albert Herranz	de32400dd2	wii: use both mem1 and mem2 as ram The Nintendo Wii video game console has two discontiguous RAM regions: - MEM1: 24MB @ 0x00000000 - MEM2: 64MB @ 0x10000000 Unfortunately, the kernel currently does not support discontiguous RAM memory regions on 32-bit PowerPC platforms. This patch adds a series of workarounds to allow the use of the second memory region (MEM2) as RAM by the kernel. Basically, a single range of memory from the beginning of MEM1 to the end of MEM2 is reported to the kernel, and a memory reservation is created for the hole between MEM1 and MEM2. With this patch the system is able to use all the available RAM and not just ~27% of it. This will no longer be needed when proper discontig memory support for 32-bit PowerPC is added to the kernel. Signed-off-by: Albert Herranz <albert_herranz@yahoo.es> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2009-12-12 22:24:31 -07:00
Joakim Tjernlund	5efab4a02c	powerpc/8xx: Invalidate non present TLBs 8xx sometimes need to load a invalid/non-present TLBs in it DTLB asm handler. These must be invalidated separaly as linux mm don't. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-09 17:10:35 +11:00
David Gibson	d28513bc7f	powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT Commit `a0668cdc15` cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-08 15:59:33 +11:00
Benjamin Herrenschmidt	5a7b4193e5	Revert "powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT" This reverts commit `c045256d14`. It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will commit a fixed version separately Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-02 09:28:35 +11:00
David Gibson	39adfa540f	powerpc/mm: Fix bug in gup_hugepd() Commit `a4fe3ce769` introduced a new get_user_pages() path for hugepages on powerpc. Unfortunately, there is a bug in it's loop logic, which can cause it to overrun the end of the intended region. This came about by copying the logic from the normal page path, which assumes the address and end parameters have been pagesize aligned at the top-level. Since they're not hugepage size aligned, the simplistic logic could step over the end of the gup region without triggering the loop end condition. This patch fixes the bug by using the technique that the normal page path uses in levels above the lowest to truncate the ending address to something we know we'll match with. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-27 14:24:30 +11:00
David Gibson	c045256d14	powerpc/mm: Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT Commit `a0668cdc15` cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-27 14:24:29 +11:00
Kumar Gala	8b27f0b61d	powerpc/fsl-booke: Rework TLB CAM code Re-write the code so its more standalone and fixed some issues: * Bump'd # of CAM entries to 64 to support e500mc * Make the code handle MAS7 properly * Use pr_cont instead of creating a string as we go Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-11-20 16:45:33 -06:00
Benjamin Herrenschmidt	0526484aa3	Merge commit 'origin/master' into next	2009-11-12 10:59:04 +11:00
Alexander Graf	e85a47106a	Split init_new_context and destroy_context For KVM we need to allocate a new context id, but don't really care about all the mm context around it. So let's split the alloc and destroy functions for the context id, so we can grab one without allocating an mm context. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:50:25 +11:00
Alexander Graf	4ab79aa801	Export symbols for KVM module We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:50:24 +11:00
Benjamin Herrenschmidt	f1167fb318	powerpc/mm: Remove debug context clamping from nohash code I inadvertently left that debug code enabled, causing the number of contexts to be clamped to 31 which is going to slow things down on 4xx and just plain breaks 8xx Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-11-05 16:41:59 +11:00
David Gibson	0895ecda79	powerpc/mm: Bring hugepage PTE accessor functions back into sync with normal accessors The hugepage arch code provides a number of hook functions/macros which mirror the functionality of various normal page pte access functions. Various changes in the normal page accessors (in particular BenH's recent changes to the handling of lazy icache flushing and PAGE_EXEC) have caused the hugepage versions to get out of sync with the originals. In some cases, this is a bug, at least on some MMU types. One of the reasons that some hooks were not identical to the normal page versions, is that the fact we're dealing with a hugepage needed to be passed down do use the correct dcache-icache flush function. This patch makes the main flush_dcache_icache_page() function hugepage aware (by checking for the PageCompound flag). That in turn means we can make set_huge_pte_at() just a call to set_pte_at() bringing it back into sync. As a bonus, this lets us remove the hash_huge_page_do_lazy_icache() function, replacing it with a call to the hash_page_do_lazy_icache() function it was based on. Some other hugepage pte access hooks - huge_ptep_get_and_clear() and huge_ptep_clear_flush() - are not so easily unified, but this patch at least brings them back into sync with the current versions of the corresponding normal page functions. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:21:23 +11:00
David Gibson	883a3e5236	powerpc/mm: Split hash MMU specific hugepage code into a new file This patch separates the parts of hugetlbpage.c which are inherently specific to the hash MMU into a new hugelbpage-hash64.c file. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:59 +11:00
David Gibson	d1837cba5d	powerpc/mm: Cleanup initialization of hugepages on powerpc This patch simplifies the logic used to initialize hugepages on powerpc. The somewhat oddly named set_huge_psize() is renamed to add_huge_page_size() and now does all necessary verification of whether it's given a valid hugepage sizes (instead of just some) and instantiates the generic hstate structure (but no more). hugetlbpage_init() now steps through the available pagesizes, checks if they're valid for hugepages by calling add_huge_page_size() and initializes the kmem_caches for the hugepage pagetables. This means we can now eliminate the mmu_huge_psizes array, since we no longer need to pass the sizing information for the pagetable caches from set_huge_psize() into hugetlbpage_init() Determination of the default huge page size is also moved from the hash code into the general hugepage code. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
David Gibson	a4fe3ce769	powerpc/mm: Allow more flexible layouts for hugepage pagetables Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. 
While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:58 +11:00
David Gibson	a0668cdc15	powerpc/mm: Cleanup management of kmem_caches for pagetables Currently we have a fair bit of rather fiddly code to manage the various kmem_caches used to store page tables of various levels. We generally have two caches holding some combination of PGD, PUD and PMD tables, plus several more for the special hugepage pagetables. This patch cleans this all up by taking a different approach. Rather than the caches being designated as for PUDs or for hugeptes for 16M pages, the caches are simply allocated to be a specific size. Thus sharing of caches between different types/levels of pagetables happens naturally. The pagetable size, where needed, is passed around encoded in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the pagetable contains 2^n pointers. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:57 +11:00
David Gibson	f71dc176aa	powerpc/mm: Make hpte_need_flush() correctly mask for multiple page sizes Currently, hpte_need_flush() only correctly flushes the given address for normal pages. Callers for hugepages are required to mask the address themselves. But hpte_need_flush() already looks up the page sizes for its own reasons, so this is a rather silly imposition on the callers. This patch alters it to mask based on the pagesize it has looked up itself, and removes the awkward masking code in the hugepage caller. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-30 17:20:57 +11:00
Benjamin Herrenschmidt	8d8997f34e	powerpc/mm: Fix hang accessing top of vmalloc space On pSeries, we always force the IO space to be mapped using 4K pages even with a 64K base page size to cope with some limitations in the HV interface to some devices. However, the SLB miss handler code to discriminate between vmalloc and ioremap space uses a CPU feature section such that the code is nop'ed out when the processor support large pages non-cachable mappings. Thus, we end up always using the ioremap page size for vmalloc segments on such processors, causing a discrepency between the segment and the hash table, and thus a hang continously hashing the page. It works for the first segment of the vmalloc space since that segment is "bolted" in by C code correctly, and thankfully we almost never use the vmalloc space beyond the first segment, but the new percpu code made the bug happen. This fixes it by removing the feature section from the assembly, we now always do the comparison between vmalloc and ioremap. Signed-off-by; Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-10-14 16:58:36 +11:00
Rex Feany	e0908085fc	powerpc/8xx: Fix regression introduced by cache coherency rewrite After upgrading to the latest kernel on my mpc875 userspace started running incredibly slow (hours to get to a shell, even!). I tracked it down to commit `8d30c14cab`, that patch removed a work-around for the 8xx. Adding it back makes my problem go away. Signed-off-by: Rex Feany <rfeany@mrv.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-24 15:56:30 +10:00
Huang Weiyi	b9eceb2307	powerpc/mm: Remove duplicated #include Remove duplicated #include('s) in arch/powerpc/mm/tlb_low_64e.S Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-24 15:31:42 +10:00
KAMEZAWA Hiroyuki	3089aa1b0c	kcore: use registerd physmem information For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	908eedc616	walk system ram range Originally, walk_memory_resource() was introduced to traverse all memory of "System RAM" for detecting memory hotplug/unplug range. For doing so, flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for memory hotplug. But for using other purpose, /proc/kcore, this may includes some firmware area marked as IORESOURCE_BUSY | IORESOUCE_MEM. This patch makes the check strict to find out busy "System RAM". Note: PPC64 keeps their own walk_memory_resouce(), which walk through ppc64's lmb informaton. Because old kclist_add() is called per lmb, this patch makes no difference in behavior, finally. And this patch removes CONFIG_MEMORY_HOTPLUG check from this function. Because pfn_valid() just show "there is memmap or not* and cannot be used for "there is physical memory or not", this function is useful in generic to scan physical memory range. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	a0614da88b	kcore: register vmalloc area in generic way For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	c30bb2a25f	kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
Geert Uytterhoeven	cc013a8890	arches: drop superfluous casts in nr_free_pages() callers Commit `9617729941` ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:34 -07:00
Ingo Molnar	cdd6c482c9	perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming.
We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-21 14:28:04 +02:00
Linus Torvalds	723e9db7a4	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits) powerpc/nvram: Enable use Generic NVRAM driver for different size chips powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline powerpc/ps3: Workaround for flash memory I/O error powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead powerpc/perf_counters: Reduce stack usage of power_check_constraints powerpc: Fix bug where perf_counters breaks oprofile powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops powerpc/irq: Improve nanodoc powerpc: Fix some late PowerMac G5 with PCIe ATI graphics powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT powerpc/book3e: Add missing page sizes powerpc/pseries: Fix to handle slb resize across migration powerpc/powermac: Thermal control turns system off too eagerly powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan() powerpc/405ex: support cuImage via included dtb powerpc/405ex: provide necessary fixup function to support cuImage powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support. powerpc/44x: Update Arches defconfig powerpc/44x: Update Arches dts ... Fix up conflicts in drivers/char/agp/uninorth-agp.c	2009-09-15 09:51:09 -07:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Brian King	46db2f86a3	powerpc/pseries: Fix to handle slb resize across migration The SLB can change sizes across a live migration, which was not being handled, resulting in possible machine crashes during migration if migrating to a machine which has a smaller max SLB size than the source machine. Fix this by first reducing the SLB size to the minimum possible value, which is 32, prior to migration. Then during the device tree update which occurs after migration, we make the call to ensure the SLB gets updated. Also add the slb_size to the lparcfg output so that the migration tools can check to make sure the kernel has this capability before allowing migration in scenarios where the SLB size will change. BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid breaking ppc32 build Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-09-02 16:19:01 +10:00
Kumar Gala	df5d6ecf81	powerpc/mm: Add MMU features for TLB reservation & Paired MAS registers Support for TLB reservation (or TLB Write Conditional) and Paired MAS registers are optional for a processor implementation so we handle them via MMU feature sections. We currently only used paired MAS registers to access the full RPN + perm bits that are kept in MAS7||MAS3. We assume that if an implementation has hardware page table at this time it also implements in TLB reservations. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-28 14:24:12 +10:00
Benjamin Herrenschmidt	3c2ee2d9f4	Merge commit 'kumar/next' into next	2009-08-27 13:13:41 +10:00
Benjamin Herrenschmidt	ea3cc330ac	powerpc/mm: Cleanup handling of execute permission This is an attempt at cleaning up a bit the way we handle execute permission on powerpc. _PAGE_HWEXEC is gone, _PAGE_EXEC is now only defined by CPUs that can do something with it, and the myriad of #ifdef's in the I$/D$ coherency code is reduced to 2 cases that hopefully should cover everything. The logic on BookE is a little bit different than what it was though not by much. Since now, _PAGE_EXEC will be set by the generic code for executable pages, we need to filter out if they are unclean and recover it. However, I don't expect the code to be more bloated than it already was in that area due to that change. I could boast that this brings proper enforcing of per-page execute permissions to all BookE and 40x but in fact, we've had that now for some time as a side effect of my previous rework in that area (and I didn't even know it :-) We would only enable execute permission if the page was cache clean and we would only cache clean it if we took and exec fault. Since we now enforce that the later only work if VM_EXEC is part of the VMA flags, we de-fact already enforce per-page execute permissions... Unless I missed something Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-27 13:12:51 +10:00
Kumar Gala	fc4bdb35fb	powerpc/booke: Move MMUCSR definition into mmu-book3e.h The MMUCSR is now defined as part of the Book-3E architecture so we can move it into mmu-book3e.h and add some of the additional bits defined by the architecture specs. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-08-24 20:48:05 -05:00
Benjamin Herrenschmidt	4f0dbc2781	Merge commit 'paulus-perf/master' into next	2009-08-20 11:07:56 +10:00
Kumar Gala	797a747a82	powerpc/mm: Fix assert_pte_locked to work properly on uniprocessor Since the pte_lockptr is a spinlock it gets optimized away on uniprocessor builds so using spin_is_locked is not correct. We can use assert_spin_locked instead and get the proper behavior between UP and SMP builds. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:28:32 +10:00
Roel Kluin	8dcd038a13	powerpc/fsl-booke: read buffer overflow cam[tlbcam_index] is checked before tlbcam_index < ARRAY_SIZE(cam) Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:27:12 +10:00
Kumar Gala	67050b5c3e	powerpc/mm: Fix switch_mmu_context to iterate of the proper list of cpus Introduced a temporary variable into our iterating over the list cpus that are threads on the same core. For some reason Ben forgot how for loops work. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:12 +10:00
Benjamin Herrenschmidt	2d27cfd328	powerpc: Remaining 64-bit Book3E support This contains all the bits that didn't fit in previous patches :-) This includes the actual exception handlers assembly, the changes to the kernel entry, other misc bits and wiring it all up in Kconfig. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:11 +10:00
Benjamin Herrenschmidt	32a74949b7	powerpc/mm: Add support for SPARSEMEM_VMEMMAP on 64-bit Book3E The base TLB support didn't include support for SPARSEMEM_VMEMMAP, though we did carve out some virtual space for it, the necessary support code wasn't there. This implements it by using 16M pages for now, though the page size could easily be changed at runtime if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:10 +10:00
Benjamin Herrenschmidt	25d21ad6e7	powerpc: Add TLB management code for 64-bit Book3E This adds the TLB miss handler assembly, the low level TLB flush routines along with the necessary hook for dealing with our virtual page tables or indirect TLB entries that need to be flushes when PTE pages are freed. There is currently no support for hugetlbfs Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	a8f7758c1c	powerpc/mm: Move around mmu_gathers definition on 64-bit The definition for the global structure mmu_gathers, used by generic code, is currently defined in multiple places not including anything used by 64-bit Book3E. This changes it by moving to one place common to all processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:09 +10:00
Benjamin Herrenschmidt	57e2a99f74	powerpc: Add memory management headers for new 64-bit BookE This adds the PTE and pgtable format definitions, along with changes to the kernel memory map and other definitions related to implementing support for 64-bit Book3E. This also shields some asm-offset bits that are currently only relevant on 32-bit We also move the definition of the "linux" page size constants to the common mmu.h file and add a few sizes that are relevant to embedded processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:25:06 +10:00
Benjamin Herrenschmidt	c7cc58a1ad	powerpc/mm: Rework & cleanup page table freeing code path That patch used to just add a hook to page table flushing but pulling that string brought out a whole bunch of issues, so it now does that and more: - We now make the RCU batching of page freeing SMP only, as I believe it was intended initially. We make a few more things compile to nothing on !CONFIG_SMP - Some macros are turned into functions, though that forced me to out of line a few stuffs due to unsolvable include depenencies, however it's probably better that way anyway, it's not -that- critical code path. - 32-bit didn't call pte_free_finish() on tlb_flush() which means that it wouldn't push out the batch to RCU for delayed freeing when a bunch of page tables have been freed, they would just stay in there until the batch gets full. 64-bit BookE will use that hook to maintain the virtually linear page tables or the indirect entries in the TLB when using the HW loader. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:24:56 +10:00
Benjamin Herrenschmidt	d4e167da4c	powerpc/mm: Make low level TLB flush ops on BookE take additional args We need to pass down whether the page is direct or indirect and we'll need to pass the page size to _tlbil_va and _tlbivax_bcast We also add a new low level _tlbil_pid_noind() which does a TLB flush by PID but avoids flushing indirect entries if possible This implements those new prototypes but defines them with inlines or macros so that no additional arguments are actually passed on current processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:41 +10:00
Benjamin Herrenschmidt	a245067e20	powerpc/mm: Add support for early ioremap on non-hash 64-bit processors This adds some code to do early ioremap's using page tables instead of bolting entries in the hash table. This will be used by the upcoming 64-bits BookE port. The patch also changes the test for early vs. late ioremap to use slab_is_available() instead of our old hackish mem_init_done. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:40 +10:00
Benjamin Herrenschmidt	fcce810986	powerpc/mm: Add HW threads support to no_hash TLB management The current "no hash" MMU context management code is written with the assumption that one CPU == one TLB. This is not the case on implementations that support HW multithreading, where several linux CPUs can share the same TLB. This adds some basic support for this to our context management and our TLB flushing code. It also cleans up the optional debugging output a bit Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:37 +10:00
Benjamin Herrenschmidt	ee43eb788b	powerpc: Use names rather than numbers for SPRGs (v2) The kernel uses SPRG registers for various purposes, typically in low level assembly code as scratch registers or to hold per-cpu global infos such as the PACA or the current thread_info pointer. We want to be able to easily shuffle the usage of those registers as some implementations have specific constraints realted to some of them, for example, some have userspace readable aliases, etc.. and the current choice isn't always the best. This patch should not change any code generation, and replaces the usage of SPRN_SPRGn everywhere in the kernel with a named replacement and adds documentation next to the definition of the names as to what those are used for on each processor family. The only parts that still use the original numbers are bits of KVM or suspend/resume code that just blindly needs to save/restore all the SPRGs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:27 +10:00
Anton Blanchard	de4376c284	powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can reuse this preload slot by loading in the segment at 0x10000000, where almost all PowerPC binaries are linked at. On a microbenchmark that bounces a token between two 64bit processes over pipes and calls gettimeofday each iteration (to access the VDSO), both the 32bit and 64bit context switch rate improves (tested on a 4GHz POWER6): 32bit: 273k/sec -> 283k/sec 64bit: 277k/sec -> 284k/sec Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:26 +10:00
Anton Blanchard	5eb9bac040	powerpc: Rearrange SLB preload code With the new top down layout it is likely that the pc and stack will be in the same segment, because the pc is most likely in a library allocated via a top down mmap. Right now we bail out early if these segments match. Rearrange the SLB preload code to sanity check all SLB preload addresses are not in the kernel, then check all addresses for conflicts. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-08-20 10:12:25 +10:00
Paul Mackerras	9c1e105238	powerpc: Allow perf_counters to access user memory at interrupt time This provides a mechanism to allow the perf_counters code to access user memory in a PMU interrupt routine. Such an access can cause various kinds of interrupt: SLB miss, MMU hash table miss, segment table miss, or TLB miss, depending on the processor. This commit only deals with 64-bit classic/server processors, which use an MMU hash table. 32-bit processors are already able to access user memory at interrupt time. Since we don't soft-disable on 32-bit, we avoid the possibility of reentering hash_page or the TLB miss handlers, since they run with interrupts disabled. On 64-bit processors, an SLB miss interrupt on a user address will update the slb_cache and slb_cache_ptr fields in the paca. This is OK except in the case where a PMU interrupt occurs in switch_slb, which also accesses those fields. To prevent this, we hard-disable interrupts in switch_slb. Interrupts are already soft-disabled at this point, and will get hard-enabled when they get soft-enabled later. This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice, and to make sure that it clears the slb_cache_ptr when called from other callers than switch_slb, the existing routine is renamed to __slb_flush_and_rebolt, which is called by switch_slb and the new version of slb_flush_and_rebolt. Similarly, switch_stab (used on POWER3 and RS64 processors) gets a hard_irq_disable() to protect the per-cpu variables used there and in ste_allocate. If a MMU hashtable miss interrupt occurs, normally we would call hash_page to look up the Linux PTE for the address and create a HPTE. However, hash_page is fairly complex and takes some locks, so to avoid the possibility of deadlock, we check the preemption count to see if we are in a (pseudo-)NMI handler, and if so, we don't call hash_page but instead treat it like a bad access that will get reported up through the exception table mechanism. 
An interrupt whose handler runs even though the interrupt occurred when soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI handler, which should use nmi_enter()/nmi_exit() rather than irq_enter()/irq_exit(). Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-08-18 14:48:43 +10:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Kumar Gala	5156ddce6c	powerpc/mm: Fix SMP issue with MMU context handling code In switch_mmu_context() if we call steal_context_smp() to get a context to use we shouldn't fall through and than call steal_context_up(). Doing so can be problematic in that the 'mm' that steal_context_up() ends up using will not get marked dirty in the stale_map[] for other CPUs that might have used that mm. Thus we could end up with stale TLB entries in the other CPUs that can cause all kinda of havoc. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-07-29 23:05:43 -05:00
Benjamin Herrenschmidt	9e1b32caa5	mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:10:38 -07:00
Michael Ellerman	30c5af435b	powerpc: Use pr_devel() in do_dcache_icache_coherency() pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 2036 368 8 2412 96c arch/powerpc/mm/pgtable.o size after: text data bss dec hex filename 1677 248 8 1933 78d arch/powerpc/mm/pgtable.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:24 +10:00
Michael Ellerman	29e5fa59e5	powerpc: Use pr_devel() in arch/powerpc/mm/gup.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3252 384 0 3636 e34 arch/powerpc/mm/gup.o size after: text data bss dec hex filename 2576 96 0 2672 a70 arch/powerpc/mm/gup.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:23 +10:00
Michael Ellerman	651e2dd2a1	powerpc: Cleanup & use pr_devel() in arch/powerpc/mm/slb.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 3261 416 4 3681 e61 arch/powerpc/mm/slb.o size after: text data bss dec hex filename 2861 248 4 3113 c29 arch/powerpc/mm/slb.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Michael Ellerman	a1ac38ab98	powerpc: Use pr_devel() in arch/powerpc/mm/mmu_context_nohash.c pr_debug() can now result in code being generated even when DEBUG is not defined. That's not really desirable in some places. With CONFIG_DYNAMIC_DEBUG=y: size before: text data bss dec hex filename 1508 48 28 1584 630 powerpc/mm/mmu_context_nohash.o size after: text data bss dec hex filename 1088 0 28 1116 45c powerpc/mm/mmu_context_nohash.o Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:22 +10:00
Joe Perches	d258e64ef5	powerpc: Remove unnecessary semicolons Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-07-08 13:50:21 +10:00
Tejun Heo	c43768cbb7	Merge branch 'master' into for-next Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h	2009-07-04 07:13:18 +09:00
Benjamin Herrenschmidt	850f6ac316	powerpc/mm: Make k(un)map_atomic out of line Those functions are way too big to be inline, besides, kmap_atomic() wants to call debug_kmap_atomic() which isn't exported for modules and causes module link failures. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-26 14:37:25 +10:00
Tejun Heo	204fba4aa3	percpu: cleanup percpu array definitions Currently, the following three different ways to define percpu arrays are in use. 1. DEFINE_PER_CPU(elem_type[array_len], array_name); 2. DEFINE_PER_CPU(elem_type, array_name[array_len]); 3. DEFINE_PER_CPU(elem_type, array_name)[array_len]; Unify to #1 which correctly separates the roles of the two parameters and thus allows more flexibility in the way percpu variables are defined. [ Impact: cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Tony Luck <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: linux-mm@kvack.org Cc: Christoph Lameter <cl@linux-foundation.org> Cc: David S. Miller <davem@davemloft.net>	2009-06-24 15:13:45 +09:00
Linus Torvalds	d06063cc22	Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-21 13:08:22 -07:00
Michael Ellerman	ba55bd7436	powerpc: Add configurable -Werror for arch/powerpc Add the option to build the code under arch/powerpc with -Werror. The intention is to make it harder for people to inadvertently introduce warnings in the arch/powerpc code. It needs to be configurable so that if a warning is introduced, people can easily work around it while it's being fixed. The option is a negative, i.e. don't enable -Werror, so that it will be turned on for allyes and allmodconfig builds. The default is n, in the hope that developers will build with -Werror. That will probably lead to some build breaks; I am prepared to be flamed. It's not enabled for math-emu, which is a steaming pile of warnings. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-16 14:15:45 +10:00
Benjamin Herrenschmidt	7dafd239ab	Merge commit 'origin/master' into next	2009-06-15 10:36:54 +10:00
Sankar P	5cdcd9d691	trivial: spelling fix in ppc code comments Fixes a trivial spelling error in powerpc code comments. Signed-off-by: Sankar P <sankar.curiosity@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-06-12 18:01:47 +02:00
Benjamin Herrenschmidt	bc47ab0241	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/kernel/asm-offsets.c	2009-06-12 16:53:38 +10:00
Peter Zijlstra	f4dbfa8f31	perf_counter: Standardize event names Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:15 +02:00
Benjamin Herrenschmidt	944916858a	powerpc: Shield code specific to 64-bit server processors This is a random collection of added ifdef's around portions of code that only make sense on server processors, using either CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate. This is meant to make the future merging of Book3E 64-bit support easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:47:38 +10:00
Benjamin Herrenschmidt	d3f6204a7d	powerpc: Set init_bootmem_done on NUMA platforms as well For some obscure reason, we only set init_bootmem_done after initializing bootmem when NUMA isn't enabled. We even document this next to the declaration of that global in system.h, which of course I didn't read before I had to debug why some WIP code wasn't working properly... This patch changes it so that we always set it after bootmem is initialized, which should have always been the case... go figure! Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	b46b6942b3	powerpc/mm: Fix an AB->BA deadlock scenario with nohash MMU context lock The MMU context_lock can be taken from switch_mm() while the rq->lock is held. The rq->lock can also be taken from interrupts, thus if we get interrupted in destroy_context() with the context lock held and that interrupt tries to take the rq->lock, there's a possible deadlock scenario with another CPU having the rq->lock and calling switch_mm(), which takes our context lock. The fix is to always ensure interrupts are off when taking our context lock. The switch_mm() path is already good, so this fixes the destroy_context() path. While at it, turn the context lock into a new-style spinlock. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:43:04 +10:00
Benjamin Herrenschmidt	3035c8634f	powerpc/mm: Fix some SMP issues with MMU context handling This patch fixes a couple of issues that can happen as a result of steal_context() dropping the context_lock when all possible PIDs are ineligible for stealing (hopefully an extremely hard to hit occurrence). This case exposes the possibility of a stale context_mm[] entry being seen, since destroy_context() doesn't clear it and the free map isn't re-tested. It also means steal_context() will not notice a context freed while the lock was held, thus possibly trying to steal a context when a free one was available. This fixes it by always returning to the caller from steal_context() when it dropped the lock, with a return value that causes the caller to re-sample the number of free contexts, along with properly clearing the context_mm[] array for destroyed contexts. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-06-09 16:42:21 +10:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Benjamin Herrenschmidt	435462c6e6	Merge branch 'merge' into next	2009-05-29 13:54:52 +10:00
Benjamin Herrenschmidt	8b31e49d1d	powerpc: Fix up dma_alloc_coherent() on platforms without cache coherency. The implementation we just revived has issues, such as using a Kconfig-defined virtual address area in kernel space that nothing actually carves out (and thus will overlap whatever is there), or having some dependencies on being self contained in a single PTE page which adds unnecessary constraints on the kernel virtual address space. This fixes it by using more classic PTE accessors and automatically locating the area for consistent memory, carving an appropriate hole in the kernel virtual address space, leaving only the size of that area as a Kconfig option. It also brings some dma-mask related fixes from the ARM implementation which was almost identical initially but grew its own fixes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:33:59 +10:00
Benjamin Herrenschmidt	f637a49e50	powerpc: Minor cleanups of kernel virt address space definitions Make FIXADDR_TOP a compile time constant and clean up a couple of definitions relative to the layout of the kernel address space on ppc32. We also print out that layout at boot time for debugging purposes. This is a pre-requisite for properly fixing non-coherent DMA allocations. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:50 +10:00
Benjamin Herrenschmidt	b16e7766d6	powerpc: Move dma-noncoherent.c from arch/powerpc/lib to arch/powerpc/mm (pre-requisite to make the next patches more palatable) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-27 16:32:05 +10:00
Hideo Saito	8e35961b57	powerpc/mm: Fix broken MMU PID stealing on !SMP The recent rework of the MMU PID handling for non-hash CPUs has a subtle bug in the !SMP "optimized" variant of the PID stealing function. It clears the PID in the mm context before it calls local_flush_tlb_mm(). However, the latter will not flush anything if the PID in the context is clear... Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-26 13:46:49 +10:00
Milton Miller	60dbf43851	powerpc: Add 2.06 tlbie mnemonics This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards compatibility for CPUs before 2.06. Only useful for bare metal systems. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-21 15:44:21 +10:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
Mel Gorman	af3e4aca47	powerpc: Do not assert pte_locked for hugepage PTE entries With CONFIG_DEBUG_VM, an assertion is made when changing the protection flags of a PTE that the PTE is locked. Huge pages use a different pagetable format and the assertion is bogus and will always trigger with a bug looking something like Unable to handle kernel paging request for data at address 0xf1a00235800006f8 Faulting instruction address: 0xc000000000034a80 Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=32 NUMA Maple Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor windfarm_cpufreq_clamp windfarm_core i2c_powermac NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003 REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1) MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff DAR: f1a00235800006f8, DSISR: 0000000040010000 TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2 GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500 GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001 GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8 GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000 GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20 GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000 GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8 GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880 NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0 LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4 Call Trace: [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable) [c000000003037910] [c000000000034b18] 
.ptep_set_access_flags+0x6c/0xb4 [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674 [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8 [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828 [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584 [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c Instruction dump: 7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4 7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000 This patch fixes the problem by not asserting the PTE is locked for VMAs backed by huge pages. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-18 15:19:04 +10:00
Becky Bruce	49a8496525	powerpc: Allow mem=x cmdline to work with 4G+ We're currently choking on mem=4g (and above) due to memory_limit being specified as an unsigned long. Make memory_limit phys_addr_t to fix this. Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-05-15 16:43:41 +10:00
Ingo Molnar	e7fd5d4b3d	Merge branch 'linus' into perfcounters/core Merge reason: This branch was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 14:47:05 +02:00
Stephen Rothwell	b62c31ae40	powerpc: fix for long standing bug noticed by gcc 4.4.0 Previous gcc versions didn't notice this because one of the preceding #ifs always evaluated to true. gcc 4.4.0 produced this error: arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:52:16 -05:00
Kumar Gala	323d23aeac	Revert "powerpc: Add support for early tlbilx opcode" This reverts commit `e996557740`. Our HW guys were able to fix this so it never sees the light of day. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-23 08:51:22 -05:00
Michael Ellerman	24f1ce803c	powerpc: Fix crash on CPU hotplug early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by `757c74d2` ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2009-04-22 14:56:34 +10:00
Peter Zijlstra	78f13e9525	perf_counter: allow for data addresses to be recorded Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-08 19:05:56 +02:00
Kumar Gala	52ce67f157	powerpc/mm: Fix compile warning arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm': arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 22:11:10 -05:00
Kumar Gala	e996557740	powerpc: Add support for early tlbilx opcode During the ISA 2.06 development the opcode for tlbilx changed and some early implementations used the old opcode. Add support for a MMU_FTR fixup to deal with this. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-04-07 01:36:30 -05:00
Peter Zijlstra	ac17dc8e58	perf_counter: provide major/minor page fault software events Provide separate sw counters for major and minor page faults. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:40 +02:00
Peter Zijlstra	7dd1fcc258	perf_counter: provide pagefault software events We use the generic software counter infrastructure to provide page fault events. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:37 +02:00
Benjamin Herrenschmidt	757c74d298	powerpc/mm: Introduce early_init_mmu() on 64-bit This moves some MMU related init code out of setup_64.c into hash_utils_64.c and calls it early_init_mmu() and early_init_mmu_secondary(). This will make it easier to plug in a new MMU type. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	ff7c660092	powerpc/mm: Fix printk type warning in mmu_context_nohash We need to use %zu instead of %d when printing a sizeof() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:34 +11:00
Benjamin Herrenschmidt	d62cbf45a8	powerpc/mm: Rename arch/powerpc/kernel/mmap.c to mmap_64.c This file is only useful on 64-bit, so we name it accordingly. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Benjamin Herrenschmidt	8d1cf34e7a	powerpc/mm: Tweak PTE bit combination definitions This patch tweaks the way some PTE bit combinations are defined, in such a way that the 32 and 64-bit variants become almost identical and that will make it easier to bring in a new common pte-* file for the new variant of the Book3-E support. The combinations of bits defining access to kernel pages are now clearly separated from the combinations used by userspace and the core VM. The resulting generated code should remain identical unless I made a mistake. Note: While at it, I removed a nonsensical statement related to CONFIG_KGDB in ppc_mmu_32.c which could cause kernel mappings to be user accessible when that option is enabled. Probably something that bitrotted. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:33 +11:00
Rusty Russell	56aa4129e8	cpumask: Use mm_cpumask() wrapper instead of cpu_vm_mask Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-24 13:47:29 +11:00
Benjamin Herrenschmidt	9e5efaa936	powerpc/mm: Properly wire up get_user_pages_fast() on 32-bit While we did add support for _PAGE_SPECIAL on some 32-bit platforms, we never actually built get_user_pages_fast() on them. This fixes it, which requires a little bit of ifdef'ing around. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:11:34 +11:00
Benjamin Herrenschmidt	1cdab55d8a	powerpc: Wire up /proc/vmallocinfo to our ioremap() This adds the necessary bits and pieces to powerpc implementation of ioremap to benefit from caller tracking in /proc/vmallocinfo, at least for ioremap's done after mem init as the older ones aren't tracked. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-03-11 17:10:14 +11:00
Kumar Gala	c3071951d0	powerpc/fsl-booke: Add support for tlbilx instructions The e500mc core supports the new tlbilx instructions that do core local invalidates and also provide us the ability to take down all TLB entries matching a given PID. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-03-09 09:25:38 -05:00
Anton Blanchard	002b0ec73d	powerpc: Increase stack gap on 64bit binaries On 64bit there is a possibility our stack and mmap randomisation will put the two close enough such that we can't expand our stack to match the ulimit specified. To avoid this, start the upper mmap address at 1GB + 128MB below the top of our address space, so in the worst case we end up with the same ~128MB hole as in 32bit. This works because we randomise the stack over a 1GB range. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	a5adc91a4b	powerpc: Ensure random space between stack and mmaps get_random_int() returns the same value within a 1 jiffy interval. This means that the mmap and stack regions will almost always end up the same distance apart, making a relative offset based attack possible. To fix this, shift the randomness we use for the mmap region by 1 bit. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:21 +11:00
Anton Blanchard	9f14c42d75	powerpc: Randomise mmap start address Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks. Until ppc32 uses the mmap.c functionality, this is ppc64 specific. Before: # ./test & cat /proc/${!}/maps|tail -2|head -1 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 f75fe000-f7fff000 rw-p f75fe000 00:00 0 After: # ./test & cat /proc/${!}/maps|tail -2|head -1 f718b000-f7b8c000 rw-p f718b000 00:00 0 f7551000-f7f52000 rw-p f7551000 00:00 0 f6ee7000-f78e8000 rw-p f6ee7000 00:00 0 f74d4000-f7ed5000 rw-p f74d4000 00:00 0 f6e9d000-f789e000 rw-p f6e9d000 00:00 0 Similar for 64bit, but with 1GB of scatter: # ./test & cat /proc/${!}/maps|tail -2|head -1 fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0 fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0 fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0 fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0 fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0 Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:07 +11:00
Anton Blanchard	13a2cb3694	powerpc: Rearrange mmap.c Rearrange mmap.c to better match the x86 version. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:06 +11:00
Nathan Fontenot	0f16ef7fd3	powerpc/numa: Cleanup hot_add_scn_to_nid This patch reworks hot_add_scn_to_nid and its supporting functions to make them easier to understand. There are no functional changes in this patch, and it has been tested on machines with memory represented in the device tree as memory nodes and in the ibm,dynamic-memory property. My previous patch that introduced support for hotplug memory add on systems whose memory was represented by the ibm,dynamic-memory property of the device tree only left the code more unintelligible. This will hopefully make things easier to understand. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 15:53:04 +11:00
Anton Blanchard	13870b6575	powerpc/mm: Reduce hashtable size when using 64kB pages At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that. Note: This only has an effect on non-hypervisor machines. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-23 10:48:58 +11:00
Benjamin Herrenschmidt	3b7faeb49e	Merge commit 'kumar/next' into next	2009-02-18 13:23:30 +11:00
Benjamin Herrenschmidt	82a0a1cc8f	Merge commit 'origin/master' into next Manual merge of: arch/powerpc/include/asm/pgtable-ppc32.h	2009-02-18 13:19:25 +11:00
Dave Hansen	06eccea6c3	powerpc/mm: Fix numa reserve bootmem page selection Fix the powerpc NUMA reserve bootmem page selection logic. commit `8f64e1f2d1` (powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes) changed the logic for how the powerpc LMB reserved regions were converted to bootmem reserved regions. As the following discussion reports, the new logic was not correct. mark_reserved_regions_for_nid() goes through each LMB on the system that specifies a reserved area. It searches for active regions that intersect with that LMB and are on the specified node. It attempts to bootmem-reserve only the area where the active region and the reserved LMB intersect. We cannot reserve things on other nodes as they may not have bootmem structures allocated, yet. We base the size of the bootmem reservation on two possible things. Normally, we just make the reservation start and stop exactly at the start and end of the LMB. However, the LMB reservations are not aware of NUMA nodes and on occasion a single LMB may cross into several adjacent active regions. Those may even be on different NUMA nodes and will require separate calls to the bootmem reserve functions. So, the bootmem reservation must be trimmed to fit inside the current active region. That's all fine and dandy, but we trim the reservation in a page-aligned fashion. That's bad because we start the reservation at a non-page-aligned address: physbase. The reservation may only span 2 bytes, but those bytes may span two pfns and cause a reserve_size of 2*PAGE_SIZE. Take the case where you reserve 0x2 bytes at 0x0fff and where the active region ends at 0x1000. You'll jump into that if() statement, but node_ar.end_pfn=0x1 and start_pfn=0x0. You'll end up with a reserve_size=0x1000, and then call reserve_bootmem_node(node, physbase=0xfff, size=0x1000); 0x1000 may not be on the same node as 0xfff. Oops. In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not included in the range. Using PFN_UP instead of a plain (>> PAGE_SHIFT) will make this consistent with the other VM code. We also need to do math for the reserved size with physbase instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is precisely the end of the node. However, (start_pfn << PAGE_SHIFT) is NOT precisely the beginning of the reserved area. That is, of course, physbase. If we don't use physbase here, the reserve_size can be made too large. From: Dave Hansen <dave@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-13 16:37:45 +11:00
Kumar Gala	96a8bac589	powerpc/fsl-booke: Fix compile warning arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem': arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t' Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:54:53 -06:00
Kumar Gala	d66c82ea45	powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines The Power ISA 2.06 added power of two page sizes to the embedded MMU architecture. It's done in such a way as to be code compatible with the existing HW. Made the minor code changes to support both power of two and power of four page sizes. Also added some new MAS bits and macros that are defined as part of the 2.06 ISA. Renamed some things to use the 'Book-3e' concept to convey the new MMU that is based on the Freescale Book-E MMU programming model. Note, it's still invalid to try to use a page size that isn't supported by the CPU. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-12 16:37:11 -06:00
Kumar Gala	f99fb8a2cb	powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW The following commit: commit `64b3d0e812` Author: Benjamin Herrenschmidt <benh@kernel.crashing.org> Date: Thu Dec 18 19:13:51 2008 +0000 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it out before we propagate it to the PPC HW PTE. Reported-by: Martyn Welch <martyn.welch@gefanuc.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:07:02 +11:00
Benjamin Herrenschmidt	8d30c14cab	powerpc/mm: Rework I$/D$ coherency (v3) This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all pages going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow up trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at(), which is now made out of line, and ptep_set_access_flags(), which joins the former in pgtable.c. The base idea for Embedded with per-page exec support is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed; we just make these guys also perform the I/D cache sync for exec faults now.
This second path is the catch-all for things that weren't cleaned at set_pte_at() time. For cpus without per-page exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:10 +11:00
Anton Blanchard	91b0f5ec53	powerpc/mm: Move 64-bit unmapped_area to top of address space We currently place mmaps just below the stack on 32bit, but leave them in the middle of the address space on 64bit: 00100000-00120000 r-xp 00100000 00:00 0 [vdso] 10000000-10010000 r-xp 00000000 08:06 179534 /tmp/sleep 10010000-10020000 rw-p 00000000 08:06 179534 /tmp/sleep 10020000-10130000 rw-p 10020000 00:00 0 [heap] 40000000000-40000030000 r-xp 00000000 08:06 440743 /lib64/ld-2.9.so 40000030000-40000040000 rw-p 00020000 08:06 440743 /lib64/ld-2.9.so 40000050000-400001f0000 r-xp 00000000 08:06 440671 /lib64/libc-2.9.so 400001f0000-40000200000 r--p 00190000 08:06 440671 /lib64/libc-2.9.so 40000200000-40000220000 rw-p 001a0000 08:06 440671 /lib64/libc-2.9.so 40000220000-40008230000 rw-p 40000220000 00:00 0 fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0 [stack] Right now it isn't an issue, but at some stage we will run into mmap or hugetlb allocation issues. Using the same layout as 32bit gives us some breathing room. This matches what x86-64 is doing too.
00100000-00103000 r-xp 00100000 00:00 0 [vdso] 10000000-10001000 r-xp 00000000 08:06 554894 /tmp/test 10010000-10011000 r--p 00000000 08:06 554894 /tmp/test 10011000-10012000 rw-p 00001000 08:06 554894 /tmp/test 10012000-10113000 rw-p 10012000 00:00 0 [heap] fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0 ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591 /lib64/libc-2.9.so ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591 /lib64/libc-2.9.so ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591 /lib64/libc-2.9.so ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591 /lib64/libc-2.9.so ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0 ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663 /lib64/ld-2.9.so ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0 ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0 ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663 /lib64/ld-2.9.so ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663 /lib64/ld-2.9.so ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0 fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0 [stack] Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 16:00:07 +11:00
Milton Miller	8b16cd238d	powerpc/numa: Remove redundant find_cpu_node() Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in a less restrictive section (text vs cpuinit). Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:59 +11:00
Milton Miller	20fcefe5a0	powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() find_min_common_depth() was checking the property length incorrectly. The value is in bytes not cells, and it is using the second entry. Signed-off-By: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-02-11 13:37:58 +11:00
Benjamin Herrenschmidt	edbc29d76d	Merge commit 'kumar/next' into next	2009-02-11 13:37:44 +11:00
Kumar Gala	6c24b17453	powerpc/fsl-booke: Fix mapping functions to use phys_addr_t Fixed v_mapped_by_tlbcam() and p_mapped_by_tlbcam() to use phys_addr_t instead of unsigned long. In 36-bit physical mode we really need these functions to deal with phys_addr_t when trying to match a physical address or when returning one. Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-02-09 21:11:55 -06:00
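To illustrate why these helpers need `phys_addr_t`, here is a minimal sketch that models a 32-bit `unsigned long` with `uint32_t`, so it behaves the same on any host. Both helper names are hypothetical; the point is that a 36-bit physical address squeezed through 32 bits loses its top four bits and can falsely match a mapping at a low address.

```c
#include <stdint.h>

/* In 36-bit physical mode, phys_addr_t must be wider than 32 bits. */
typedef uint64_t phys_addr_t;

/* Models the old behaviour: a 32-bit unsigned long drops bits 32..35. */
static uint32_t pa_as_ulong32(phys_addr_t pa)
{
    return (uint32_t)pa;
}

/* Hypothetical CAM-entry match: does [base, base + size) contain pa? */
static int cam_covers(phys_addr_t base, phys_addr_t size, phys_addr_t pa)
{
    return pa >= base && pa < base + size;
}
```

With the full 64-bit value, an address above 4 GB correctly fails to match a low mapping; after truncation it collapses onto the low mapping and matches by mistake.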
Trent Piepho	96051465fd	powerpc/fsl-booke: Make CAM entries used for lowmem configurable On booke processors, the code that maps low memory only uses up to three CAM entries, even though there are sixteen and nothing else uses them. Make this number configurable in the advanced options menu along with max low memory size. If one wants 1 GB of lowmem, then it's typically necessary to have four CAM entries. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:54 -06:00
Trent Piepho	c8f3570b7e	powerpc/fsl-booke: Allow larger CAM sizes than 256 MB The code that maps kernel low memory would only use page sizes up to 256 MB. On E500v2, pages up to 4 GB are supported. However, a page must be aligned to a multiple of the page's size, i.e., 256 MB pages must be aligned to a 256 MB boundary. This was enforced by a requirement that the physical and virtual addresses of the start of lowmem be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow pages of that size isn't acceptable. To solve this, I simply have adjust_total_lowmem() take alignment into account when it decides what size pages to use. Give it PAGE_OFFSET = 0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map pages like this: PA 0x3000_0000 VA 0x7000_0000 Size 256 MB; PA 0x4000_0000 VA 0x8000_0000 Size 1 GB; PA 0x8000_0000 VA 0xC000_0000 Size 256 MB; PA 0x9000_0000 VA 0xD000_0000 Size 256 MB; PA 0xA000_0000 VA 0xE000_0000 Size 256 MB. Because the lowmem mapping code now takes alignment into account, PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be possible. The lowmem code will work down to 4 kB but it's possible some of the boot code will fail before then. Poor alignment will force small pages to be used, which, combined with the limited number of TLB1 pages available, will result in very little memory getting mapped. So alignments less than 64 MB probably aren't very useful anyway. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:53 -06:00
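The alignment rule above can be checked with a small greedy model: at each step, use the largest power-of-4 page size (4 KiB to 4 GiB) that fits the remaining RAM and to which both the physical and the virtual address are aligned. This is a sketch under those assumptions, not the kernel's `adjust_total_lowmem()`; `pick_cam_size()` and `map_lowmem()` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

#define SZ_4K 0x1000ULL
#define SZ_4G 0x100000000ULL

/*
 * Hypothetical helper: largest power-of-4 page size in [4 KiB, 4 GiB]
 * that fits in `remaining` and to which both `pa` and `va` are aligned.
 * Returns 0 if not even a 4 KiB page fits.
 */
static uint64_t pick_cam_size(uint64_t pa, uint64_t va, uint64_t remaining)
{
    for (uint64_t size = SZ_4G; size >= SZ_4K; size >>= 2) {
        uint64_t mask = size - 1;
        if (size <= remaining && !(pa & mask) && !(va & mask))
            return size;
    }
    return 0;
}

/* Greedily cover `ram` bytes using at most `max_cams` entries; returns bytes mapped. */
static uint64_t map_lowmem(uint64_t pa, uint64_t va, uint64_t ram, int max_cams)
{
    uint64_t mapped = 0;

    for (int i = 0; i < max_cams && ram > 0; i++) {
        uint64_t sz = pick_cam_size(pa, va, ram);
        if (sz == 0)
            break;
        printf("PA 0x%08llx VA 0x%08llx Size %llu MB\n",
               (unsigned long long)pa, (unsigned long long)va,
               (unsigned long long)(sz >> 20));
        pa += sz;
        va += sz;
        ram -= sz;
        mapped += sz;
    }
    return mapped;
}
```

Run on the commit's example, `map_lowmem(0x30000000, 0x70000000, 2ULL << 30, 16)` prints the same five PA/VA/size entries (the 1 GB one as 1024 MB) and returns 2 GiB. Capping `max_cams` at 3, the old fixed entry count, maps only 1.5 GiB.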
Trent Piepho	f88747e7f6	powerpc/fsl-booke: Remove code duplication in lowmem mapping The code to map lowmem uses three CAM aka TLB[1] entries to cover it. The size of each is stored in three globals named __cam0, __cam1, and __cam2. All the code that uses them is duplicated three times for each of the three variables. We have these things called arrays and loops.... Once converted to use an array, it will be easier to make the number of CAMs configurable. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-28 18:16:51 -06:00
Gerhard Pircher	4c456a67f5	powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup code _PAGE_COHERENT is now always set in _PAGE_RAM and PAGE_KERNEL. Thus it has to be masked out if the BAT mapping should be non-cacheable or CPU_FTR_NEED_COHERENT is not set. This will work on normal SMP setups because we force-set CPU_FTR_NEED_COHERENT as part of CPU_FTR_COMMON on SMP. Signed-off-by: Gerhard Pircher <gerhard_pircher@gmx.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-28 17:15:52 +11:00
Dave Kleikamp	9ba0fdbfae	powerpc: is_hugepage_only_range() must account for both 4kB and 64kB slices The subpage_prot syscall fails on second and subsequent calls for a given region, because is_hugepage_only_range() is mis-identifying the 4 kB slices when the process has a 64 kB page size. Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-16 16:15:16 +11:00
Ingo Molnar	fe333321e2	powerpc: Change u64/s64 to a long long integer type Convert arch/powerpc/ over to long long based u64: -#ifdef __powerpc64__ -# include <asm-generic/int-l64.h> -#else -# include <asm-generic/int-ll64.h> -#endif +#include <asm-generic/int-ll64.h> This will avoid recurring spurious warnings in core kernel code that come up when people test on their own hardware (i.e., x86 in ~98% of the cases). This is what x86 uses and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-13 14:47:59 +11:00
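A minimal illustration of what the switch means for format strings, assuming the warnings in question are mostly printk-style `%llu` mismatches (an assumption; the commit does not list them). With `int-ll64.h`, `u64` is `unsigned long long` on both 32- and 64-bit powerpc, as on x86, so a single format specifier is correct everywhere. `format_u64()` is a hypothetical helper, and `snprintf` stands in for `printk`.

```c
#include <stdio.h>

/* As with asm-generic/int-ll64.h: u64 is unsigned long long everywhere. */
typedef unsigned long long u64;

/*
 * Hypothetical helper: "%llu" now matches u64 on ppc32 and ppc64 alike.
 * Under int-l64.h, u64 was unsigned long on ppc64, so the same format
 * string, written and tested on x86, would draw a -Wformat warning there.
 */
static int format_u64(char *buf, size_t n, u64 v)
{
    return snprintf(buf, n, "%llu", v);
}
```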
Benjamin Herrenschmidt	30aae739a9	Merge commit 'kumar/kumar-next' into next	2009-01-13 13:59:03 +11:00
Anton Vorontsov	7021d86afa	powerpc/mm: Make clear_fixmap() actually work The clear_fixmap() routine issues map_page() with flags set to 0. Currently this causes a BUG_ON() inside map_page(), as it assumes that a PTE should be clear before mapping. This patch makes map_page() trigger the BUG_ON() only if the flags are set. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
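The corrected check can be modelled with a toy page table. The PTE layout, the `_PAGE_PRESENT` value and the flat `ptes[]` array are assumptions made for illustration, and `assert()` stands in for `BUG_ON()`. The sketch shows that a zero-flags call, like the one `clear_fixmap()` makes, may overwrite a present PTE, while a non-zero remap of a live entry still trips the check.

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT 0x001u   /* hypothetical bit value for this sketch */
#define NR_PTES 16

/* Toy, single-level page table (not the kernel's layout). */
static uint32_t ptes[NR_PTES];

/*
 * Simplified map_page(): install `pa | flags` in slot `idx`.
 * Only a non-zero mapping on top of a present PTE is treated as a bug;
 * flags == 0 (clearing a fixmap slot) is allowed through.
 */
static void map_page(unsigned idx, uint32_t pa, uint32_t flags)
{
    assert(idx < NR_PTES);
    assert(!((ptes[idx] & _PAGE_PRESENT) && flags));   /* BUG_ON() stand-in */
    ptes[idx] = pa | flags;   /* with flags == 0 the PTE is not present */
}
```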
Benjamin Herrenschmidt	4a0826824b	powerpc: Fix missing semicolons in mmu_decl.h This is a brown paper bag from one of my earlier patches that breaks build on 40x and 8xx. And yes, I've now added 40x and 8xx to my list of test configs :-) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:17 +11:00
Dave Liu	d6a09e0cd6	powerpc: Remove the redundant _tlbil_pid at SMP case Signed-off-by: Dave Liu <daveliu@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:13 +11:00
Dave Hansen	893473df78	powerpc/mm: Cleanup careful_allocation(): consolidate memset() Both users of careful_allocation() immediately memset() the result. So, just do it in one place. Also give careful_allocation() a 'z' prefix to bring it in line with kzalloc() and friends. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:09 +11:00
Dave Hansen	0be210fd66	powerpc/mm: Make careful_allocation() return virtual addrs Since we memset() the result in both of the uses here, just make careful_alloc() return a virtual address. Also, add a separate variable to store the physical address that comes back from the lmb_alloc() functions. This makes it less likely that someone will screw it up by forgetting to convert before returning, since the vaddr is always in a void* and the paddr is always in an unsigned long. I admit this is arbitrary since one of its users needs a paddr and one a vaddr, but it does remove a good number of casts. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	5d21ea2b0e	powerpc/mm: Cleanup careful_allocation(): bootmem already panics If we fail a bootmem allocation, the bootmem code itself panics. No need to redo it here. Also change the wording of the other panic. We don't strictly have to allocate memory on the specified node. It is just a hint and that node may not even have any memory on it. In that case we can and do fall back to other nodes. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Dave Hansen	c555e520ef	powerpc/mm: Add better comment on careful_allocation() The behavior in careful_allocation() really confused me at first. Add a comment to hopefully make it easier on the next doofus that looks at it. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-01-08 16:25:08 +11:00
Trent Piepho	6fd8be4bf7	powerpc/fsl-booke: Remove num_tlbcam_entries This is a global variable defined in fsl_booke_mmu.c with a value that gets initialized in assembly code in head_fsl_booke.S. It's never used. If some code ever does want to know the number of entries in TLB1, then "numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a global initialized during kernel boot from assembly. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:07 -06:00
Trent Piepho	19f5465e82	powerpc/fsl-booke: Don't hard-code size of struct tlbcam Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam to 20 when it indexed the TLBCAM table. Anyone changing the size of struct tlbcam would not know to expect that. The kernel already has a system to get the size of C structures into assembly language files, asm-offsets, so let's use it. The definition of the struct gets moved to a header, so that asm-offsets.c can include it. Signed-off-by: Trent Piepho <tpiepho@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2009-01-07 15:33:06 -06:00
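The asm-offsets idea can be sketched as a tiny host program that derives constants from the C definition instead of hard-coding them. The layout of `struct tlbcam` here (five 32-bit words) is an assumption chosen to match the 20 bytes the commit mentions, and `TLBCAM_SIZE` is an illustrative name. The kernel's real asm-offsets build compiles `asm-offsets.c` to assembly and extracts the values from that output, rather than running a program like this.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct tlbcam: five 32-bit MAS words. */
struct tlbcam {
    uint32_t MAS0;
    uint32_t MAS1;
    uint32_t MAS2;
    uint32_t MAS3;
    uint32_t MAS7;
};

/*
 * Emit assembler-usable constants from the C definition, in the spirit
 * of asm-offsets: a .S file can then index the table with TLBCAM_SIZE
 * instead of a hard-coded 20 that silently breaks if the struct changes.
 */
static void emit_asm_offsets(FILE *out)
{
    fprintf(out, "#define TLBCAM_SIZE %zu\n", sizeof(struct tlbcam));
    fprintf(out, "#define TLBCAM_MAS1 %zu\n", offsetof(struct tlbcam, MAS1));
}
```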
Gary Hade	c04fc586c1	mm: show node to memory section relationship with symlinks in sysfs Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. 
Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:59:00 -08:00
Mel Gorman	3340289ddf	mm: report the MMU pagesize in /proc/pid/smaps The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processors. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-01-06 15:58:58 -08:00
Linus Torvalds	3c92ec8ae9	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits) powerpc/44x: Support 16K/64K base page sizes on 44x powerpc: Force memory size to be a multiple of PAGE_SIZE powerpc/32: Wire up the trampoline code for kdump powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M powerpc/32: Allow __ioremap on RAM addresses for kdump kernel powerpc/32: Setup OF properties for kdump powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs() powerpc: Prepare xmon_save_regs for use with kdump powerpc: Remove default kexec/crash_kernel ops assignments powerpc: Make default kexec/crash_kernel ops implicit powerpc: Setup OF properties for ppc32 kexec powerpc/pseries: Fix cpu hotplug powerpc: Fix KVM build on ppc440 powerpc/cell: add QPACE as a separate Cell platform powerpc/cell: fix build breakage with CONFIG_SPUFS disabled powerpc/mpc5200: fix error paths in PSC UART probe function powerpc/mpc5200: add rts/cts handling in PSC UART driver powerpc/mpc5200: Make PSC UART driver update serial errors counters powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver ... Fix trivial conflict in drivers/char/Makefile as per Paul's directions	2008-12-28 16:54:33 -08:00
Ilya Yanok	ca9153a3a2	powerpc/44x: Support 16K/64K base page sizes on 44x This adds support for 16k and 64k page sizes on PowerPC 44x processors. The PGDIR table is much smaller than a page when using 16k or 64k pages (512 and 32 bytes respectively) so we allocate the PGDIR with kzalloc() instead of __get_free_pages(). One PTE table covers rather a large memory area when using 16k or 64k pages (32MB or 512MB respectively), so we can easily put FIXMAP and PKMAP in the area covered by one PTE table. Signed-off-by: Yuri Tikhonov <yur@emcraft.com> Signed-off-by: Vladimir Panfilov <pvr@emcraft.com> Signed-off-by: Ilya Yanok <yanok@emcraft.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-29 09:53:25 +11:00
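The sizes quoted above follow from simple arithmetic, worked through below under stated assumptions: 8-byte PTEs, 4-byte PGD entries, a 4 GiB (32-bit) virtual address space, and one page-sized PTE table per PGD entry. The helper names are hypothetical; with those assumptions the code reproduces the commit's 512/32-byte PGDIR and 32MB/512MB per-PTE-table figures.

```c
#include <stdint.h>

#define PTE_BYTES       8u              /* assumed 64-bit PTEs on 44x */
#define PGD_ENTRY_BYTES 4u              /* one 32-bit pointer per entry */
#define ADDR_SPACE      (1ULL << 32)    /* 4 GiB virtual address space */

/* Bytes of address space mapped by one page-sized PTE table. */
static uint64_t pte_table_span(uint64_t page_size)
{
    return (page_size / PTE_BYTES) * page_size;
}

/* Size of the PGD needed to cover the whole 32-bit address space. */
static uint64_t pgd_bytes(uint64_t page_size)
{
    return ADDR_SPACE / pte_table_span(page_size) * PGD_ENTRY_BYTES;
}
```

For 16K pages: 16384 / 8 = 2048 PTEs, each mapping 16K, so 32MB per table; 4GB / 32MB = 128 entries, or 512 bytes of PGD. For 64K pages the same steps give 512MB and 32 bytes.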
James Morris	cbacc2c7f0	Merge branch 'next' into for-linus	2008-12-25 11:40:09 +11:00
Dale Farnsworth	ccdcef72c2	powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M Add the ability for a classic ppc kernel to be loaded at an address of 32MB. This is done by fixing a few places that assume we are loaded at address 0, and by changing several uses of KERNELBASE to use PAGE_OFFSET instead. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Anton Vorontsov	01695a9687	powerpc/32: Allow __ioremap on RAM addresses for kdump kernel While catching bogus users of ioremap is good for debugging, for kdump support it is more convenient to use __ioremap for copy_oldmem_page() (exactly as we currently do for PPC64). Note that copy_oldmem_page() calls __ioremap with flags set to '0', so it should be safe with regard to the caches. The other option is to use kmap_atomic_pfn()[1], but it will not work for kernels compiled without HIGHMEM. For example, on a board with 256MB RAM booted with crashkernel=64M@32M, the !HIGHMEM capturing kernel maps only the 0-96M range, which does not include all the memory needed to capture the dump, and accessing anything above 96M will cause faults. [1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-23 15:13:29 +11:00
Benjamin Herrenschmidt	a14953597b	powerpc: Fix missing 'blr' in _tlbia() A rework of the MMU code dropped a much-missed 'blr' instruction. Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2008-12-21 02:54:25 -07:00
Benjamin Herrenschmidt	64b3d0e812	powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to set up various valid combinations of access flags and changes various bits of code to use them instead. This should help ensure that the PTE actually contains the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitly from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7752035180	powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs This makes the MMU context code used for CPUs with no hash table (except 603) dynamically allocate the various maps used to track the state of contexts. Only the main free map and CPU 0 stale map are allocated at boot time. Other CPU maps are allocated when those CPUs are brought up and freed if they are unplugged. This also moves the initialization of the MMU context management slightly later during the boot process, which should be fine as it's really only needed when userland is first started anyway. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	760ec0e02d	powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440 The handlers for Critical, Machine Check or Debug interrupts will save and restore MMUCR nowadays, thus we only need to disable normal interrupts when invalidating TLB entries. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2a4aca1144	powerpc/mm: Split low level tlb invalidate for nohash processors Currently, the various forms of low level TLB invalidations are all implemented in misc_32.S for 32-bit processors, in a fairly scary mess of #ifdef's and with interesting duplication such as a whole bunch of code for FSL _tlbie and _tlbia which are no longer used. This moves things around such that _tlbie is now defined in hash_low_32.S and is only used by the 32-bit hash code, and all nohash CPUs use the various _tlbil_* forms that are now moved to a new file, tlb_nohash_low.S. I moved all the definitions for that stuff out of include/asm/tlbflush.h as they are really internal mm stuff, into mm/mmu_decl.h The code should have no functional changes. I kept some variants inline for trivial forms on things like 40x and 8xx. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	f048aace29	powerpc/mm: Add SMP support to no-hash TLB handling This commit moves the whole no-hash TLB handling out of line into a new tlb_nohash.c file, and implements some basic SMP support using IPIs and/or broadcast tlbivax instructions. Note that I'm using local invalidations for D->I cache coherency. At worst, if another processor is trying to execute the same and has the old entry in its TLB, it will just take a fault and re-do the TLB flush locally (it won't re-do the cache flush in any case). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	7c03d653cd	powerpc/mm: Introduce MMU features We're soon running out of CPU features and I need to add some new ones for various MMU related bits, so this patch separates the MMU features from the CPU features. I moved over the 32-bit MMU related ones, added base features for MMU type families, but didn't move over any 64-bit only feature yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt	2ca8cf7389	powerpc/mm: Rework context management for CPUs with no hash table This reworks the context management code used by 4xx, 8xx and Freescale BookE. It adds support for SMP by implementing a concept of stale context map to lazily flush the TLB on processors where a context may have been invalidated. This also contains the ground work for generalizing such lazy TLB flushing by just picking up a new PID and marking the old one stale. This will be implemented later. This is a first implementation that uses a global spinlock. Ideally, we should try to get at least the fast path (context ID already assigned) lockless or limited to a per context lock, but for now this will do. I tried to keep the UP case reasonably simple to avoid adding too much overhead to 8xx which does a lot of context stealing since it effectively has only 16 PIDs available. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
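The stale-map idea can be reduced to a few lines of bookkeeping. This is a simplified single-threaded model, not the kernel code: it marks a stolen context stale on every CPU (rather than only the CPUs that might have used it), counts flushes instead of issuing them, and omits locking entirely. The 16-context limit mirrors the 8xx figure in the commit; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2
#define NR_CTXS 16   /* e.g. 8xx: effectively 16 PIDs */

/* Bit c set in stale_map[cpu]: that CPU may hold old TLB entries for context c. */
static uint32_t stale_map[NR_CPUS];
static int local_flushes;   /* counts simulated local TLB flushes */

/* Stealing context `ctx` for a new mm: any CPU may still cache old translations for it. */
static void steal_context(unsigned ctx)
{
    assert(ctx < NR_CTXS);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        stale_map[cpu] |= 1u << ctx;
}

/* Switching `cpu` to `ctx`: flush lazily, and only if this CPU has it marked stale. */
static void switch_to_context(int cpu, unsigned ctx)
{
    assert(ctx < NR_CTXS);
    if (stale_map[cpu] & (1u << ctx)) {
        local_flushes++;                 /* stands in for a local TLB flush of ctx */
        stale_map[cpu] &= ~(1u << ctx);
    }
}
```

The design choice this illustrates: no cross-CPU flush happens at steal time; each CPU pays for a flush only the first time it actually runs the reused context.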
Benjamin Herrenschmidt	5e696617c4	powerpc/mm: Split mmu_context handling This splits the mmu_context handling between 32-bit hash based processors, 64-bit hash based processors and everybody else. This is preliminary work for adding SMP support for BookE processors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt	f63837f058	powerpc/mm: Remove flush_HPTE() The function flush_HPTE() is used in only one place, the implementation of DEBUG_PAGEALLOC on ppc32. It's actually a dup of flush_tlb_page() though it's -slightly- more efficient on hash based processors. We remove it and replace it by a direct call to the hash flush code on those processors and to flush_tlb_page() for everybody else. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:34 +11:00
Benjamin Herrenschmidt	e41e811a79	powerpc/mm: Rename tlb_32.c and tlb_64.c to tlb_hash32.c and tlb_hash64.c This renames the files to clarify the fact that they are used by the hash based family of CPUs (the 603 being an exception in that family but is still handled by that code). This paves the way for the new tlb_nohash.c coming via a subsequent commit. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 15:53:30 +11:00
Paul Mackerras	1e1c568d6c	Merge branch 'merge' into next	2008-12-16 14:38:58 +11:00
Dave Hansen	a4c74ddd5e	powerpc: Fix bootmem reservation on uninitialized node careful_allocation() was calling into the bootmem allocator for nodes which had not been fully initialized and caused a previous bug: http://patchwork.ozlabs.org/patch/10528/ So, I merged a few broken out loops in do_init_bootmem() to fix it. That changed the code ordering. I think this bug is triggered by having reserved areas for a node which are spanned by another node's contents. In the mark_reserved_regions_for_nid() code, we attempt to reserve the area for a node before we have allocated the NODE_DATA() for that nid. We do this since I reordered that loop. I suck. This is causing crashes at bootup on some systems, as reported by Jon Tollefson. This may only present on some systems that have 16GB pages reserved. But, it can probably happen on any system that is trying to reserve large swaths of memory that happen to span other nodes' contents. This commit ensures that we do not touch bootmem for any node which has not been initialized, and also removes a compile warning about an unused variable. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 13:48:18 +11:00
Brian King	48f797de55	powerpc: Check for valid hugepage size in hugetlb_get_unmapped_area It looks like most of the hugetlb code is doing the correct thing if hugepages are not supported, but the mmap code is not. If we get into the mmap code when hugepages are not supported, such as in an LPAR which is running Active Memory Sharing, we can oops the kernel. This fixes the oops being seen in this path. oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: nfs(N) lockd(N) nfs_acl(N) sunrpc(N) ipv6(N) fuse(N) loop(N) dm_mod(N) sg(N) ibmveth(N) sd_mod(N) crc_t10dif(N) ibmvscsic(N) scsi_transport_srp(N) scsi_tgt(N) scsi_mod(N) Supported: No NIP: c000000000038d60 LR: c00000000003945c CTR: c0000000000393f0 REGS: c000000077e7b830 TRAP: 0300 Tainted: G (2.6.27.5-bz50170-2-ppc64) MSR: 8000000000009032 <EE,ME,IR,DR> CR: 44000448 XER: 20000001 DAR: c000002000af90a8, DSISR: 0000000040000000 TASK = c00000007c1b8600[4019] 'hugemmap01' THREAD: c000000077e78000 CPU: 6 GPR00: 0000001fffffffe0 c000000077e7bab0 c0000000009a4e78 0000000000000000 GPR04: 0000000000010000 0000000000000001 00000000ffffffff 0000000000000001 GPR08: 0000000000000000 c000000000af90c8 0000000000000001 0000000000000000 GPR12: 000000000000003f c000000000a73880 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000010000 GPR20: 0000000000000000 0000000000000003 0000000000010000 0000000000000001 GPR24: 0000000000000003 0000000000000000 0000000000000001 ffffffffffffffb5 GPR28: c000000077ca2e80 0000000000000000 c00000000092af78 0000000000010000 NIP [c000000000038d60] .slice_get_unmapped_area+0x6c/0x4e0 LR [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80 Call Trace: [c000000077e7bbc0] [c00000000003945c] .hugetlb_get_unmapped_area+0x6c/0x80 [c000000077e7bc30] [c000000000107e30] .get_unmapped_area+0x64/0xd8 [c000000077e7bcb0] [c00000000010b140] .do_mmap_pgoff+0x140/0x420 [c000000077e7bd80] [c00000000000bf5c] 
.sys_mmap+0xc4/0x140 [c000000077e7be30] [c0000000000086b4] syscall_exit+0x0/0x40 Instruction dump: fac1ffb0 fae1ffb8 fb01ffc0 fb21ffc8 fb41ffd0 fb61ffd8 fb81ffe0 fbc1fff0 fbe1fff8 f821fef1 f8c10158 f8e10160 <7d49002e> f9010168 e92d01b0 eb4902b0 Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-16 13:48:18 +11:00
James Morris	ec98ce480a	Merge branch 'master' into next Conflicts: fs/nfsd/nfs4recover.c Manually fixed above to use new creds API functions, e.g. nfs4_save_creds(). Signed-off-by: James Morris <jmorris@namei.org>	2008-12-04 17:16:36 +11:00
Kumar Gala	0186f47e70	powerpc: Use RCU based pte freeing mechanism for all powerpc Refactor the RCU based pte free code that was used on ppc64 to be used on all powerpc. Additionally refactor pte_free() & pte_free_kernel() into common code between ppc32 & ppc64. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-03 20:46:35 +11:00
Kumar Gala	f4f3a1261a	powerpc: hash_page_sync should only be used on SMP & STD_MMU_32 Clean up the ifdefs so we only use hash_page_sync if we have CONFIG_SMP && CONFIG_PPC_STD_MMU_32. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-03 20:46:35 +11:00
Paul Mackerras	5274918855	Merge branch 'merge'	2008-12-03 20:11:06 +11:00
Linus Torvalds	03cfdb86ac	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: powerpc: Fix system calls on Cell entered with XER.SO=1 powerpc/cell: Fix GDB watchpoints, again powerpc/mpic: Don't reset affinity for secondary MPIC on boot powerpc/cell/axon-msi: Retry on missing interrupt powerpc: Fix boot freeze on machine with empty memory node powerpc: Fix IRQ assignment for some PCIe devices powerpc/spufs: Fix spinning in spufs_ps_fault on signal powerpc/mpc832x_rdb: fix swapped ethernet ids powerpc: Use generic PHY driver for Marvell 88E1111 PHY on GE Fanuc SBC610 powerpc/85xx: L2 cache size wrong in 8572DS dts powerpc/virtex: Update defconfigs powerpc/52xx: update defconfigs xsysace: Fix driver to use resource_size_t instead of unsigned long powerpc/virtex: fix various format/casting printk mismatches powerpc/mpc5200: fix bestcomm Kconfig dependencies powerpc/44x: Fix 460EX/460GT machine check handling powerpc/40x: Limit allocable DRAM during early mapping	2008-11-30 16:44:18 -08:00
Dave Hansen	4a6186696e	powerpc: Fix boot freeze on machine with empty memory node I got a bug report about a distro kernel not booting on a particular machine. It would freeze during boot: > ... > Could not find start_pfn for node 1 > [boot]0015 Setup Done > Built 2 zonelists in Node order, mobility grouping on. Total pages: 123783 > Policy zone: DMA > Kernel command line: > [boot]0020 XICS Init > [boot]0021 XICS Done > PID hash table entries: 4096 (order: 12, 32768 bytes) > clocksource: timebase mult[7d0000] shift[22] registered > Console: colour dummy device 80x25 > console handover: boot [udbg0] -> real [hvc0] > Dentry cache hash table entries: 1048576 (order: 7, 8388608 bytes) > Inode-cache hash table entries: 524288 (order: 6, 4194304 bytes) > freeing bootmem node 0 I've reproduced this on 2.6.27.7. It is caused by commit `8f64e1f2d1` ("powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes"). The problem is that Jon took a loop which was (in pseudocode): for_each_node(nid) NODE_DATA(nid) = careful_alloc(nid); setup_bootmem(nid); reserve_node_bootmem(nid); and broke it up into: for_each_node(nid) NODE_DATA(nid) = careful_alloc(nid); setup_bootmem(nid); for_each_node(nid) reserve_node_bootmem(nid); The issue comes in when careful_alloc() is called on a node with no memory. It falls back to using bootmem from a previously-initialized node. But, bootmem has not yet been reserved when Jon's patch is applied. It gives back bogus memory (0xc000000000000000) and pukes later in boot. The following patch collapses the loop back together. It also breaks the mark_reserved_regions_for_nid() code out into a function and adds some comments. I think a large part of why this bug was introduced is that the for loop was too long and hard to read.
The actual bug fix here is: + if (end_pfn <= node->node_start_pfn || + start_pfn >= node_end_pfn) + continue; Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-12-01 09:40:18 +11:00
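The one-line fix above is an interval-overlap test on half-open PFN ranges. The sketch below isolates it. `region_overlaps_node()` and the clipping helper `clamp_to_node()` are hypothetical, and clipping a region that crosses a node boundary before reserving it is an assumption about what the surrounding code does, not something the commit spells out.

```c
#include <stdbool.h>

/*
 * The fix's test in isolation: the reserved region [start_pfn, end_pfn)
 * touches the node [node_start_pfn, node_end_pfn) unless it ends at or
 * before the node's start, or begins at or after the node's end.
 */
static bool region_overlaps_node(unsigned long start_pfn, unsigned long end_pfn,
                                 unsigned long node_start_pfn,
                                 unsigned long node_end_pfn)
{
    if (end_pfn <= node_start_pfn || start_pfn >= node_end_pfn)
        return false;   /* the patch does "continue" here */
    return true;
}

/* Hypothetical: clip an overlapping region to the node before reserving it. */
static void clamp_to_node(unsigned long *start_pfn, unsigned long *end_pfn,
                          unsigned long node_start_pfn, unsigned long node_end_pfn)
{
    if (*start_pfn < node_start_pfn)
        *start_pfn = node_start_pfn;
    if (*end_pfn > node_end_pfn)
        *end_pfn = node_end_pfn;
}
```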
Al Viro	4ea8fb9c1c	powerpc set_huge_psize() false positive called only from __init, calls __init. Incidentally, it ought to be static in file. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-11-30 10:03:35 -08:00
Robert Jennings	a6326e98a2	powerpc: Correct page-in counter for CMM with 64k pages Linux will report the number of page-ins so that the hypervisor can better determine partition memory pressure. The hardware page size and the OS page size can be different. In the case where the hardware page size is 4k and the OS is running with 64k pages the code in commit `409001948d` ("powerpc: Update page-in counter for CMM") would under-report the number of pages. This corrects the reporting to the hypervisor by incrementing the page_in count by 1 << PAGE_FACTOR each time. Reported-by: Andrew Theurer <habanero@linux.vnet.ibm.com> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-19 16:05:05 +11:00
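The scaling can be illustrated with a small C sketch. It assumes 64 KiB OS pages on 4 KiB hardware pages and takes `PAGE_FACTOR` to be `PAGE_SHIFT - HW_PAGE_SHIFT`; the `page_ins` variable and `account_page_in()` are hypothetical stand-ins for the VPA field and the code that updates it:

```c
/* Assumed page geometry for this sketch: 64 KiB OS pages, 4 KiB HW pages. */
#define PAGE_SHIFT    16
#define HW_PAGE_SHIFT 12
#define PAGE_FACTOR   (PAGE_SHIFT - HW_PAGE_SHIFT)

/* Hypothetical counter standing in for the VPA page-in field. */
static unsigned long page_ins;

/* Account one OS-page page-in, expressed in hardware pages. */
static void account_page_in(void)
{
	page_ins += 1UL << PAGE_FACTOR;	/* 16 hardware pages per 64 KiB page */
}
```

With these values each OS page-in adds 16, the number of 4 KiB hardware pages it spans.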
David Howells	1330deb0f6	CRED: Wrap task credential accesses in the PowerPC arch Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@ozlabs.org Signed-off-by: James Morris <jmorris@namei.org>	2008-11-14 10:38:39 +11:00
Grant Erickson	5907630ffc	powerpc/40x: Limit allocable DRAM during early mapping If the size of DRAM is not an exact power of two, we may not have covered DRAM in its entirety with large 16 and 4 MiB pages. If that is the case, we can get non-recoverable page faults when doing the final PTE mappings for the non-large page PTEs. Consequently, we restrict the top end of DRAM currently allocable by updating '__initial_memory_limit_addr' so that calls to the LMB to allocate PTEs for "tail" coverage with normal-sized pages (or other reasons) do not attempt to allocate outside the allowed range. Signed-off-by: Grant Erickson <gerickson@nuovations.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-11-13 10:10:56 -05:00
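A sketch of how much of a non-power-of-two DRAM size the large pages cover, assuming the early mapping code uses 16 MiB pages greedily and then 4 MiB pages, as the message describes. `large_page_coverage()` is a hypothetical helper; its result is the kind of value the commit uses to cap `__initial_memory_limit_addr`:

```c
#define SZ_4M  (4UL << 20)
#define SZ_16M (16UL << 20)

/*
 * Greedy large-page coverage: 16 MiB pages first, then 4 MiB pages.
 * Returns how many bytes of [0, dram_size) the large pages cover; the
 * remainder (the "tail") would need normal-sized PTEs.
 */
static unsigned long large_page_coverage(unsigned long dram_size)
{
	unsigned long mapped = 0;

	while (dram_size - mapped >= SZ_16M)
		mapped += SZ_16M;
	while (dram_size - mapped >= SZ_4M)
		mapped += SZ_4M;
	return mapped;
}
```

For example, 102 MiB of DRAM yields 100 MiB of large-page coverage (six 16 MiB pages plus one 4 MiB page), leaving a 2 MiB tail that must be mapped with normal pages allocated from below the limit.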
Jon Tollefson	7d4320f3d5	powerpc: Hugetlb pgtable cache access cleanup Andrew Morton suggested that using a macro that makes an array reference look like a function call makes it harder to understand the code. This therefore removes the huge_pgtable_cache(psize) macro and replaces its uses with pgtable_cache[HUGE_PGTABLE_INDEX(psize)]. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-06 09:49:39 +11:00
Brian King	409001948d	powerpc: Update page-in counter for CMM A new field has been added to the VPA as a method for the client OS to communicate to firmware the number of page-ins it is performing when running collaborative memory overcommit. The hypervisor will use this information to better determine if a partition is experiencing memory pressure and needs more memory allocated to it. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-11-05 22:08:28 +11:00
Jon Tollefson	4792adbac9	powerpc: Don't use a 16G page if beyond mem= limits If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available. Thanks to Michael Ellerman for finding the problem. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-22 15:01:21 +11:00
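The check amounts to asking whether the whole gigantic page lies below the mem= limit. A minimal sketch, assuming a limit of 0 means no mem= was given; `gigantic_page_usable()` is hypothetical and not the kernel's function:

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_16G (16ULL << 30)

/*
 * Accept a 16G page found in the device tree only if the whole page
 * lies below the memory limit. In this sketch a limit of 0 stands for
 * "no mem= given".
 */
static bool gigantic_page_usable(uint64_t base, uint64_t mem_limit)
{
	if (mem_limit == 0)
		return true;
	return base + SZ_16G <= mem_limit;
}
```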
Benjamin Herrenschmidt	a02efb906d	Merge commit 'origin' into master Manual merge of: arch/powerpc/Kconfig arch/powerpc/include/asm/page.h	2008-10-21 15:52:04 +11:00
Milton Miller	fe55249d17	powerpc: Always trim numa memory to lmb_end_of_DRAM() numa_enforce_memory_limit tried to be smart and only call lmb_end_of_DRAM when a memory limit was set via mem= on the command line. However, the early boot code will also limit memory added to the lmb system when iommu=off is specified. When this happens, the page allocator is given pages not in the linear mapping and this results in a fatal data reference to the unmapped page. Signed-off-by: Milton Miller <miltonm@bga.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-21 15:19:12 +11:00
Jon Tollefson	e81703724a	powerpc/numa: Make memory reserve code more robust Adjust amount to reserve based on previous nodes for reserves spanning multiple nodes. Check if the node active range is empty before attempting to pass the reserve to bootmem. In practice the range shouldn't be empty, but to be sure we check. Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-21 15:17:48 +11:00
Badari Pulavarty	71088785c6	mm: cleanup to make remove_memory() arch-neutral There is nothing architecture specific about remove_memory(). remove_memory() function is common for all architectures which support hotplug memory remove. Instead of duplicating it in every architecture, collapse them into arch neutral function. [akpm@linux-foundation.org: fix the export] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-10-20 08:50:25 -07:00
David Gibson	f5ea64dcba	powerpc: Get USE_STRICT_MM_TYPECHECKS working again The typesafe version of the powerpc pagetable handling (with USE_STRICT_MM_TYPECHECKS defined) has bitrotted again. This patch makes a bunch of small fixes to get it back to building status. It's still not enabled by default as gcc still generates worse code with it for some reason. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-14 10:35:27 +11:00
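The idea behind the strict type checks can be sketched in C. With the macro defined, a page table entry is wrapped in a one-member struct, so accidentally mixing it with a plain integer fails to compile; without it, `pte_t` is a bare integer. This standalone sketch borrows the kernel's `pte_val()`/`__pte()` naming but is not the powerpc header:

```c
#define USE_STRICT_MM_TYPECHECKS

#ifdef USE_STRICT_MM_TYPECHECKS
/* Distinct type: assigning a pte_t to an integer is a compile error. */
typedef struct { unsigned long pte; } pte_t;
#define pte_val(x)  ((x).pte)
#define __pte(x)    ((pte_t) { (x) })
#else
/* Plain integer: cheaper, but type mix-ups go unnoticed. */
typedef unsigned long pte_t;
#define pte_val(x)  (x)
#define __pte(x)    (x)
#endif
```

Under the strict definition, `unsigned long x = p;` is rejected by the compiler, which is why bitrot in code that mixes the types shows up as build failures.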
Jon Tollefson	8f64e1f2d1	powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes If there are multiple reserved memory blocks via lmb_reserve() that are contiguous addresses and on different NUMA nodes we are losing track of which address ranges to reserve in bootmem on which node. I discovered this when I recently got to try 16GB huge pages on a system with more than 2 nodes. When scanning the device tree in early boot we call lmb_reserve() with the addresses of the 16G pages that we find so that the memory doesn't get used for something else. For example the addresses for the pages could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages, one on each of eight nodes. In the lmb after all the pages have been reserved it will look something like the following: lmb_dump_all: memory.cnt = 0x2 memory.size = 0x3e80000000 memory.region[0x0].base = 0x0 .size = 0x1e80000000 memory.region[0x1].base = 0x4000000000 .size = 0x2000000000 reserved.cnt = 0x5 reserved.size = 0x3e80000000 reserved.region[0x0].base = 0x0 .size = 0x7b5000 reserved.region[0x1].base = 0x2a00000 .size = 0x78c000 reserved.region[0x2].base = 0x328c000 .size = 0x43000 reserved.region[0x3].base = 0xf4e8000 .size = 0xb18000 reserved.region[0x4].base = 0x4000000000 .size = 0x2000000000 The reserved.region[0x4] contains the 16G pages. In arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the node numbers looking for the reserved regions that belong to the particular node. It is not able to identify region 0x4 as being a part of each of the 8 nodes. It is assuming that a reserved region is only on a single node. This patch takes out the reserved region loop from inside the loop that goes over each node. It looks up the active region containing the start of the reserved region. If it extends past that active region then it adjusts the size and gets the next active region containing it. 
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-10 15:55:19 +11:00
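The splitting step described above can be sketched as a loop that clips the reserved region against each active range in turn. This is a simplified illustration, not the patch's code: `struct active_range`, `struct reservation` and `split_reservation()` are hypothetical, and the ranges are assumed sorted and non-overlapping:

```c
#include <stddef.h>

/* Hypothetical active range: [start, end) in bytes, owned by node nid. */
struct active_range {
	unsigned long long start, end;
	int nid;
};

/* Hypothetical record of one per-node reservation. */
struct reservation {
	int nid;
	unsigned long long base, size;
};

/*
 * Split the reserved region [base, base + size) into pieces that each
 * lie inside one active range, so every piece is reserved on the node
 * that owns it. Returns the number of pieces written to out.
 */
static size_t split_reservation(const struct active_range *ranges,
				size_t nranges,
				unsigned long long base,
				unsigned long long size,
				struct reservation *out, size_t max_out)
{
	unsigned long long end = base + size;
	size_t n = 0;

	for (size_t i = 0; i < nranges && base < end && n < max_out; i++) {
		const struct active_range *r = &ranges[i];

		if (base >= r->end || end <= r->start)
			continue;	/* no overlap with this range */

		unsigned long long lo = base > r->start ? base : r->start;
		unsigned long long hi = end < r->end ? end : r->end;

		out[n].nid = r->nid;
		out[n].base = lo;
		out[n].size = hi - lo;	/* size adjusted to this range */
		n++;

		base = hi;	/* continue with whatever is left */
	}
	return n;
}
```

Applied to two of the 16G pages from the example dump (0x4000000000 and 0x4400000000 on different nodes), a single 32 GiB reservation comes back as two 16 GiB pieces, one per node.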
Roland Dreier	a880e76233	powerpc: Avoid integer overflow in page_is_ram() Commit `8b150478` ("ppc: make phys_mem_access_prot() work with pfns instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow for addresses above 4G on 32-bit kernels. However arch/powerpc's page_is_ram() is missing the same fix -- it computes a physical address by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page above 4G. In particular this causes pages above 4G to be mapped with the wrong caching attribute; for example many ppc440-based SoCs have PCI space above 4G, and mmap()ing MMIO space may end up with a mapping that has caching enabled. Fix this by working with the pfn and avoiding the conversion to physical address that causes the overflow. This patch compares the pfn to max_pfn, which is a semantic change from the old code -- that code compared the physical address to high_memory, which corresponds to max_low_pfn. However, I think that was another bug, since highmem pages are still RAM. Reported-by: vb <vb@vsbe.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-10-07 14:26:18 +11:00
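The overflow and the fix can be reproduced in userspace by modelling a 32-bit `unsigned long` with `uint32_t`. A sketch with hypothetical function names; `high_memory_phys` stands in for the physical address corresponding to `high_memory`:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Model a 32-bit kernel's unsigned long explicitly. */
typedef uint32_t ulong32;

/* Buggy form: converts the PFN to a 32-bit physical address first. */
static bool page_is_ram_buggy(ulong32 pfn, ulong32 high_memory_phys)
{
	ulong32 paddr = pfn << PAGE_SHIFT;	/* wraps for pages above 4G */
	return paddr < high_memory_phys;
}

/* Fixed form: compare page frame numbers and never form the address. */
static bool page_is_ram_fixed(ulong32 pfn, ulong32 max_pfn)
{
	return pfn < max_pfn;
}
```

For a page just above 4G (PFN 0x100001), the buggy version computes a physical address of 0x1000 after wrapping and wrongly reports RAM, while the PFN comparison gives the correct answer.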
Becky Bruce	4ee7084eb1	POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical This rearranges a bit of code, and adds support for 36-bit physical addressing for configs that use a hashed page table. The 36b physical support is not enabled by default on any config - it must be explicitly enabled via the config system. This patch only expands the page table code to accommodate large physical addresses on 32-bit systems and enables the PHYS_64BIT config option for 86xx. It does not allow you to boot a board with more than about 3.5GB of RAM - for that, SWIOTLB support is also required (and coming soon). Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-09-24 16:29:44 -05:00
Becky Bruce	82331ab15f	powerpc/85xx: fix build warning, remove silly cast This fixes a build warning when PHYS_64BIT is enabled, and removes an unnecessary cast to phys_addr_t (the variable being cast is already a phys_addr_t) Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2008-09-16 10:01:35 -05:00
David Gibson	0b26425ce1	powerpc: Clean up hugepage pagetable allocation for powerpc with 16G pages There is a small bug in the handling of 16G hugepages recently added to the kernel. This doesn't cause a crash or other user-visible problems, but it does mean that more levels of pagetable are allocated than makes sense for 16G pages. The hugepage pagetables for the 16G pages are allocated much lower in the pagetable tree than they should be, with the intervening levels allocated with full pmd and pud pages which will only ever have one entry filled in. This corrects this problem, at the same time cleaning up the handling of which level 64k versus 16M hugepage pagetables are allocated at. The new way of formatting the tests should be more robust against changes in pagetable structure, or any newly added hugepage sizes. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:47 -07:00
Becky Bruce	aaf4a9b0f7	powerpc: Rename PTE_SIZE to HPTE_SIZE It's the size of the hardware PTE; make that clear in the name. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:42 -07:00
Paul Mackerras	549e8152de	powerpc: Make the 64-bit kernel as a position-independent executable This implements CONFIG_RELOCATABLE for 64-bit by building the kernel as a position-independent executable (PIE) when it is set. This involves processing the dynamic relocations in the image in the early stages of booting, even if the kernel is being run at the address it is linked at, since the linker does not necessarily fill in words in the image for which there are dynamic relocations. (In fact the linker does fill in such words for 64-bit executables, though not for 32-bit executables, so in principle we could avoid calling relocate() entirely when we're running a 64-bit kernel at the linked address.) The dynamic relocations are processed by a new function relocate(addr), where the addr parameter is the virtual address where the image will be run. In fact we call it twice: once before calling prom_init, and again when starting the main kernel. This means that reloc_offset() returns 0 in prom_init (since it has been relocated to the address it is running at), which necessitated a few adjustments. This also changes __va and __pa to use an equivalent definition that is simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are constants (for 64-bit) whereas PHYSICAL_START is a variable (and KERNELBASE ideally should be too, but isn't yet). With this, relocatable kernels still copy themselves down to physical address 0 and run there. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:08:38 -07:00
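Processing a `RELATIVE` dynamic relocation means writing the run address plus the relocation's addend into the word at `r_offset`. A userspace C sketch of that loop, assuming the image was linked at address 0 so that `r_offset` is an offset into the image buffer; `apply_relative_relocs()` is hypothetical, handles only `R_PPC64_RELATIVE`, and is not the kernel's `relocate()`:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifndef R_PPC64_RELATIVE
#define R_PPC64_RELATIVE 22	/* value from the PowerPC64 ELF ABI */
#endif

/*
 * Apply RELATIVE relocations to an image that will run at run_addr:
 * each relocated doubleword becomes run_addr + r_addend. Other
 * relocation types are ignored in this sketch.
 */
static void apply_relative_relocs(uint8_t *image, uint64_t run_addr,
				  const Elf64_Rela *rela, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (ELF64_R_TYPE(rela[i].r_info) != R_PPC64_RELATIVE)
			continue;
		uint64_t value = run_addr + (uint64_t)rela[i].r_addend;
		/* memcpy avoids alignment assumptions about the target word */
		memcpy(image + rela[i].r_offset, &value, sizeof(value));
	}
}
```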
Chandru	cf00085d80	powerpc: Add support for dynamic reconfiguration memory in kexec/kdump kernels The kdump kernel needs to use only those memory regions that it is allowed to use (crashkernel, rtas, tce, etc.). Each of these regions has its own size, and they are currently added under the 'linux,usable-memory' property of each memory@xxx node of the device tree. The ibm,dynamic-memory property of the ibm,dynamic-reconfiguration-memory node (on POWER6) now stores the representation of most of the logical memory blocks (LMBs), with the size of each memory block being a constant (lmb_size). If one or more of the above-mentioned regions, or part of one, lies within one of the LMBs from the ibm,dynamic-memory property, those regions need to be identified within the given LMB. This makes the kernel recognize a new 'linux,drconf-usable-memory' property added by kexec-tools. Each entry in this property is a count followed by that many (base, size) pairs for the above-mentioned regions. The number of cells in the count value is given by the #size-cells property of the root node. Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-15 11:07:58 -07:00
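The property layout described above can be walked with a small cursor-based parser. A sketch under stated assumptions: the cells have already been converted to host byte order, `base` uses the address-cell width and both the count and `size` use the size-cell width; `read_cells()` and `usable_bytes_for_lmb()` are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine n 32-bit cells into one value, most significant cell first. */
static uint64_t read_cells(const uint32_t **cellp, int n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 32) | *(*cellp)++;
	return v;
}

/*
 * Sum the usable bytes in one LMB's entry: a count followed by that many
 * (base, size) pairs. The cursor is advanced past the entry so the next
 * LMB's entry can be read.
 */
static uint64_t usable_bytes_for_lmb(const uint32_t **cellp,
				     int addr_cells, int size_cells)
{
	uint64_t count = read_cells(cellp, size_cells);
	uint64_t total = 0;

	for (uint64_t i = 0; i < count; i++) {
		(void)read_cells(cellp, addr_cells);	/* base: unused here */
		total += read_cells(cellp, size_cells);
	}
	return total;
}
```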
Paul Mackerras	7e392f8c29	Merge branch 'linux-2.6'	2008-09-10 11:36:13 +10:00
Paul Mackerras	9e88ba4e45	powerpc: Only make kernel text pages of linear mapping executable Commit `bc033b63bb` ("powerpc/mm: Fix attribute confusion with htab_bolt_mapping()") moved the check for whether we should make pages of the linear mapping executable from htab_bolt_mapping into its callers, including htab_initialize. A side-effect of this is that the decision is now made once for each contiguous section in the LMB array rather than for each page individually. This can often mean that the whole of the linear mapping ends up being executable. This reverts to the previous behaviour, where individual pages are checked for being part of the kernel text or not, by moving the check back down into htab_bolt_mapping. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-09-03 20:53:22 +10:00
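The per-page decision can be sketched as a loop that checks each page against the kernel text bounds. Everything here is hypothetical for illustration (`bolt_range()`, the protection bits, the text bounds and the 4 KiB step); the point is that the executable bit is chosen page by page rather than once per LMB region:

```c
#include <stdbool.h>

#define STEP_SIZE     0x1000UL	/* assumed 4 KiB mapping step */
#define PROT_BIT_RW   0x1UL	/* hypothetical protection bits */
#define PROT_BIT_EXEC 0x2UL

/* Hypothetical kernel text bounds: [text_start, text_end). */
static unsigned long text_start, text_end;

static bool overlaps_text(unsigned long start, unsigned long end)
{
	return start < text_end && end > text_start;
}

/*
 * Choose protection for each page of [vstart, vend) individually, so
 * only pages overlapping kernel text become executable. Writes one
 * value per page into prots and returns the number of pages.
 */
static unsigned long bolt_range(unsigned long vstart, unsigned long vend,
				unsigned long *prots)
{
	unsigned long n = 0;

	for (unsigned long v = vstart; v < vend; v += STEP_SIZE) {
		unsigned long prot = PROT_BIT_RW;

		if (overlaps_text(v, v + STEP_SIZE))
			prot |= PROT_BIT_EXEC;
		prots[n++] = prot;
	}
	return n;
}
```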
Tony Breeds	e16a9c0990	powerpc: Guard htab_dt_scan_hugepage_blocks appropriately htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is defined, so guard the declaration likewise. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-20 16:34:57 +10:00
Benjamin Herrenschmidt	bc033b63bb	powerpc/mm: Fix attribute confusion with htab_bolt_mapping() The function htab_bolt_mapping() is used to create permanent mappings in the MMU hash table, for example, in order to create the linear mapping of vmemmap. It's also used by early boot ioremap (before mem_init_done). However, the way ioremap uses it is incorrect as it passes it the protection flags in the "linux PTE" form while htab_bolt_mapping() expects them in the hash table format. This is made more confusing by the fact that some of those flags are actually in the same position in both cases. This fixes it all by making htab_bolt_mapping() take normal linux protection flags instead, and use a little helper to convert them to htab flags. Callers can now use the usual PAGE_* definitions safely. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/include/asm/mmu-hash64.h | 2 - arch/powerpc/mm/hash_utils_64.c | 65 ++++++++++++++++++++-------------- arch/powerpc/mm/init_64.c | 9 +--- 3 files changed, 44 insertions(+), 32 deletions(-) Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-11 10:09:56 +10:00
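A conversion helper of the kind described above might look like the following. The bit values are invented for this sketch and do not match the real Linux PTE or hash-PTE encodings, and `linux_to_htab_flags()` is a hypothetical name; the sketch only shows the design of translating once, in a single place:

```c
/* Hypothetical Linux-PTE-style flags (illustrative values only). */
#define LPTE_RW       0x0002UL
#define LPTE_EXEC     0x0008UL
#define LPTE_NO_CACHE 0x0010UL

/* Hypothetical hash-table flags (illustrative values only). */
#define HPTE_PP_RO    0x0001UL	/* read-only page protection */
#define HPTE_NOEXEC   0x0004UL	/* no-execute */
#define HPTE_INHIBIT  0x0020UL	/* cache-inhibited */

/*
 * Translate Linux protection flags into hash-table flags so callers
 * can pass ordinary PAGE_*-style values.
 */
static unsigned long linux_to_htab_flags(unsigned long lflags)
{
	unsigned long h = 0;

	if (!(lflags & LPTE_RW))
		h |= HPTE_PP_RO;
	if (!(lflags & LPTE_EXEC))
		h |= HPTE_NOEXEC;
	if (lflags & LPTE_NO_CACHE)
		h |= HPTE_INHIBIT;
	return h;
}
```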
Tony Breeds	c7c8eede27	powerpc: Force printing of 'total_memory' to unsigned long long total_memory is a 'phys_addr_t', which can be either 64 or 32 bits. Force printing as unsigned long long to silence the warning. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 13:18:17 +10:00
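Casting to `unsigned long long` and printing with `%llu` is correct whatever the width of `phys_addr_t`. A userspace sketch; the `typedef`, `format_total_memory()` and the message text are assumptions, not the kernel's:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed here to be 64 bits; on other configs it could be 32. */
typedef uint64_t phys_addr_t;

/* The explicit cast makes %llu match the argument on every config. */
static int format_total_memory(char *buf, size_t len, phys_addr_t total)
{
	return snprintf(buf, len, "Total memory = %lluMB",
			(unsigned long long)total >> 20);
}
```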
Tony Breeds	fb61063587	powerpc: Fix compiler warning in arch/powerpc/mm/mem.c Explicitly cast to unsigned long long, rather than u64. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 13:18:17 +10:00
Stephen Rothwell	b8b572e101	powerpc: Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of `mkdir arch/powerpc/include/asm` and `git mv include/asm-powerpc/* arch/powerpc/include/asm`, followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-08-04 12:02:00 +10:00
Nick Piggin	ce0ad7f095	powerpc/mm: Lockless get_user_pages_fast() for 64-bit (v3) Implement lockless get_user_pages_fast for 64-bit powerpc. Page table existence is guaranteed with RCU, and speculative page references are used to take a reference to the pages without having a prior existence guarantee on them. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-30 15:26:54 +10:00
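The speculative reference rests on an increment-unless-zero primitive, similar in spirit to the kernel's `atomic_inc_not_zero()`. A C11 sketch with a hypothetical function name:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero, so a page that
 * is concurrently being freed (count == 0) is never resurrected.
 */
static bool get_ref_unless_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}
```

In a lockless page table walker the caller would typically then re-read the PTE and drop the reference if it changed, since the page may have been freed and reused in between.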
Benjamin Herrenschmidt	00df438e89	powerpc: Disable 64K hugetlb support when doing 64K SPU mappings The 64K SPU local store mapping feature is incompatible with the 64K huge pages support because some parts of the memory management cannot differentiate between them, even though they use different page table formats. For now, disable 64K huge pages when CONFIG_SPU_FS_64K_LS is enabled; in the long run, this can be fixed by making this feature use the hugetlb page table format. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-28 16:30:53 +10:00
Johannes Weiner	bda2fa5355	powerpc: use generic show_mem() Remove arch-specific show_mem() in favor of the generic version. This also removes the following redundant information display: - pages in swapcache, printed by show_swap_cache_info() where show_mem() calls show_free_areas(), which calls show_swap_cache_info(). Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:11 -07:00
Alexey Dobriyan	51cc50685a	SL*B: drop kmem cache argument from constructor The kmem cache passed to a constructor is only needed for constructors that are themselves multiplexers. Nobody uses this "feature", nor does anybody use the passed kmem cache in a non-trivial way, so pass only a pointer to the object. Non-trivial places are: arch/powerpc/mm/init_64.c, arch/powerpc/mm/hugetlbpage.c. This is a flag day, yes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-26 12:00:07 -07:00
Luis Machado	d6a61bfc06	powerpc: BookE hardware watchpoint support This patch implements support for HW based watchpoint via the DBSR_DAC (Data Address Compare) facility of the BookE processors. It does so by interfacing with the existing DABR breakpoint code and adding the necessary bits and pieces for the new bits to be properly set or cleared Signed-off-by: Luis Machado <luisgpm@br.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-25 15:44:39 +10:00
Jon Tollefson	0d9ea75443	powerpc: support multiple hugepage sizes Instead of using the variable mmu_huge_psize to keep track of the huge page size we use an array of MMU_PAGE_* values. For each supported huge page size we need to know the hugepte_shift value and have a pgtable_cache. The hstate or an mmu_huge_psizes index is passed to functions so that they know which huge page size they should use. The hugepage sizes 16M and 64K are set up (if available on the hardware) so that they don't have to be set on the boot cmd line in order to use them. The number of 16G pages has to be specified at boot-time though (e.g. hugepagesz=16G hugepages=5). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	91224346aa	powerpc: define support for 16G hugepages The huge page size is defined for 16G pages. If a hugepagesz of 16G is specified at boot-time then it becomes the huge page size instead of the default 16M. The change in pgtable-64K.h is to the macro pte_iterate_hashed_subpages to make the increment to va (the 1 being shifted) be a long so that it is not shifted to 0. Otherwise it would create an infinite loop when the shift value is for a 16G page (when base page size is 64K). Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
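The overflow is easy to see in isolation: a 16G page is 2^34 bytes, and shifting a 32-bit `int` by 34 is undefined behaviour in C (on PowerPC the result can come out as 0, as the commit describes), so `va += increment` never advances. A sketch assuming an LP64 target where `unsigned long` is 64 bits; `subpage_increment()` and `SHIFT_16G` are hypothetical:

```c
#define SHIFT_16G 34	/* 16G = 2^34 bytes */

/*
 * Increment used when stepping va across a huge page. Using an
 * unsigned long constant keeps the full 64-bit value; the old code
 * effectively computed (1 << shift) in int arithmetic.
 */
static unsigned long subpage_increment(unsigned int shift)
{
	return 1UL << shift;
}
```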
Jon Tollefson	658013e93e	powerpc: scan device tree for gigantic pages The 16G huge pages have to be reserved in the HMC prior to boot. The locations of the pages are placed in the device tree. This patch adds code to scan the device tree during very early boot and save these page locations until hugetlbfs is ready for them. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Jon Tollefson	ec4b2c0c83	powerpc: function to allocate gigantic hugepages The 16G page locations have been saved during early boot in an array. The alloc_bootmem_huge_page() function adds a page from here to the huge_boot_pages list. Acked-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:19 -07:00
Andi Kleen	ceb8687961	hugetlb: introduce pud_huge Straightforward extensions for huge pages located in the PUD instead of the PMD. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:18 -07:00
Andi Kleen	a551643895	hugetlb: modular state for hugetlb page size The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code that requires these fields, so that code does the right thing regardless of the exact hstate it is operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: Adam Litke <agl@us.ibm.com> Acked-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:17 -07:00
Jan Beulich	42b7772812	mm: remove double indirection on tlb parameter to free_pgd_range() & Co The double indirection here is not needed anywhere and hence (at least) confusing. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Benjamin Herrenschmidt	a1f242ff46	powerpc ioremap_prot This adds ioremap_prot and pte_pgprot() so that one can extract protection bits from a PTE and use them to ioremap_prot() (in order to support ptrace of VM_IO | VM_PFNMAP as per Rik's patch). This moves a couple of flag checks around in the ioremap implementations of arch/powerpc. There's a side effect of allowing non-cacheable and non-guarded mappings on ppc32 which before would always have _PAGE_GUARDED set whenever _PAGE_NO_CACHE is. (Standard ioremap will still set _PAGE_GUARDED, but ioremap_prot will be capable of setting up such a non-guarded mapping.) Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Dave Airlie <airlied@linux.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:15 -07:00
Johannes Weiner	b61bfa3c46	mm: move bootmem descriptors definition to a single place There are a lot of places that define either a single bootmem descriptor or an array of them. Use only one central array with MAX_NUMNODES items instead. Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-24 10:47:14 -07:00
Benjamin Herrenschmidt	84c3d4aaec	Merge commit 'origin/master' Manual merge of: arch/powerpc/Kconfig arch/powerpc/kernel/stacktrace.c arch/powerpc/mm/slice.c arch/ppc/kernel/smp.c	2008-07-16 11:07:59 +10:00
Stefan Roese	2bf3016f89	powerpc: Fix problems with 32bit PPC's running with >= 4GB of RAM This patch enables 32-bit PPCs (with 36-bit physical address space, e.g. IBM/AMCC PPC44x) to run with >= 4GB of RAM. Mostly it's just replacing types (unsigned long -> phys_addr_t). Tested on an AMCC Katmai with 4GB of DDR2. Signed-off-by: Stefan Roese <sr@denx.de> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-07-09 14:13:01 -04:00
Benjamin Herrenschmidt	1bc54c0311	powerpc: rework 4xx PTE access and TLB miss This is some preliminary work to improve TLB management on SW-loaded TLB powerpc platforms. This introduces support for non-atomic PTE operations in pgtable-ppc32.h and removes write back to the PTE from the TLB miss handlers. In addition, the DSI interrupt code no longer tries to fix up write permission; this is left to generic code, and _PAGE_HWWRITE is gone. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>	2008-07-09 13:36:17 -04:00
Stephen Rothwell	392096e98f	generic-ipi: fix linux-next tree build failure Today's linux-next build (powerpc ppc64_defconfig) failed like this: arch/powerpc/mm/tlb_64.c: In function 'pgtable_free_now': arch/powerpc/mm/tlb_64.c:66: error: too many arguments to function 'smp_call_function' arch/powerpc/kernel/machine_kexec_64.c: In function 'kexec_prepare_cpus': arch/powerpc/kernel/machine_kexec_64.c:175: error: too many arguments to function 'smp_call_function' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Paul Mackerras <paulus@samba.org> Cc: <linuxppc-dev@ozlabs.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-03 09:25:42 +02:00
Nathan Fontenot	0db9360aaa	powerpc/pseries: Update numa association of hotplug memory add for drconf memory Update the association of a memory section with a numa node that occurs during hotplug add of a memory section. This adds a check in the hot_add_scn_to_nid() routine for the ibm,dynamic-reconfiguration-memory node in the device tree. If present the new hot_add_drconf_scn_to_nid() routine is invoked, which can properly parse the ibm,dynamic-reconfiguration-memory node of the device tree and make the proper numa node associations. This also introduces the valid_hot_add_scn() routine as a helper function for code that is common to the hot_add_scn_to_nid() and hot_add_drconf_scn_to_nid() routines. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:18 +10:00
Nathan Fontenot	8342681d3e	powerpc/pseries: Split code into helper routines for drconf memory This splits off several pieces of code that parse the ibm,dynamic-reconfiguration-memory node of the device tree into separate helper routines. This is in preparation for the next commit that will use these helper routines. There are no functional changes in this patch. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:17 +10:00
Tony Breeds	db7f37de2c	powerpc: Fix building of arch/powerpc/mm/mem.o when MEMORY_HOTPLUG=y and SPARSEMEM=n Currently the kernel fails to build with the above config options with: CC arch/powerpc/mm/mem.o arch/powerpc/mm/mem.c: In function 'arch_add_memory': arch/powerpc/mm/mem.c:130: error: implicit declaration of function 'create_section_mapping' This explicitly includes asm/sparsemem.h in arch/powerpc/mm/mem.c and moves the guards in include/asm-powerpc/sparsemem.h to protect the SPARSEMEM specific portions only. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-03 16:58:07 +10:00
Dave Kleikamp	87e9ab13c3	powerpc: hash_huge_page() should get the WIMG bits from the lpte Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-07-01 11:28:02 +10:00
Paul Mackerras	3a8247cc2c	powerpc: Only demote individual slices rather than whole process At present, if we have a kernel with a 64kB page size, and some process maps something that has to be mapped with 4kB pages (such as a cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair pages), we change the process to use 4kB pages everywhere. This hurts the performance of HPC programs that access eHCA from userspace. With this patch, the kernel will only demote the slice(s) containing the eHCA or cache-inhibited mappings, leaving the remaining slices able to use 64kB hardware pages. This also changes the slice_get_unmapped_area code so that it is willing to place a 64k-page mapping into (or across) a 4k-page slice if there is no better alternative, i.e. if the program specified MAP_FIXED or if there is not sufficient space available in slices that are either empty or already have 64k-page mappings in them. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-01 11:27:57 +10:00
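A minimal C sketch of the per-slice idea in this commit. It turns a mapping's address range into a bitmask of the 256MB low slices the range touches, and only those slices would be demoted to 4kB pages. The constants SLICE_LOW_SHIFT and SLICE_LOW_TOP and the helper low_slice_mask() are assumptions for illustration, not the kernel's code. High (1TB) slices are ignored here.

```c
#include <stdint.h>

#define SLICE_LOW_SHIFT 28              /* assumed: 256MB low slices */
#define SLICE_LOW_TOP   (1ULL << 32)    /* assumed: low slices cover the first 4GB */

/*
 * Hypothetical helper: bitmask of the low slices touched by
 * [start, start + len). Assumes len > 0. Only slices whose bit is set
 * would need 4kB pages; every other slice keeps 64kB pages.
 */
static uint16_t low_slice_mask(uint64_t start, uint64_t len)
{
    uint64_t end = start + len - 1;
    uint16_t mask = 0;

    if (start >= SLICE_LOW_TOP)
        return 0;                       /* range lies entirely in high slices */
    if (end >= SLICE_LOW_TOP)
        end = SLICE_LOW_TOP - 1;        /* clip to the low-slice area */

    for (uint64_t s = start >> SLICE_LOW_SHIFT; s <= end >> SLICE_LOW_SHIFT; s++)
        mask |= (uint16_t)(1U << s);
    return mask;
}
```

A 4kB mapping that straddles the 512MB boundary sets two bits, so two slices are demoted and the remaining fourteen are untouched.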
Becky Bruce	316a405841	powerpc: Get rid of bitfields in ppc_bat struct While working on the 36-bit physical support, I noticed that there was exactly one line of code that actually referenced the bitfields. So I got rid of them and redefined ppc_bat as a struct of 2 u32's: batu and batl. I also got rid of the previous union that held the bitfield structs and a word representation of the batu/l values. This seems like a nicer solution than adding in a bunch of new bitfields to support extended bat addressing that would never get used, and just leaving the struct as-is would have been incomplete in the face of large physical addressing. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:31:05 +10:00
Becky Bruce	7c5c4325d2	powerpc: Change BAT code to use phys_addr_t Currently, the physical address is an unsigned long, but it should be phys_addr_t in set_bat, [v/p]_mapped_by_bat. Also, create a macro that can convert a large physical address into the correct format for programming the BAT registers. Signed-off-by: Becky Bruce <becky.bruce@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:31:03 +10:00
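A rough C illustration of a "macro that can convert a large physical address into the correct format for programming the BAT registers". The sketch folds a physical address of up to 36 bits into a 32-bit BATL-style word. It assumes an extended-BAT layout in which physical bits 32 to 35 are moved into otherwise-spare low bits of the word. The name bat_phys_addr() and the exact bit placement are assumptions, not a quote of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a physical address of up to 36 bits into the
 * 32-bit word written to a BATL register. Assumed layout:
 *   - address bits 17..31 (mask 0xfffe0000) stay in place,
 *   - address bits 33..35 move down to word bits 9..11,
 *   - address bit 32 moves down to word bit 2.
 * Address bits below 17 are dropped (BAT blocks are at least 128K).
 */
static uint32_t bat_phys_addr(uint64_t pa)
{
    return (uint32_t)((pa & 0x00000000fffe0000ULL) |
                      ((pa & 0x0000000e00000000ULL) >> 24) |
                      ((pa & 0x0000000100000000ULL) >> 30));
}
```

For example, 0xf00020000 packs to 0x00020e04 under this assumed layout.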
Benjamin Herrenschmidt	41743a4e34	powerpc: Free a PTE bit on ppc64 with 64K pages This frees a PTE bit when using 64K pages on ppc64. This is done by getting rid of the separate _PAGE_HASHPTE bit. Instead, we just test if any of the 16 sub-page bits is set. For non-combo pages (ie. real 64K pages), we set SUB0 and the location encoding in that field. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-30 22:30:53 +10:00
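A minimal C sketch of the encoding idea in this commit. It assumes a hypothetical layout in which the 16 sub-page bits occupy bits 12..27 of the PTE. "Has an HPTE" becomes "any sub-page bit is set". A real 64K page sets SUB0 plus a slot encoding in the same field. The bit positions, SUBPG_MASK, SUB0 and the helper names are illustrative assumptions, not the actual ppc64 PTE definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: 16 sub-page bits at bits 12..27 of the PTE. */
#define SUBPG_SHIFT 12
#define SUBPG_MASK  (0xffffULL << SUBPG_SHIFT)
#define SUB0        (0x8000ULL << SUBPG_SHIFT)  /* sub-page 0 is the top bit */

/* Replaces a dedicated _PAGE_HASHPTE test: "hashed" means any bit is set. */
static bool pte_is_hashed(uint64_t pte)
{
    return (pte & SUBPG_MASK) != 0;
}

/* 4K-combo page: mark sub-page n (0..15) as having an HPTE. */
static uint64_t pte_set_subpage(uint64_t pte, unsigned n)
{
    return pte | (SUB0 >> n);
}

/*
 * Real 64K page: there are no sub-pages, so the field is reused.
 * SUB0 marks it hashed and a hypothetical 4-bit slot (location)
 * encoding goes in the low bits of the same field.
 */
static uint64_t pte_set_64k_hashed(uint64_t pte, unsigned slot)
{
    pte &= ~SUBPG_MASK;
    return pte | SUB0 | ((uint64_t)(slot & 0xf) << SUBPG_SHIFT);
}
```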
Paul Mackerras	e9a4b6a3f6	Merge branch 'linux-2.6'	2008-06-30 10:16:50 +10:00
Jens Axboe	15c8b6c1aa	on_each_cpu(): kill unused 'retry' parameter It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2008-06-26 11:24:38 +02:00
Paul Mackerras	65ba6cdc83	[POWERPC] Clear sub-page HPTE present bits when demoting page size When we demote a slice from 64k to 4k, and we are about to insert an HPTE for a 4k subpage and we notice that there is an existing 64k HPTE, we first invalidate that HPTE before inserting the new 4k subpage HPTE. Since the bits that encode which hash bucket the old HPTE was in overlap with the bits that encode which of the 16 subpages have HPTEs, we need to clear out the subpage HPTE-present bits before starting to insert HPTEs for the 4k subpages. If we don't do that, we can erroneously think that a subpage already has an HPTE when it doesn't. That in itself wouldn't be such a problem except that when we go to update the HPTE that we think is present on machines with a hypervisor, the hypervisor can tell us that the HPTE we think is there is actually there even though it isn't, which can lead to a process getting stuck in a loop, continually faulting. The reason for the confusion is that the AVPN (abbreviated virtual page number) we are looking for in the HPTE for a 4k subpage can actually match the AVPN in a stale HPTE for another 64k page. For example, the HPTE for the 4k subpage at 0x84000f000 will be in the same hash bucket and have the same AVPN as the HPTE for the 64k page at 0x8400f0000. This fixes the code to clear out the subpage HPTE-present bits. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-18 21:40:43 +10:00
Paul Mackerras	8a3e1c670e	Merge branch 'merge' Conflicts: arch/powerpc/sysdev/fsl_soc.c	2008-06-09 12:19:41 +10:00
Nathan Lynch	0d5799449f	[POWERPC] Make walk_memory_resource available with MEMORY_HOTPLUG=n The ehea driver was recently changed[1] to use walk_memory_resource() to detect the system's memory layout. However, walk_memory_resource() is available only when memory hotplug is enabled. So CONFIG_EHEA was made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a network driver to have such a dependency. Make the declaration of walk_memory_resource() and its powerpc implementation (ehea is powerpc-specific) unconditionally available. [1] `48cfb14f8b` "ehea: Add DLPAR memory remove support" [2] `fb7b6ca2b6` "ehea: Add dependency to Kconfig" Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-06-09 11:32:41 +10:00
Paul Mackerras	acf464817d	Merge branch 'merge' into powerpc-next	2008-05-23 16:53:23 +10:00
David Gibson	46a7417963	[POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS __set_fixmap() in pgtable_32.c currently fails to compile if STRICT_MM_TYPECHECKS is defined. This fixes it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-23 16:15:32 +10:00
Adrian Bunk	d3d3d3cdb1	[POWERPC] powerpc/mm/hash_low_32.S: Remove CVS keyword This removes a CVS keyword that wasn't updated for a long time from a comment. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-20 09:34:18 +10:00
Paul Mackerras	fcff474ea5	Merge branch 'linux-2.6' into powerpc-next	2008-05-16 23:13:42 +10:00
Benjamin Herrenschmidt	cec08e7a94	[POWERPC] vmemmap fixes to use smaller pages This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very usable. The logic used by my patch to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that. - Else use 4K pages. I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-15 20:49:25 +10:00
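The three-step page-size rule listed in this commit can be restated as a small C function. This is a sketch only. The SZ_* constants and choose_vmemmap_page_size() are hypothetical names, and the "available" flags stand in for whatever MMU feature detection the kernel actually performs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page sizes in bytes; names are illustrative, not kernel identifiers. */
#define SZ_4K   (4ULL << 10)
#define SZ_64K  (64ULL << 10)
#define SZ_16M  (16ULL << 20)
#define SZ_1G   (1ULL << 30)

/*
 * Hypothetical restatement of the rule: prefer 16M pages when the MMU
 * supports them and boot-time RAM is at least 1G, otherwise 64K pages
 * if supported, otherwise 4K.
 */
static uint64_t choose_vmemmap_page_size(bool have_16m, bool have_64k,
                                         uint64_t boot_ram_bytes)
{
    if (have_16m && boot_ram_bytes >= SZ_1G)
        return SZ_16M;
    if (have_64k)
        return SZ_64K;
    return SZ_4K;
}
```

The RAM threshold keeps small partitions such as a PS3 from committing a whole 16M page to vmemmap.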
Michael Ellerman	c884116ac3	[POWERPC] Remove duplicate variable definitions in mm/tlb_64.c Somewhere along the way (`e28f7faf05`, "Four level pagetables for ppc64") we ended up with duplicate definitions for pte_freelist_cur and pte_freelist_force_free. Somehow this compiles, but it would be better to just have one definition for each. The two definitions we end up with can be static too! Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:49 +10:00
Michael Ellerman	572fb578de	[POWERPC] Move declaration of tce variables into mmu-hash64.h ... instead of having extern declarations in a .c file. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:47 +10:00
Michael Ellerman	09de9ff872	[POWERPC] Fix sparse warnings in arch/powerpc/mm Make two vmemmap helpers static in init_64.c Make stab variables static in stab.c Make psize defs static in hash_utils_64.c Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:46 +10:00
Michael Ellerman	5f25f06529	[POWERPC] Move declaration of init_bootmem_done into system.h ... instead of having an extern declaration in a .c file. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-14 22:31:44 +10:00
Paul Mackerras	3b5750644b	[POWERPC] Bolt in SLB entry for kernel stack on secondary cpus This fixes a regression reported by Kamalesh Bulabel where a POWER4 machine would crash because of an SLB miss at a point where the SLB miss exception was unrecoverable. This regression is tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=10082 SLB misses at such points shouldn't happen because the kernel stack is the only memory accessed other than things in the first segment of the linear mapping (which is mapped at all times by entry 0 of the SLB). The context switch code ensures that SLB entry 2 covers the kernel stack, if it is not already covered by entry 0. None of entries 0 to 2 are ever replaced by the SLB miss handler. Where this went wrong is that the context switch code assumes it doesn't have to write to SLB entry 2 if the new kernel stack is in the same segment as the old kernel stack, since entry 2 should already be correct. However, when we start up a secondary cpu, it calls slb_initialize, which doesn't set up entry 2. This is correct for the boot cpu, where we will be using a stack in the kernel BSS at this point (i.e. init_thread_union), but not necessarily for secondary cpus, whose initial stack can be allocated anywhere. This doesn't cause any immediate problem since the SLB miss handler will just create an SLB entry somewhere else to cover the initial stack. In fact it's possible for the cpu to go quite a long time without SLB entry 2 being valid. Eventually, though, the entry created by the SLB miss handler will get overwritten by some other entry, and if the next access to the stack is at an unrecoverable point, we get the crash. This fixes the problem by making slb_initialize create a suitable entry for the kernel stack, if we are on a secondary cpu and the stack isn't covered by SLB entry 0. This requires initializing the get_paca()->kstack field earlier, so I do that in smp_create_idle where the current field is initialized. 
This also abstracts a bit of the computation that mk_esid_data in slb.c does so that it can be used in slb_initialize. Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:45 +10:00
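A minimal C sketch of the decision the fix adds to slb_initialize. A bolted kernel-stack entry is needed only on a secondary cpu whose stack is outside the segment already covered by SLB entry 0, which is the first 256MB segment of the linear mapping. SID_SHIFT = 28 and the ppc64 PAGE_OFFSET of 0xc000000000000000 are assumed. esid_of() and needs_stack_slb_entry() are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SID_SHIFT   28                      /* assumed: 256MB segments */
#define PAGE_OFFSET 0xc000000000000000ULL   /* assumed: ppc64 linear-map base */

/* Effective segment ID of an address (256MB granularity). */
static uint64_t esid_of(uint64_t ea)
{
    return ea >> SID_SHIFT;
}

/*
 * Hypothetical helper: true if slb_initialize would have to create a
 * bolted entry (entry 2) for this cpu's kernel stack.
 */
static bool needs_stack_slb_entry(bool is_boot_cpu, uint64_t kstack)
{
    if (is_boot_cpu)
        return false;  /* boot cpu runs on init_thread_union in the kernel BSS */
    return esid_of(kstack) != esid_of(PAGE_OFFSET);
}
```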
Geoff Levand	bbea346062	[POWERPC] Fix slb.c compile warnings Arrange for a syntax check to always be done on the powerpc/mm/slb.c DBG() macro by defining it to pr_debug() for non-debug builds. Also, fix these related compile warnings: slb.c:273: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int' slb.c:274: warning: format '%04x' expects type 'unsigned int', but argument 2 has type 'long unsigned int' Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-05-02 15:00:44 +10:00
Badari Pulavarty	9d88a2eb6e	[POWERPC] Provide walk_memory_resource() for powerpc Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains logical memory region mapping in the lmb.memory structure. Walk through these structures and do the callbacks for the contiguous chunks. Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-29 15:57:53 +10:00
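A minimal C sketch of the callback walk described in this commit. It clips each memory region to the requested pfn window and calls the callback once per contiguous chunk. struct mem_region, walk_regions() and sum_pages() are hypothetical stand-ins for lmb.memory and its users. Two parts of the contract are also assumptions: the sketch returns -1 when no memory in the window is found, and it stops at the first non-zero callback result.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumed 4K pages for the example */

/* Hypothetical stand-in for one lmb.memory region: [base, base + size). */
struct mem_region {
    uint64_t base;
    uint64_t size;
};

typedef int (*range_cb)(unsigned long start_pfn, unsigned long nr_pages,
                        void *arg);

/* Regions are assumed sorted and non-overlapping. */
static int walk_regions(const struct mem_region *r, size_t n,
                        unsigned long start_pfn, unsigned long nr_pages,
                        void *arg, range_cb func)
{
    unsigned long end_pfn = start_pfn + nr_pages;
    int ret = -1;  /* stays -1 if no region overlaps the window */

    for (size_t i = 0; i < n; i++) {
        unsigned long rs = (unsigned long)(r[i].base >> PAGE_SHIFT);
        unsigned long re = (unsigned long)((r[i].base + r[i].size) >> PAGE_SHIFT);

        if (re <= start_pfn || rs >= end_pfn)
            continue;                   /* no overlap with the window */
        if (rs < start_pfn)
            rs = start_pfn;             /* clip to the window */
        if (re > end_pfn)
            re = end_pfn;
        ret = func(rs, re - rs, arg);   /* one call per contiguous chunk */
        if (ret)
            break;
    }
    return ret;
}

/* Example callback: accumulate the number of pages seen. */
static int sum_pages(unsigned long start_pfn, unsigned long nr_pages, void *arg)
{
    (void)start_pfn;
    *(unsigned long *)arg += nr_pages;
    return 0;
}
```

Two 256MB regions separated by a hole produce two callbacks, and a window that falls entirely in the hole produces none.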
Jeremy Fitzhardinge	180c06efce	hotplug-memory: make online_page() common All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if it's in highmem, and update the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called. [akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Tested-by: KAMEZAWA Hiroyuki <kamez.hiroyu@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-28 08:58:17 -07:00
Kumar Gala	f608600e74	[POWERPC] Clean up access to thread_info in assembly Use (31-THREAD_SHIFT) to get to thread_info from stack pointer. This makes the code a bit easier to read and more robust if we ever change THREAD_SHIFT. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:02 +10:00
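The "(31-THREAD_SHIFT)" trick is easiest to see as a mask computation. The assembly form assumed here is "rlwinm rX,r1,0,0,(31-THREAD_SHIFT)". That instruction keeps the IBM-numbered bits 0 through 31-THREAD_SHIFT of the stack pointer, which rounds it down to the thread_info at the base of the stack. The C sketch below assumes 8K kernel stacks (THREAD_SHIFT = 13). rlwinm_mask() and thread_info_from_sp() are hypothetical helpers.

```c
#include <stdint.h>

#define THREAD_SHIFT 13                     /* assumed: 8K kernel stacks */
#define THREAD_SIZE  (1U << THREAD_SHIFT)

/*
 * Mask produced by "rlwinm rD,rS,0,MB,ME" for MB <= ME, using IBM bit
 * numbering (bit 0 is the most significant bit of the 32-bit word).
 */
static uint32_t rlwinm_mask(unsigned mb, unsigned me)
{
    return (0xffffffffU >> mb) & (0xffffffffU << (31 - me));
}

/* C equivalent of "rlwinm rX,r1,0,0,(31-THREAD_SHIFT)": round sp down. */
static uint32_t thread_info_from_sp(uint32_t sp)
{
    return sp & rlwinm_mask(0, 31 - THREAD_SHIFT);
}
```

With THREAD_SHIFT = 13 the mask is 0xffffe000, which equals ~(THREAD_SIZE - 1). Because the mask is derived from THREAD_SHIFT, changing the stack size only requires changing THREAD_SHIFT.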
Kumar Gala	2c419bdeca	[POWERPC] Port fixmap from x86 and use for kmap_atomic The fixmap code from x86 allows us to have compile time virtual addresses that we change the physical addresses of at run time. This is useful for applications like kmap_atomic, PCI config that is done via direct memory map, kexec/kdump. We got rid of CONFIG_HIGHMEM_START as we can now determine a more optimal location for PKMAP_BASE based on where the fixmap addresses start and working back from there. Additionally, the kmap code in asm-powerpc/highmem.h always had debug enabled. Moved to using CONFIG_DEBUG_HIGHMEM to determine if we should have the extra debug checking. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:02 +10:00
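A minimal C sketch of the fixmap arithmetic described in this commit. It is modeled on the x86 scheme, in which each fixmap slot is a compile-time constant page counting down from FIXADDR_TOP. It also shows how PKMAP_BASE can be derived by working back from FIXADDR_START. FIXADDR_TOP, the slot list, LAST_PKMAP and the 4MB PMD span are all assumed values for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define FIXADDR_TOP  0xfffff000UL             /* assumed top of the fixmap */
#define LAST_PKMAP   1024                     /* assumed pkmap slot count */
#define PMD_MASK     (~((1UL << 22) - 1))     /* assumed 4MB PMD span */

/* Hypothetical fixmap slots; the real list is configuration dependent. */
enum fixed_addresses {
    FIX_HOLE,
    FIX_KMAP_BEGIN,
    FIX_KMAP_END = FIX_KMAP_BEGIN + 15,       /* e.g. 16 kmap_atomic slots */
    __end_of_fixed_addresses
};

#define FIXADDR_SIZE  ((unsigned long)__end_of_fixed_addresses << PAGE_SHIFT)
#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)

/* Place the persistent-kmap window below the fixmap, PMD aligned. */
#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE * (LAST_PKMAP + 1)) & PMD_MASK)

/* Each slot's virtual address is fixed at compile time, counting down. */
static unsigned long fix_to_virt(unsigned idx)
{
    return FIXADDR_TOP - ((unsigned long)idx << PAGE_SHIFT);
}
```

Only the physical page behind each slot changes at run time. The virtual address that code like kmap_atomic uses is a constant.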
Kumar Gala	37dd2badcf	[POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero) Added support to allow an 85xx kernel to be run from a non-zero physical address (useful for cooperative asymmetric multiprocessing situations and kdump). The support can be configured at compile time by setting CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as desired. Alternatively, the kernel build can set CONFIG_RELOCATABLE. Setting this config option causes the kernel to determine at runtime the physical addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START. If CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning. However, CONFIG_PHYSICAL_START will always be used to set the LOAD program header physical address field in the resulting ELF image. Currently we are limited to running at a physical address that is a multiple of 256M. This is due to how we map TLBs to cover lowmem. This should be fixed to allow 64M or maybe even 16M alignment in the future. It is considered an error to try and run a kernel at a non-aligned physical address. All the magic for this support is accomplished by proper initialization of the kernel memory subsystem and use of ARCH_PFN_OFFSET. The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings. ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET. /dev/mem continues to allow access to any physical address in the system regardless of how CONFIG_PHYSICAL_START is set. Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:58:01 +10:00
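A small C illustration of two points in this commit. The first is the 256M alignment rule for the physical start address. The second is how a pfn offset (the role the commit gives ARCH_PFN_OFFSET) turns a physical frame number into a zero-based index when RAM does not start at 0. PAGE_SHIFT = 12, SZ_256M and both helper names are assumptions for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                /* assumed 4K pages */
#define SZ_256M    (256ULL << 20)

/* The kernel may only run at a physical address that is 256M aligned. */
static bool phys_start_ok(uint64_t phys_start)
{
    return (phys_start & (SZ_256M - 1)) == 0;
}

/*
 * Hypothetical: with RAM starting at phys_start, the first page frame is
 * (phys_start >> PAGE_SHIFT), so a pfn-to-index lookup subtracts it.
 */
static uint64_t mem_map_index(uint64_t pfn, uint64_t phys_start)
{
    return pfn - (phys_start >> PAGE_SHIFT);
}
```

A kernel loaded at 256M therefore sees pfn 0x10000 as index 0, while a 64M load address fails the alignment check.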
Michael Ellerman	6df1646e31	[POWERPC] Add include of linux/of.h to numa.c numa.c requires routines declared in linux/of.h, so should include it. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>	2008-04-24 20:57:32 +10:00


1076 Commits