linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-08 13:11:45 +00:00

Author	SHA1	Message	Date
H. Peter Anvin	de65d816aa	Merge remote-tracking branch 'origin/x86/boot' into x86/mm2 Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2013-01-29 15:10:15 -08:00
Linus Torvalds	3d59eebc5e	Automatic NUMA Balancing V11 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb 72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj 3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi Ka0JKgnWvsa6ez6FSzKI =ivQa -----END PGP SIGNATURE----- Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...	2012-12-16 15:18:08 -08:00
Linus Torvalds	11520e5e7c	Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)" This reverts commit `bd52276fa1` ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: `da5a108d05`: "x86/kernel: remove tboot 1:1 page table creation code" `185034e72d`: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit `53b87cf088` ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-15 15:20:41 -08:00
Linus Torvalds	be354f4081	Revert "x86, mm: Include the entire kernel memory map in trampoline_pgd" This reverts commit `53b87cf088`. It causes odd bootup problems on x86-64. Markus Trippelsdorf gets a repeatable oops, and I see a non-repeatable oops (or constant stream of messages that scroll off too quickly to read) that seems to go away with this commit reverted. So we don't know exactly what is wrong with the commit, but it's definitely problematic, and worth reverting sooner rather than later. Bisected-by: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: H Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-15 12:29:54 -08:00
Linus Torvalds	d42b3a2906	Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI update from Peter Anvin: "EFI tree, from Matt Fleming. Most of the patches are the new efivarfs filesystem by Matt Garrett & co. The balance are support for EFI wallclock in the absence of a hardware-specific driver, and various fixes and cleanups." * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) efivarfs: Make efivarfs_fill_super() static x86, efi: Check table header length in efi_bgrt_init() efivarfs: Use query_variable_info() to limit kmalloc() efivarfs: Fix return value of efivarfs_file_write() efivarfs: Return a consistent error when efivarfs_get_inode() fails efivarfs: Make 'datasize' unsigned long efivarfs: Add unique magic number efivarfs: Replace magic number with sizeof(attributes) efivarfs: Return an error if we fail to read a variable efi: Clarify GUID length calculations efivarfs: Implement exclusive access for {get,set}_variable efivarfs: efivarfs_fill_super() ensure we clean up correctly on error efivarfs: efivarfs_fill_super() ensure we free our temporary name efivarfs: efivarfs_fill_super() fix inode reference counts efivarfs: efivarfs_create() ensure we drop our reference on inode on error efivarfs: efivarfs_file_read ensure we free data in error paths x86-64/efi: Use EFI to deal with platform wall clock (again) x86/kernel: remove tboot 1:1 page table creation code x86, efi: 1:1 pagetable mapping for virtual EFI calls x86, mm: Include the entire kernel memory map in trampoline_pgd ...	2012-12-14 10:08:40 -08:00
Linus Torvalds	f6e858a00a	Merge branch 'akpm' (Andrew's patch-bomb) Merge misc VM changes from Andrew Morton: "The rest of most-of-MM. The other MM bits await a slab merge. This patch includes the addition of a huge zero_page. Not a performance boost but it an save large amounts of physical memory in some situations. Also a bunch of Fujitsu engineers are working on memory hotplug. Which, as it turns out, was badly broken. About half of their patches are included here; the remainder are 3.8 material." However, this merge disables CONFIG_MOVABLE_NODE, which was totally broken. We don't add new features with "default y", nor do we add Kconfig questions that are incomprehensible to most people without any help text. Does the feature even make sense without compaction or memory hotplug? * akpm: (54 commits) mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic() mm/memory.c: remove unused code from do_wp_page() asm-generic, mm: pgtable: consolidate zero page helpers mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage hwpoison, hugetlbfs: fix RSS-counter warning hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage mm: protect against concurrent vma expansion memcg: do not check for mm in __mem_cgroup_count_vm_event tmpfs: support SEEK_DATA and SEEK_HOLE (reprise) mm: provide more accurate estimation of pages occupied by memmap fs/buffer.c: remove redundant initialization in alloc_page_buffers() fs/buffer.c: do not inline exported function writeback: fix a typo in comment mm: introduce new field "managed_pages" to struct zone mm, oom: remove statically defined arch functions of same name mm, oom: remove redundant sleep in pagefault oom handler mm, oom: cleanup pagefault oom handler memory_hotplug: allow online/offline memory to result movable node numa: add CONFIG_MOVABLE_NODE for movable-dedicated node mm, memcg: avoid unnecessary function call when memcg is disabled ...	2012-12-13 13:11:15 -08:00
Linus Torvalds	a2013a13e6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial branch from Jiri Kosina: "Usual stuff -- comment/printk typo fixes, documentation updates, dead code elimination." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) HOWTO: fix double words typo x86 mtrr: fix comment typo in mtrr_bp_init propagate name change to comments in kernel source doc: Update the name of profiling based on sysfs treewide: Fix typos in various drivers treewide: Fix typos in various Kconfig wireless: mwifiex: Fix typo in wireless/mwifiex driver messages: i2o: Fix typo in messages/i2o scripts/kernel-doc: check that non-void fcts describe their return value Kernel-doc: Convention: Use a "Return" section to describe return values radeon: Fix typo and copy/paste error in comments doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c various: Fix spelling of "asynchronous" in comments. Fix misspellings of "whether" in comments. eisa: Fix spelling of "asynchronous". various: Fix spelling of "registered" in comments. doc: fix quite a few typos within Documentation target: iscsi: fix comment typos in target/iscsi drivers treewide: fix typo of "suport" in various comments and Kconfig treewide: fix typo of "suppport" in various comments ...	2012-12-13 12:00:02 -08:00
David Rientjes	c2d23f919b	mm, oom: remove statically defined arch functions of same name out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:34 -08:00
Lai Jiangshan	4b0ef1fe8a	page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Since we introduced N_MEMORY, we update the initialization of node_states. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-12 17:38:33 -08:00
Linus Torvalds	743aa456c1	Merge branch 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull "Nuke 386-DX/SX support" from Ingo Molnar: "This tree removes ancient-386-CPUs support and thus zaps quite a bit of complexity: 24 files changed, 56 insertions(+), 425 deletions(-) ... which complexity has plagued us with extra work whenever we wanted to change SMP primitives, for years. Unfortunately there's a nostalgic cost: your old original 386 DX33 system from early 1991 won't be able to boot modern Linux kernels anymore. Sniff." I'm not sentimental. Good riddance. * 'x86-nuke386-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, 386 removal: Document Nx586 as a 386 and thus unsupported x86, cleanups: Simplify sync_core() in the case of no CPUID x86, 386 removal: Remove CONFIG_X86_POPAD_OK x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK x86, 386 removal: Remove CONFIG_INVLPG x86, 386 removal: Remove CONFIG_BSWAP x86, 386 removal: Remove CONFIG_XADD x86, 386 removal: Remove CONFIG_CMPXCHG x86, 386 removal: Remove CONFIG_M386 from Kconfig	2012-12-11 19:59:32 -08:00
Linus Torvalds	37ea95a959	Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU update from Ingo Molnar: "The major features of this tree are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486." * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) context_tracking: New context tracking susbsystem sched: Mark RCU reader in sched_show_task() rcu: Separate accounting of callbacks from callback-free CPUs rcu: Add callback-free CPUs rcu: Add documentation for the new rcuexp debugfs trace file rcu: Update documentation for TREE_RCU debugfs tracing rcu: Reduce default RCU CPU stall warning timeout rcu: Fix TINY_RCU rcu_is_cpu_rrupt_from_idle check rcu: Clarify memory-ordering properties of grace-period primitives rcu: Add new rcutorture module parameters to start/end test messages rcu: Remove list_for_each_continue_rcu() rcu: Fix batch-limit size problem rcu: Add tracing for synchronize_sched_expedited() rcu: Remove old debugfs interfaces and also RCU flavor name rcu: split 'rcuhier' to each flavor rcu: split 'rcugp' to each flavor rcu: split 'rcuboost' to each flavor rcu: split 'rcubarrier' to each flavor rcu: Fix tracing formatting rcu: Remove the interface "rcudata.csv" ...	2012-12-11 18:10:49 -08:00
Michel Lespinasse	cdc1734495	mm: use vm_unmapped_area() in hugetlbfs on i386 architecture Update the i386 hugetlb_get_unmapped_area function to make use of vm_unmapped_area() instead of implementing a brute force search. [akpm@linux-foundation.org: fix build] Signed-off-by: Michel Lespinasse <walken@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-11 17:22:25 -08:00
Rik van Riel	e4a1cc56e4	x86: mm: drop TLB flush from ptep_set_access_flags Intel has an architectural guarantee that the TLB entry causing a page fault gets invalidated automatically. This means we should be able to drop the local TLB invalidation. Because of the way other areas of the page fault code work, chances are good that all x86 CPUs do this. However, if someone somewhere has an x86 CPU that does not invalidate the TLB entry causing a page fault, this one-liner should be easy to revert. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>	2012-12-11 14:28:33 +00:00
Rik van Riel	0f9a921cf9	x86: mm: only do a local tlb flush in ptep_set_access_flags() The function ptep_set_access_flags() is only ever invoked to set access flags or add write permission on a PTE. The write bit is only ever set together with the dirty bit. Because we only ever upgrade a PTE, it is safe to skip flushing entries on remote TLBs. The worst that can happen is a spurious page fault on other CPUs, which would flush that TLB entry. Lazily letting another CPU incur a spurious page fault occasionally is (much!) cheaper than aggressively flushing everybody else's TLB. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michel Lespinasse <walken@google.com> Cc: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:33 +00:00
Nadia Yvette Chambers	6d49e352ae	propagate name change to comments in kernel source I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-12-06 10:39:54 +01:00
Ingo Molnar	630e1e0bcd	Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Conflicts: arch/x86/kernel/ptrace.c Pull the latest RCU tree from Paul E. McKenney: " The major features of this series are: 1. A first version of no-callbacks CPUs. This version prohibits offlining CPU 0, but only when enabled via CONFIG_RCU_NOCB_CPU=y. Relaxing this constraint is in progress, but not yet ready for prime time. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/724, and are at branch rcu/nocb. 2. Changes to SRCU that allows statically initialized srcu_struct structures. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/296, and are at branch rcu/srcu. 3. Restructuring of RCU's debugfs output. These commits were posted to LKML at https://lkml.org/lkml/2012/10/30/341, and are at branch rcu/tracing. 4. Additional CPU-hotplug/RCU improvements, posted to LKML at https://lkml.org/lkml/2012/10/30/327, and are at branch rcu/hotplug. Note that the commit eliminating __stop_machine() was judged to be too-high of risk, so is deferred to 3.9. 5. Changes to RCU's idle interface, most notably a new module parameter that redirects normal grace-period operations to their expedited equivalents. These were posted to LKML at https://lkml.org/lkml/2012/10/30/739, and are at branch rcu/idle. 6. Additional diagnostics for RCU's CPU stall warning facility, posted to LKML at https://lkml.org/lkml/2012/10/30/315, and are at branch rcu/stall. The most notable change reduces the default RCU CPU stall-warning time from 60 seconds to 21 seconds, so that it once again happens sooner than the softlockup timeout. 7. Documentation updates, which were posted to LKML at https://lkml.org/lkml/2012/10/30/280, and are at branch rcu/doc. A couple of late-breaking changes were posted at https://lkml.org/lkml/2012/11/16/634 and https://lkml.org/lkml/2012/11/16/547. 8. Miscellaneous fixes, which were posted to LKML at https://lkml.org/lkml/2012/10/30/309, along with a late-breaking change posted at Fri, 16 Nov 2012 11:26:25 -0800 with message-ID <20121116192625.GA447@linux.vnet.ibm.com>, but which lkml.org seems to have missed. These are at branch rcu/fixes. 9. Finally, a fix for an lockdep-RCU splat was posted to LKML at https://lkml.org/lkml/2012/11/7/486. This is at rcu/next. " Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-03 06:27:05 +01:00
Frederic Weisbecker	91d1aa43d3	context_tracking: New context tracking susbsystem Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>	2012-11-30 11:40:07 -08:00
H. Peter Anvin	a5c2a893db	x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK All 486+ CPUs support WP in supervisor mode, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-7-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:03 -08:00
H. Peter Anvin	094ab1db7c	x86, 386 removal: Remove CONFIG_INVLPG All 486+ CPUs support INVLPG, so remove the fallback 386 support code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1354132230-21854-6-git-send-email-hpa@linux.intel.com	2012-11-29 13:23:02 -08:00
Yinghai Lu	94b43c3d86	x86, mm: kill numa_free_all_bootmem() Now NO_BOOTMEM version free_all_bootmem_node() does not really do free_bootmem at all, and it only call register_page_bootmem_info_node instead. That is confusing, try to kill that free_all_bootmem_node(). Before that, this patch will remove numa_free_all_bootmem(). That function could be replaced with register_page_bootmem_info() and free_all_bootmem(); Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-43-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:47 -08:00
Yinghai Lu	b8fd39c036	x86, mm: Use clamp_t() in init_range_memory_mapping save some lines, and make code more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-42-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:46 -08:00
Yinghai Lu	60a8f42832	x86, mm: Move after_bootmem to mm_internel.h it is only used in arch/x86/mm/init*.c Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-41-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:45 -08:00
Yinghai Lu	4e37a89047	x86, mm: Unifying after_bootmem for 32bit and 64bit after_bootmem has different meaning in 32bit and 64bit. 32bit: after bootmem is ready 64bit: after bootmem is distroyed Let's merget them make 32bit the same as 64bit. for 32bit, it is mixing alloc_bootmem_pages, and alloc_low_page under after_bootmem is set or not set. alloc_bootmem is just wrapper for memblock for x86. Now we have alloc_low_page() with memblock too. We can drop bootmem path now, and only alloc_low_page only. At the same time, we make alloc_low_page could handle real after_bootmem for 32bit, because alloc_bootmem_pages could fallback to use slab too. At last move after_bootmem set position for 32bit the same as 64bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-40-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:44 -08:00
Yinghai Lu	2e8059edb6	x86, mm: use limit_pfn for end pfn instead of shifting end to get that. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-39-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:43 -08:00
Yinghai Lu	1829ae9ad7	x86, mm: use pfn instead of pos in split_mem_range could save some bit shifting operations. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-38-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:41 -08:00
Yinghai Lu	84d770019b	x86, mm: use PFN_DOWN in split_mem_range() to replace own inline version for shifting. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-37-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:40 -08:00
Yinghai Lu	5a0d3aeeef	x86, mm: use round_up/down in split_mem_range() to replace own inline version for those roundup and rounddown. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-36-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:40 -08:00
Yinghai Lu	11ed9e927d	x86, mm: Add check before clear pte above max_low_pfn on 32bit During test patch that adjust page_size_mask to map small range ram with big page size, found page table is setup wrongly for 32bit. And native_pagetable_init wrong clear pte for pmd with large page support. 1. add more comments about why we are expecting pte. 2. add BUG checking, so next time we could find problem earlier when we mess up page table setup again. 3. max_low_pfn is not included boundary for low memory mapping. We should check from max_low_pfn instead of +1. 4. add print out when some pte really get cleared, or we should use WARN() to find out why above max_low_pfn get mapped? so we could fix it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-35-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:39 -08:00
Yinghai Lu	c8dcdb9ce4	x86, mm: Move function declaration into mm_internal.h They are only for mm/init*.c. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-34-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:38 -08:00
Yinghai Lu	f836e35a98	x86, mm: change low/hignmem_pfn_init to static on 32bit Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-33-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:37 -08:00
Yinghai Lu	148b20989e	x86, mm: Move init_gbpages() out of setup.c Put it in mm/init.c, and call it from probe_page_mask(). init_mem_mapping is calling probe_page_mask at first. So calling sequence is not changed. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-32-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:37 -08:00
Yinghai Lu	cf47065961	x86, mm: Move back pgt_buf_* to mm/init.c Also change them to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-31-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:36 -08:00
Yinghai Lu	719272c45b	x86, mm: only call early_ioremap_page_table_range_init() once On 32bit, before patcheset that only set page table for ram, we only call that one time. Now, we are calling that during every init_memory_mapping if we have holes under max_low_pfn. We should only call it one time after all ranges under max_low_page get mapped just like we did before. Also that could avoid the risk to run out of pgt_buf in BRK. Need to update page_table_range_init() to count the pages for kmap page table at first, and use new added alloc_low_pages() to get pages in sequence. That will conform to the requirement that pages need to be in low to high order. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-30-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:29 -08:00
Stefano Stabellini	ddd3509df8	x86, mm: Add pointer about Xen mmu requirement for alloc_low_pages Add link for more information `279b706` x86,xen: introduce x86_init.mapping.pagetable_reserve -v2: updated to commets from hpa to include commit name. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-29-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:28 -08:00
Yinghai Lu	22c8ca2ac2	x86, mm: Add alloc_low_pages(num) 32bit kmap mapping needs pages to be used for low to high. At this point those pages are still from pgt_buf_* from BRK, so it is ok now. But we want to move early_ioremap_page_table_range_init() out of init_memory_mapping() and only call it one time later, that will make page_table_range_init/page_table_kmap_check/alloc_low_page to use memblock to get page. memblock allocation for pages are from high to low. So will get panic from page_table_kmap_check() that has BUG_ON to do ordering checking. This patch add alloc_low_pages to make it possible to allocate serveral pages at first, and hand out pages one by one from low to high. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-28-git-send-email-yinghai@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:27 -08:00
Yinghai Lu	6f80b68e9e	x86, mm, Xen: Remove mapping_pagetable_reserve() Page table area are pre-mapped now after x86, mm: setup page table in top-down x86, mm: Remove early_memremap workaround for page table accessing on 64bit mapping_pagetable_reserve is not used anymore, so remove it. Also remove operation in mask_rw_pte(), as modified allow_low_page always return pages that are already mapped, moreover xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO before hooking it into the pagetable automatically. -v2: add changelog about mask_rw_pte() from Stefano. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:26 -08:00
Yinghai Lu	9985b4c6fa	x86, mm: Move min_pfn_mapped back to mm/init.c Also change it to static. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:24 -08:00
Yinghai Lu	5c51bdbe4c	x86, mm: Merge alloc_low_page between 64bit and 32bit They are almost same except 64 bit need to handle after_bootmem case. Add mm_internal.h to make that alloc_low_page() only to be accessible from arch/x86/mm/init*.c Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-25-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:23 -08:00
Yinghai Lu	868bf4d6b9	x86, mm: Remove parameter in alloc_low_page for 64bit Now all page table buf are pre-mapped, and could use virtual address directly. So don't need to remember physical address anymore. Remove that phys pointer in alloc_low_page(), and that will allow us to merge alloc_low_page between 64bit and 32bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-24-git-send-email-yinghai@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:22 -08:00
Yinghai Lu	973dc4f3fa	x86, mm: Remove early_memremap workaround for page table accessing on 64bit We try to put page table high to make room for kdump, and at that time those ranges are not mapped yet, and have to use ioremap to access it. Now after patch that pre-map page table top down. x86, mm: setup page table in top-down We do not need that workaround anymore. Just use __va to return directly mapping address. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:20 -08:00
Yinghai Lu	8d57470d8f	x86, mm: setup page table in top-down Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first. Then use mapped pages to map more ranges below, and keep looping until all pages get mapped. alloc_low_page will use page from BRK at first, after that buffer is used up, will use memblock to find and reserve pages for page table usage. Introduce min_pfn_mapped to make sure find new pages from mapped ranges, that will be updated when lower pages get mapped. Also add step_size to make sure that don't try to map too big range with limited mapped pages initially, and increase the step_size when we have more mapped pages on hand. We don't need to call pagetable_reserve anymore, reserve work is done in alloc_low_page() directly. At last we can get rid of calculation and find early pgt related code. -v2: update to after fix_xen change, also use MACRO for initial pgt_buf size and add comments with it. -v3: skip big reserved range in memblock.reserved near end. -v4: don't need fix_xen change now. -v5: add changelog about moving about reserving pagetable to alloc_low_page. Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:19 -08:00
Yinghai Lu	f763ad1d38	x86, mm: Break down init_all_memory_mapping Will replace that with top-down page table initialization. New API need to take range: init_range_memory_mapping() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-21-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:17 -08:00
Yinghai Lu	eceb3632ac	x86, mm: Don't clear page table if range is ram After we add code use buffer in BRK to pre-map buf for page table in following patch: x86, mm: setup page table in top-down it should be safe to remove early_memmap for page table accessing. Instead we get panic with that. It turns out that we clear the initial page table wrongly for next range that is separated by holes. And it only happens when we are trying to map ram range one by one. We need to check if the range is ram before clearing page table. We change the loop structure to remove the extra little loop and use one loop only, and in that loop will caculate next at first, and check if [addr,next) is covered by E820_RAM. -v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM to that, so next kernel by kexec will know that range is used already. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:17 -08:00
Yinghai Lu	aeebe84cc9	x86, mm: Use big page size for small memory range We could map small range in the middle of big range at first, so should use big page size at first to avoid using small page size to break down page table. Only can set big page bit when that range has ram area around it. -v2: fix 32bit boundary checking. We can not count ram above max_low_pfn for 32 bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-19-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:16 -08:00
Yinghai Lu	960ddb4fe7	x86, mm: Align start address to correct big page size We are going to use buffer in BRK to map small range just under memory top, and use those new mapped ram to map ram range under it. The ram range that will be mapped at first could be only page aligned, but ranges around it are ram too, we could use bigger page to map it to avoid small page size. We will adjust page_size_mask in following patch: x86, mm: Use big page size for small memory range to use big page size for small ram range. Before that patch, this patch will make sure start address to be aligned down according to bigger page size, otherwise entry in page page will not have correct value. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:15 -08:00
Jacob Shin	66520ebc2d	x86, mm: Only direct map addresses that are marked as E820_RAM Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT ) and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not backed by actual DRAM. This is fine for holes under 4GB which are covered by fixed and variable range MTRRs to be UC. However, we run into trouble on higher memory addresses which cannot be covered by MTRRs. Our system with 1TB of RAM has an e820 that looks like this: BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable and so direct mappings are created for huge memory hole between 0x000000e038000000 to 0x0000010000000000. Even though the kernel never generates memory accesses in that region, since the page tables mark them incorrectly as being WB, our (AMD) processor ends up causing a MCE while doing some memory bookkeeping/optimizations around that area. This patch iterates through e820 and only direct maps ranges that are marked as E820_RAM, and keeps track of those pfn ranges. Depending on the alignment of E820 ranges, this may possibly result in using smaller size (i.e. 4K instead of 2M or 1G) page tables. -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range instead. - Yinghai Lu -v3: add calculate_all_table_space_size() to get correct needed page table size. - Yinghai Lu -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when mem map does have hole under 4g that is found by Konard on xen domU with 8g ram. - Yinghai Signed-off-by: Jacob Shin <jacob.shin@amd.com> Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:14 -08:00
Yinghai Lu	8eb5779f6b	x86, mm: use pfn_range_is_mapped() with CPA We are going to map ram only, so under max_low_pfn_mapped, between 4g and max_pfn_mapped does not mean mapped at all. Use pfn_range_is_mapped() directly. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-13-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:09 -08:00
Yinghai Lu	ab9519376e	x86, mm: Separate out calculate_table_space_size() It should take physical address range that will need to be mapped. find_early_table_space should take range that pgt buff should be in. Separating page table size calculating and finding early page table to reduce confusing. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-9-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:07 -08:00
Yinghai Lu	c14fa0b63b	x86, mm: Find early page table buffer together We should not do that in every calling of init_memory_mapping. At the same time need to move down early_memtest, and could remove after_bootmem checking. -v2: fix one early_memtest with 32bit by passing max_pfn_mapped instead. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-8-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:06 -08:00
Yinghai Lu	84f1ae30bb	x86, mm: Change find_early_table_space() paramters call split_mem_range inside the function. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1353123563-3103-7-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-11-17 11:59:05 -08:00

1 2 3 4 5 ...

1714 Commits