linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-07 04:32:03 +00:00

Author	SHA1	Message	Date
Heiko Carstens	8fe234d3c8	s390/mm: fix mapping of read-only kernel text section Within the identity mapping the kernel text section is mapped read-only. However when mapping the first and last page of the text section we must round upwards and downwards respectively, if only parts of a page belong to the section. Otherwise potential rw data can be mapped read-only. So the rounding must be done just the other way we have it right now. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-10-09 14:16:59 +02:00
Heiko Carstens	378b1e7a80	s390/mm: fix pmd_huge() usage for kernel mapping pmd_huge() will always return 0 on !HUGETLBFS, however we use that helper function when walking the kernel page tables to decide if we have a 1MB page frame or not. Since we create 1MB frames for the kernel 1:1 mapping independently of HUGETLBFS this can lead to incorrect storage accesses since the code can assume that we have a pointer to a page table instead of a pointer to a 1MB frame. Fix this by adding a pmd_large() primitive like other architectures have it already and remove all references to HUGETLBFS/HUGETLBPAGE from the code that walks kernel page tables. Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-10-09 14:16:56 +02:00
Gerald Schaefer	648609e3f2	s390: enable large page support with CONFIG_DEBUG_PAGEALLOC So far, large page support was completely disabled with CONFIG_DEBUG_PAGEALLOC, although it would be sufficient if only the large page kernel mapping was disabled. This patch enables large page support with CONFIG_DEBUG_PAGEALLOC, while it prevents the large kernel mapping in that case. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-09-26 15:44:50 +02:00
Heiko Carstens	a53c8fab3f	s390/comments: unify copyright messages and remove file names Remove the file name from the comment at top of many files. In most cases the file name was wrong anyway, so it's rather pointless. Also unify the IBM copyright statement. We did have a lot of sightly different statements and wanted to change them one after another whenever a file gets touched. However that never happened. Instead people start to take the old/"wrong" statements to use as a template for new files. So unify all of them in one go. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2012-07-20 11:15:04 +02:00
Heiko Carstens	f4815ac6c9	s390/headers: replace __s390x__ with CONFIG_64BIT where possible Replace __s390x__ with CONFIG_64BIT in all places that are not exported to userspace or guarded with #ifdef __KERNEL__. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2012-05-24 10:10:10 +02:00
Michael Holzheu	60a0c68df2	[S390] kdump backend code This patch provides the architecture specific part of the s390 kdump support. Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-10-30 15:16:42 +01:00
Martin Schwidefsky	e5992f2e6c	[S390] kvm guest address space mapping Add code that allows KVM to control the virtual memory layout that is seen by a guest. The guest address space uses a second page table that shares the last level pte-tables with the process page table. If a page is unmapped from the process page table it is automatically unmapped from the guest page table as well. The guest address space mapping starts out empty, KVM can map any individual 1MB segments from the process virtual memory to any 1MB aligned location in the guest virtual memory. If a target segment in the process virtual memory does not exist or is unmapped while a guest mapping exists the desired target address is stored as an invalid segment table entry in the guest page table. The population of the guest page table is fault driven. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-07-24 10:48:21 +02:00
Martin Schwidefsky	b2fa47e6bf	[S390] refactor page table functions for better pgste support Rework the architecture page table functions to access the bits in the page table extension array (pgste). There are a number of changes: 1) Fix missing pgste update if the attach_count for the mm is <= 1. 2) For every operation that affects the invalid bit in the pte or the rcp byte in the pgste the pcl lock needs to be acquired. The function pgste_get_lock gets the pcl lock and returns the current pgste value for a pte pointer. The function pgste_set_unlock stores the pgste and releases the lock. Between these two calls the bits in the pgste can be shuffled. 3) Define two software bits in the pte _PAGE_SWR and _PAGE_SWC to avoid calling SetPageDirty and SetPageReferenced from pgtable.h. If the host reference backup bit or the host change backup bit has been set the dirty/referenced state is transfered to the pte. The common code will pick up the state from the pte. 4) Add ptep_modify_prot_start and ptep_modify_prot_commit for mprotect. 5) Remove pgd_populate_kernel, pud_populate_kernel, pmd_populate_kernel pgd_clear_kernel, pud_clear_kernel, pmd_clear_kernel and ptep_invalidate. 6) Rename kvm_s390_test_and_clear_page_dirty to ptep_test_and_clear_user_dirty and add ptep_test_and_clear_user_young. 7) Define mm_exclusive() and mm_has_pgste() helper to improve readability. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2011-05-23 10:24:31 +02:00
Heiko Carstens	a1b200e27c	mm: provide init_mm mm_context initializer Provide an INIT_MM_CONTEXT intializer macro which can be used to statically initialize mm_struct:mm_context of init_mm. This way we can get rid of code which will do the initialization at run time (on s390). In addition the current code can be found at a place where it is not expected. So let's have a common initializer which architectures can use if needed. This is based on a patch from Suzuki Poulose. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Suzuki Poulose <suzuki@in.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:44:54 -07:00
Christian Borntraeger	6af7eea2ae	[S390] s390: disable change bit override commit `6a985c6194` ([S390] s390: use change recording override for kernel mapping) deactivated the change bit recording for the kernel mapping to improve the performance. This works most of the time, but there are cases (e.g. kernel runs in home space, futex atomic compare xcmg) where we modify user memory with the kernel mapping instead of the user mapping. Instead of fixing these cases, this patch just deactivates change bit override to avoid future problems with other kernel code that might use the kernel mapping for user memory. CC: stable@kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2010-04-09 13:43:02 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Christian Borntraeger	6a985c6194	[S390] s390: use change recording override for kernel mapping We dont need the dirty bit if a write access is done via the kernel mapping. In that case SetPageDirty and friends are used anyway, no need to do that a second time. We can use the change-recording overide function for the kernel mapping, if available. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-12-07 12:51:37 +01:00
Martin Schwidefsky	50aa98bad0	[S390] fix recursive locking on page_table_lock Suzuki Poulose reported the following recursive locking bug on s390: Here is the stack trace : (see Appendix I for more info) [<0000000000406ed6>] _spin_lock+0x52/0x94 [<0000000000103bde>] crst_table_free+0x14e/0x1a4 [<00000000001ba684>] __pmd_alloc+0x114/0x1ec [<00000000001be8d0>] handle_mm_fault+0x2cc/0xb80 [<0000000000407d62>] do_dat_exception+0x2b6/0x3a0 [<0000000000114f8c>] sysc_return+0x0/0x8 [<00000200001642b2>] 0x200001642b2 The page_table_lock is already acquired in __pmd_alloc (mm/memory.c) and it tries to populate the pud/pgd with a new pmd allocated. If another thread populates it before we get a chance, we free the pmd using pmd_free(). On s390x, pmd_free(even pud_free ) is #defined to crst_table_free(), which acquires the page_table_lock to protect the crst_table index updates. Hence this ends up in a recursive locking of the page_table_lock. The solution suggested by Dave Hansen is to use a new spin lock in the mmu context to protect the access to the crst_list and the pgtable_list. Reported-by: Suzuki Poulose <suzuki@in.ibm.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2009-09-11 10:29:53 +02:00
Heiko Carstens	ee0ddadd08	[S390] vmemmap: fix off-by-one bug. If a memory range is supposed to be added to the 1:1 mapping and it ends just below the maximum supported physical address it won't succeed. This is because a test doesn't consider that the end address is 1 smaller than start + size. Fix the comparison. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-06-10 10:03:27 +02:00
Heiko Carstens	67060d9c1f	[S390] Fix section mismatch warnings. This fixes the last remaining section mismatch warnings in s390 architecture code. It reveals also a real bug introduced by... me with git commit `2069e978d5` ("[S390] sparsemem vmemmap: initialize memmap.") Calling the generic vmemmap_alloc_block() function to get initialized memory is a nice idea, however that function is __meminit annotated and therefore the function might be gone if we try to call it later. This can happen if a DCSS segment gets added. So basically revert the patch and clear the memmap explicitly to fix the original bug. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-05-30 10:03:34 +02:00
Heiko Carstens	2069e978d5	[S390] sparsemem vmemmap: initialize memmap. Let's just use the generic vmmemmap_alloc_block() function which always returns initialized memory. Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-05-15 16:52:38 +02:00
Heiko Carstens	17f3458085	[S390] Convert to SPARSEMEM & SPARSEMEM_VMEMMAP Convert s390 to SPARSEMEM and SPARSEMEM_VMEMMAP. We do a select of SPARSEMEM_VMEMMAP since it is configurable. This is because SPARSEMEM without SPARSEMEM_VMEMMAP gives us a hell of broken include dependencies that I don't want to fix. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-04-30 13:38:48 +02:00
Gerald Schaefer	53492b1de4	[S390] System z large page support. This adds hugetlbfs support on System z, using both hardware large page support if available and software large page emulation on older hardware. Shared (large) page tables are implemented in software emulation mode, by using page->index of the first tail page from a compound large page to store page table information. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-04-30 13:38:47 +02:00
Heiko Carstens	8fc6365868	[S390] vmemmap: use clear_table to initialise page tables. Always use clear_table to initialise page tables. The overlapping memcpy is just a leftover of a previous version that wasn't fully converted to clear_table. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-04-30 13:38:46 +02:00
Martin Schwidefsky	5a216a2083	[S390] Add four level page tables for CONFIG_64BIT=y. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-02-09 18:24:40 +01:00
Martin Schwidefsky	146e4b3c8b	[S390] 1K/2K page table pages. This patch implements 1K/2K page table pages for s390. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-02-09 18:24:40 +01:00
Heiko Carstens	0189103c69	[S390] Remove BUILD_BUG_ON() in vmem code. Remove BUILD_BUG_ON() in vmem code since it causes build failures if the size of struct page increases. Instead calculate at compile time the address of the highest physical address that can be added to the 1:1 mapping. This supposed to fix a build failure with the page owner tracking leak detector patches as reported by akpm. page-owner-tracking-leak-detector-broken-on-s390.patch can be removed from -mm again when this is merged. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-02-05 16:51:01 +01:00
Heiko Carstens	2bc89b5ece	[S390] Fix couple of section mismatches. Fix couple of section mismatches. And since we touch the code anyway change the IPL code to use C99 initializers. Cc: Michael Holzheu <holzheu@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-02-05 16:50:56 +01:00
Christian Borntraeger	a2fd64d6aa	[S390] vmemmap: allocate struct pages before 1:1 mapping We have seen an oops in an OOM situation, where show_mem tried to access the struct page of a dcss segment. The vmemmap code has already created the 1:1 mapping but failed allocating the struct pages. In the OOM case, show_mem now walks the memory. It uses pfn_valid to detect if it may access the struct page. In the case described above, the mapping was established and pfn_valid returned true. As the struct pages were not allocated, the kernel oopsed. We have to ensure that we have created the struct pages, before we add a mapping pointing to the pages. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-01-26 14:11:23 +01:00
Heiko Carstens	9f4b0ba81f	[S390] Get rid of HOLES_IN_ZONE requirement. Align everything to MAX_ORDER so we can get rid of the extra checks. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-01-26 14:11:13 +01:00
Christian Borntraeger	5fd9c6e214	[S390] Change vmalloc defintions Currently the vmalloc area starts at a dynamic address depending on the memory size. There was also an 8MB security hole after the physical memory to catch out-of-bounds accesses. We can simplify the code by putting the vmalloc area explicitely at the top of the kernel mapping and setting the vmalloc size to a fixed value of 128MB/128GB for 31bit/64bit systems. Part of the vmalloc area will be used for the vmem_map. This leaves an area of 96MB/1GB for normal vmalloc allocations. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2008-01-26 14:11:12 +01:00
Martin Schwidefsky	190a1d722a	[S390] 4level-fixup cleanup Get independent from asm-generic/4level-fixup.h Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-10-22 12:52:49 +02:00
Martin Schwidefsky	3610cce87a	[S390] Cleanup page table definitions. - De-confuse the defines for the address-space-control-elements and the segment/region table entries. - Create out of line functions for page table allocation / freeing. - Simplify get_shadow_xxx functions. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-10-22 12:52:49 +02:00
Heiko Carstens	e62133b4ea	[S390] Get rid of new section mismatch warnings. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-07-27 12:29:18 +02:00
Gerald Schaefer	c1821c2e97	[S390] noexec protection This provides a noexec protection on s390 hardware. Our hardware does not have any bits left in the pte for a hw noexec bit, so this is a different approach using shadow page tables and a special addressing mode that allows separate address spaces for code and data. As a special feature of our "secondary-space" addressing mode, separate page tables can be specified for the translation of data addresses (storage operands) and instruction addresses. The shadow page table is used for the instruction addresses and the standard page table for the data addresses. The shadow page table is linked to the standard page table by a pointer in page->lru.next of the struct page corresponding to the page that contains the standard page table (since page->private is not really private with the pte_lock and the page table pages are not in the LRU list). Depending on the software bits of a pte, it is either inserted into both page tables or just into the standard (data) page table. Pages of a vma that does not have the VM_EXEC bit set get mapped only in the data address space. Any try to execute code on such a page will cause a page translation exception. The standard reaction to this is a SIGSEGV with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the kernel to the signal stack frame. Unfortunately, the signal return mechanism cannot be modified to use an SA_RESTORER because the exception unwinding code depends on the system call opcode stored behind the signal stack frame. This feature requires that user space is executed in secondary-space mode and the kernel in home-space mode, which means that the addressing modes need to be switched and that the noexec protection only works for user space. After switching the addressing modes, we cannot use the mvcp/mvcs instructions anymore to copy between kernel and user space. A new mvcos instruction has been added to the z9 EC/BC hardware which allows to copy between arbitrary address spaces, but on older hardware the page tables need to be walked manually. Signed-off-by: Gerald Schaefer <geraldsc@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2007-02-05 21:18:17 +01:00
Dave Hansen	a2f3aa0257	[PATCH] Fix sparsemem on Cell Fix an oops experienced on the Cell architecture when init-time functions, early_*(), are called at runtime. It alters the call paths to make sure that the callers explicitly say whether the call is being made on behalf of a hotplug even, or happening at boot-time. It has been compile tested on ppc64, ia64, s390, i386 and x86_64. Acked-by: Arnd Bergmann <arndb@de.ibm.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Acked-by: Andy Whitcroft <apw@shadowen.org> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2007-01-11 18:18:20 -08:00
Heiko Carstens	f4eb07c17d	[S390] Virtual memmap for s390. Virtual memmap support for s390. Inspired by the ia64 implementation. Unlike ia64 we need a mechanism which allows us to dynamically attach shared memory regions. These memory regions are accessed via the dcss device driver. dcss implements the 'direct_access' operation, which requires struct pages for every single shared page. Therefore this implementation provides an interface to attach/detach shared memory: int add_shared_memory(unsigned long start, unsigned long size); int remove_shared_memory(unsigned long start, unsigned long size); The purpose of the add_shared_memory function is to add the given memory range to the 1:1 mapping and to make sure that the corresponding range in the vmemmap is backed with physical pages. It also initialises the new struct pages. remove_shared_memory in turn only invalidates the page table entries in the 1:1 mapping. The page tables and the memory used for struct pages in the vmemmap are currently not freed. They will be reused when the next segment will be attached. Given that the maximum size of a shared memory region is 2GB and in addition all regions must reside below 2GB this is not too much of a restriction, but there is room for improvement. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	2006-12-08 15:56:07 +01:00

32 Commits