linux/arch/um/kernel/trap.c

325 lines
8.2 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (C) 2000 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
*/
#include <linux/mm.h>
#include <linux/sched/signal.h>
#include <linux/hardirq.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/sched/debug.h>
#include <asm/current.h>
#include <asm/tlbflush.h>
#include <arch.h>
#include <as-layout.h>
#include <kern_util.h>
#include <os.h>
#include <skas.h>
/*
* Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
* segv().
*/
int handle_page_fault(unsigned long address, unsigned long ip,
int is_write, int is_user, int *code_out)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
pmd_t *pmd;
pte_t *pte;
int err = -EFAULT;
mm: introduce FAULT_FLAG_DEFAULT Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL. Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 04:08:37 +00:00
unsigned int flags = FAULT_FLAG_DEFAULT;
*code_out = SEGV_MAPERR;
/*
mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler Introduce faulthandler_disabled() and use it to check for irq context and disabled pagefaults (via pagefault_disable()) in the pagefault handlers. Please note that we keep the in_atomic() checks in place - to detect whether in irq context (in which case preemption is always properly disabled). In contrast, preempt_disable() should never be used to disable pagefaults. With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt counter, and therefore the result of in_atomic() differs. We validate that condition by using might_fault() checks when calling might_sleep(). Therefore, add a comment to faulthandler_disabled(), describing why this is needed. faulthandler_disabled() and pagefault_disable() are defined in linux/uaccess.h, so let's properly add that include to all relevant files. This patch is based on a patch from Thomas Gleixner. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 15:52:11 +00:00
* If the fault was with pagefaults disabled, don't take the fault, just
* fail.
*/
mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler Introduce faulthandler_disabled() and use it to check for irq context and disabled pagefaults (via pagefault_disable()) in the pagefault handlers. Please note that we keep the in_atomic() checks in place - to detect whether in irq context (in which case preemption is always properly disabled). In contrast, preempt_disable() should never be used to disable pagefaults. With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt counter, and therefore the result of in_atomic() differs. We validate that condition by using might_fault() checks when calling might_sleep(). Therefore, add a comment to faulthandler_disabled(), describing why this is needed. faulthandler_disabled() and pagefault_disable() are defined in linux/uaccess.h, so let's properly add that include to all relevant files. This patch is based on a patch from Thomas Gleixner. Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: David.Laight@ACULAB.COM Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bigeasy@linutronix.de Cc: borntraeger@de.ibm.com Cc: daniel.vetter@intel.com Cc: heiko.carstens@de.ibm.com Cc: herbert@gondor.apana.org.au Cc: hocko@suse.cz Cc: hughd@google.com Cc: mst@redhat.com Cc: paulus@samba.org Cc: ralf@linux-mips.org Cc: schwidefsky@de.ibm.com Cc: yang.shi@windriver.com Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-11 15:52:11 +00:00
if (faulthandler_disabled())
goto out_nosemaphore;
if (is_user)
flags |= FAULT_FLAG_USER;
retry:
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 04:33:25 +00:00
mmap_read_lock(mm);
vma = find_vma(mm, address);
if (!vma)
goto out;
if (vma->vm_start <= address)
goto good_area;
if (!(vma->vm_flags & VM_GROWSDOWN))
goto out;
if (is_user && !ARCH_IS_STACKGROW(address))
goto out;
vma = expand_stack(mm, address);
if (!vma)
goto out_nosemaphore;
good_area:
*code_out = SEGV_ACCERR;
if (is_write) {
if (!(vma->vm_flags & VM_WRITE))
goto out;
flags |= FAULT_FLAG_WRITE;
} else {
/* Don't require VM_READ|VM_EXEC for write faults! */
if (!(vma->vm_flags & (VM_READ | VM_EXEC)))
goto out;
}
do {
mm: convert return type of handle_mm_fault() caller to vm_fault_t Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. Ref-> commit 1c8f422059ae ("mm: change return type to vm_fault_t") In this patch all the caller of handle_mm_fault() are changed to return vm_fault_t type. Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Tony Luck <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: David S. Miller <davem@davemloft.net> Cc: Richard Weinberger <richard@nod.at> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 22:44:47 +00:00
vm_fault_t fault;
mm: do page fault accounting in handle_mm_fault Patch series "mm: Page fault accounting cleanups", v5. This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report on an issue a week ago regarding to incorrect page fault accountings for retried page fault after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times"): https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/ What this series did: - Correct page fault accounting: we do accounting for a page fault (no matter whether it's from #PF handling, or gup, or anything else) only with the one that completed the fault. For example, page fault retries should not be counted in page fault counters. Same to the perf events. - Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event is used in an adhoc way across different archs. Case (1): for many archs it's done at the entry of a page fault handler, so that it will also cover e.g. errornous faults. Case (2): for some other archs, it is only accounted when the page fault is resolved successfully. Case (3): there're still quite some archs that have not enabled this perf event. Since this series will touch merely all the archs, we unify this perf event to always follow case (1), which is the one that makes most sense. And since we moved the accounting into handle_mm_fault, the other two MAJ/MIN perf events are well taken care of naturally. - Unify definition of "major faults": the definition of "major fault" is slightly changed when used in accounting (not VM_FAULT_MAJOR). More information in patch 1. - Always account the page fault onto the one that triggered the page fault. This does not matter much for #PF handlings, but mostly for gup. More information on this in patch 25. Patchset layout: Patch 1: Introduced the accounting in handle_mm_fault(), not enabled. Patch 2-23: Enable the new accounting for arch #PF handlers one by one. Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.) Patch 25: Cleanup GUP task_struct pointer since it's not needed any more This patch (of 25): This is a preparation patch to move page fault accountings into the general code in handle_mm_fault(). This includes both the per task flt_maj/flt_min counters, and the major/minor page fault perf events. To do this, the pt_regs pointer is passed into handle_mm_fault(). PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers. So far, all the pt_regs pointer that passed into handle_mm_fault() is NULL, which means this patch should have no intented functional change. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-12 01:37:44 +00:00
fault = handle_mm_fault(vma, address, flags, NULL);
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(current))
goto out_nosemaphore;
mm: avoid unnecessary page fault retires on shared memory types I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-30 18:34:50 +00:00
/* The fault is fully completed (including releasing mmap lock) */
if (fault & VM_FAULT_COMPLETED)
return 0;
mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 08:47:05 +00:00
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM) {
goto out_of_memory;
vm: add VM_FAULT_SIGSEGV handling support The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d4514 ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: Takashi Iwai <tiwai@suse.de> Tested-by: Jan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29 18:51:32 +00:00
} else if (fault & VM_FAULT_SIGSEGV) {
goto out;
mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 08:47:05 +00:00
} else if (fault & VM_FAULT_SIGBUS) {
err = -EACCES;
goto out;
}
BUG();
}
if (fault & VM_FAULT_RETRY) {
flags |= FAULT_FLAG_TRIED;
goto retry;
}
mm: fault feedback #2 This patch completes Linus's wish that the fault return codes be made into bit flags, which I agree makes everything nicer. This requires requires all handle_mm_fault callers to be modified (possibly the modifications should go further and do things like fault accounting in handle_mm_fault -- however that would be for another patch). [akpm@linux-foundation.org: fix alpha build] [akpm@linux-foundation.org: fix s390 build] [akpm@linux-foundation.org: fix sparc build] [akpm@linux-foundation.org: fix sparc64 build] [akpm@linux-foundation.org: fix ia64 build] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Matthew Wilcox <willy@debian.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Still apparently needs some ARM and PPC loving - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 08:47:05 +00:00
mm: pgtable: add shortcuts for accessing kernel PMD and PTE The powerpc 32-bit implementation of pgtable has nice shortcuts for accessing kernel PMD and PTE for a given virtual address. Make these helpers available for all architectures. [rppt@linux.ibm.com: microblaze: fix page table traversal in setup_rt_frame()] Link: http://lkml.kernel.org/r/20200518191511.GD1118872@kernel.org [akpm@linux-foundation.org: s/pmd_ptr_k/pmd_off_k/ in various powerpc places] Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-9-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 04:33:05 +00:00
pmd = pmd_off(mm, address);
pte = pte_offset_kernel(pmd, address);
} while (!pte_present(*pte));
err = 0;
/*
* The below warning was added in place of
* pte_mkyoung(); if (is_write) pte_mkdirty();
* If it's triggered, we'd see normally a hang here (a clean pte is
* marked read-only to emulate the dirty bit).
* However, the generic code can mark a PTE writable but clean on a
* concurrent read fault, triggering this harmlessly. So comment it out.
*/
#if 0
WARN_ON(!pte_young(*pte) || (is_write && !pte_dirty(*pte)));
#endif
out:
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 04:33:25 +00:00
mmap_read_unlock(mm);
out_nosemaphore:
return err;
out_of_memory:
/*
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed).
*/
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 04:33:25 +00:00
mmap_read_unlock(mm);
if (!is_user)
goto out_nosemaphore;
pagefault_out_of_memory();
return 0;
}
static void show_segv_info(struct uml_pt_regs *regs)
{
struct task_struct *tsk = current;
struct faultinfo *fi = UPT_FAULTINFO(regs);
if (!unhandled_signal(tsk, SIGSEGV))
return;
if (!printk_ratelimit())
return;
printk("%s%s[%d]: segfault at %lx ip %px sp %px error %x",
task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG,
tsk->comm, task_pid_nr(tsk), FAULT_ADDRESS(*fi),
(void *)UPT_IP(regs), (void *)UPT_SP(regs),
fi->error_code);
print_vma_addr(KERN_CONT " in ", UPT_IP(regs));
printk(KERN_CONT "\n");
}
static void bad_segv(struct faultinfo fi, unsigned long ip)
{
current->thread.arch.faultinfo = fi;
force_sig_fault(SIGSEGV, SEGV_ACCERR, (void __user *) FAULT_ADDRESS(fi));
}
uml: kill processes instead of panicing kernel UML was panicing in the case of failures of libc calls which shouldn't happen. This is an overreaction since a failure from libc doesn't normally mean that kernel data structures are in an unknown state. Instead, the current process should just be killed if there is no way to recover. The case that prompted this was a failure of PTRACE_SETREGS restoring the same state that was read by PTRACE_GETREGS. It appears that when a process tries to load a bogus value into a segment register, it segfaults (as expected) and the value is actually loaded and is seen by PTRACE_GETREGS (not expected). This case is fixed by forcing a fatal SIGSEGV on the process so that it immediately dies. fatal_sigsegv was added for this purpose. It was declared as noreturn, so in order to pursuade gcc that it actually does not return, I added a call to os_dump_core (and declared it noreturn) so that I get a core file if somehow the process survives. All other calls in arch/um/os-Linux/skas/process.c got the same treatment, with failures causing the process to die instead of a kernel panic, with some exceptions. userspace_tramp exits with status 1 if anything goes wrong there. That will cause start_userspace to return an error. copy_context_skas0 and map_stub_pages also now return errors instead of panicing. Callers of thes functions were changed to check for errors and do something appropriate. Usually that's to return an error to their callers. check_skas3_ptrace_faultinfo just exits since that's too early to do anything else. save_registers, restore_registers, and init_registers now return status instead of panicing on failure, with their callers doing something appropriate. There were also duplicate declarations of save_registers and restore_registers in os.h - these are gone. I noticed and fixed up some whitespace damage. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 06:30:58 +00:00
void fatal_sigsegv(void)
{
force_fatal_sig(SIGSEGV);
do_signal(&current->thread.regs);
uml: kill processes instead of panicing kernel UML was panicing in the case of failures of libc calls which shouldn't happen. This is an overreaction since a failure from libc doesn't normally mean that kernel data structures are in an unknown state. Instead, the current process should just be killed if there is no way to recover. The case that prompted this was a failure of PTRACE_SETREGS restoring the same state that was read by PTRACE_GETREGS. It appears that when a process tries to load a bogus value into a segment register, it segfaults (as expected) and the value is actually loaded and is seen by PTRACE_GETREGS (not expected). This case is fixed by forcing a fatal SIGSEGV on the process so that it immediately dies. fatal_sigsegv was added for this purpose. It was declared as noreturn, so in order to pursuade gcc that it actually does not return, I added a call to os_dump_core (and declared it noreturn) so that I get a core file if somehow the process survives. All other calls in arch/um/os-Linux/skas/process.c got the same treatment, with failures causing the process to die instead of a kernel panic, with some exceptions. userspace_tramp exits with status 1 if anything goes wrong there. That will cause start_userspace to return an error. copy_context_skas0 and map_stub_pages also now return errors instead of panicing. Callers of thes functions were changed to check for errors and do something appropriate. Usually that's to return an error to their callers. check_skas3_ptrace_faultinfo just exits since that's too early to do anything else. save_registers, restore_registers, and init_registers now return status instead of panicing on failure, with their callers doing something appropriate. There were also duplicate declarations of save_registers and restore_registers in os.h - these are gone. I noticed and fixed up some whitespace damage. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 06:30:58 +00:00
/*
* This is to tell gcc that we're not returning - do_signal
* can, in general, return, but in this case, it's not, since
* we just got a fatal SIGSEGV queued.
*/
os_dump_core();
}
/**
* segv_handler() - the SIGSEGV handler
* @sig: the signal number
* @unused_si: the signal info struct; unused in this handler
* @regs: the ptrace register information
*
* The handler first extracts the faultinfo from the UML ptrace regs struct.
* If the userfault did not happen in an UML userspace process, bad_segv is called.
* Otherwise the signal did happen in a cloned userspace process, handle it.
*/
void segv_handler(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
{
struct faultinfo * fi = UPT_FAULTINFO(regs);
if (UPT_IS_USER(regs) && !SEGV_IS_FIXABLE(fi)) {
show_segv_info(regs);
bad_segv(*fi, UPT_IP(regs));
return;
}
segv(*fi, UPT_IP(regs), UPT_IS_USER(regs), regs);
}
[PATCH] uml: S390 preparation, abstract host page fault data This patch removes the arch-specific fault/trap-infos from thread and skas-regs. It adds a new struct faultinfo, that is arch-specific defined in sysdep/faultinfo.h. The structure is inserted in thread.arch and thread.regs.skas and thread.regs.tt Now, segv and other trap-handlers can copy the contents from regs.X.faultinfo to thread.arch.faultinfo with one simple assignment. Also, the number of macros necessary is reduced to FAULT_ADDRESS(struct faultinfo) extracts the faulting address from faultinfo FAULT_WRITE(struct faultinfo) extracts the "is_write" flag SEGV_IS_FIXABLE(struct faultinfo) is true for the fixable segvs, i.e. (TRAP == 14) on i386 UPT_FAULTINFO(regs) result is (struct faultinfo *) to the faultinfo in regs->skas.faultinfo GET_FAULTINFO_FROM_SC(struct faultinfo, struct sigcontext *) copies the relevant parts of the sigcontext to struct faultinfo. On SIGSEGV, call user_signal() instead of handle_segv(), if the architecture provides the information needed in PTRACE_FAULTINFO, or if PTRACE_FAULTINFO is missing, because segv-stub will provide the info. The benefit of the change is, that in case of a non-fixable SIGSEGV, we can give user processes a SIGSEGV, instead of possibly looping on pagefault handling. Since handle_segv() sikked arch_fixup() implicitly by passing ip==0 to segv(), I changed segv() to call arch_fixup() only, if !is_user. Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com> Signed-off-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-05-05 23:15:31 +00:00
/*
* We give a *copy* of the faultinfo in the regs to segv.
* This must be done, since nesting SEGVs could overwrite
* the info in the regs. A pointer to the info then would
* give us bad data!
*/
unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
struct uml_pt_regs *regs)
{
jmp_buf *catcher;
int si_code;
int err;
int is_write = FAULT_WRITE(fi);
unsigned long address = FAULT_ADDRESS(fi);
if (!is_user && regs)
current->thread.segv_regs = container_of(regs, struct pt_regs, regs);
if (!is_user && init_mm.context.sync_tlb_range_to) {
/*
* Kernel has pending updates from set_ptes that were not
* flushed yet. Syncing them should fix the pagefault (if not
* we'll get here again and panic).
*/
err = um_tlb_sync(&init_mm);
if (err == -ENOMEM)
report_enomem();
if (err)
panic("Failed to sync kernel TLBs: %d", err);
goto out;
}
else if (current->mm == NULL) {
show_regs(container_of(regs, struct pt_regs, regs));
panic("Segfault with no mm");
}
else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
show_regs(container_of(regs, struct pt_regs, regs));
panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
address, ip);
}
if (SEGV_IS_FIXABLE(&fi))
err = handle_page_fault(address, ip, is_write, is_user,
&si_code);
else {
err = -EFAULT;
/*
* A thread accessed NULL, we get a fault, but CR2 is invalid.
* This code is used in __do_copy_from_user() of TT mode.
* XXX tt mode is gone, so maybe this isn't needed any more
*/
address = 0;
}
catcher = current->thread.fault_catcher;
if (!err)
goto out;
else if (catcher != NULL) {
current->thread.fault_addr = (void *) address;
UML_LONGJMP(catcher, 1);
}
else if (current->thread.fault_addr != NULL)
panic("fault_addr set but no fault catcher");
else if (!is_user && arch_fixup(ip, regs))
goto out;
if (!is_user) {
show_regs(container_of(regs, struct pt_regs, regs));
panic("Kernel mode fault at addr 0x%lx, ip 0x%lx",
address, ip);
}
show_segv_info(regs);
if (err == -EACCES) {
current->thread.arch.faultinfo = fi;
force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
} else {
BUG_ON(err != -EFAULT);
current->thread.arch.faultinfo = fi;
force_sig_fault(SIGSEGV, si_code, (void __user *) address);
}
out:
if (regs)
current->thread.segv_regs = NULL;
return 0;
}
void relay_signal(int sig, struct siginfo *si, struct uml_pt_regs *regs)
{
signal/um: More carefully relay signals in relay_signal. There is a bug in relay signal. It assumes that when a signal is relayed the signal never uses a signal independent si_code, such as SI_USER, SI_KERNEL, SI_QUEUE, ... SI_SIGIO etc. In practice siginfo was assuming it was relaying a signal with the SIL_FAULT layout. As that is the common cases for the signals it supported that is a reasonable assumption. Further user mode linux must be very careful when relaying different kinds of signals to prevent an information leak. This means simply increasing the kinds of signals that are handled in relay_signal is non-trivial. Therefore use siginfo_layout and force_sig_fault to simplify the signal relaying in relay_signal. By taking advantage of the fact that user mode linux only works on x86 and x86_64 we can assume that si_trapno can be ignored, and that si_errno is always zero. For the signals SIGLL, SIGFPE, SIGSEGV, SIGBUS, and SIGTRAP the only fault handler I know of that sets si_errno is SIGTRAP TRAP_HWBKPT on a few oddball architectures. Those architectures have been modified to use force_sig_ptrace_errno_trap. Similarly only a few architectures set __ARCH_SI_TRAPNO. At the point uml supports those architectures again these additional cases can be examined and supported if desired in relay_signal. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk> Cc: Martin Pärtel <martin.partel@gmail.com> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-um@lists.infradead.org Fixes: d3c1cfcdb43e ("um: pass siginfo to guest process") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-16 21:12:31 +00:00
int code, err;
if (!UPT_IS_USER(regs)) {
if (sig == SIGBUS)
printk(KERN_ERR "Bus error - the host /dev/shm or /tmp "
"mount likely just ran out of space\n");
panic("Kernel mode signal %d", sig);
}
arch_examine_signal(sig, regs);
signal/um: More carefully relay signals in relay_signal. There is a bug in relay signal. It assumes that when a signal is relayed the signal never uses a signal independent si_code, such as SI_USER, SI_KERNEL, SI_QUEUE, ... SI_SIGIO etc. In practice siginfo was assuming it was relaying a signal with the SIL_FAULT layout. As that is the common cases for the signals it supported that is a reasonable assumption. Further user mode linux must be very careful when relaying different kinds of signals to prevent an information leak. This means simply increasing the kinds of signals that are handled in relay_signal is non-trivial. Therefore use siginfo_layout and force_sig_fault to simplify the signal relaying in relay_signal. By taking advantage of the fact that user mode linux only works on x86 and x86_64 we can assume that si_trapno can be ignored, and that si_errno is always zero. For the signals SIGLL, SIGFPE, SIGSEGV, SIGBUS, and SIGTRAP the only fault handler I know of that sets si_errno is SIGTRAP TRAP_HWBKPT on a few oddball architectures. Those architectures have been modified to use force_sig_ptrace_errno_trap. Similarly only a few architectures set __ARCH_SI_TRAPNO. At the point uml supports those architectures again these additional cases can be examined and supported if desired in relay_signal. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk> Cc: Martin Pärtel <martin.partel@gmail.com> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-um@lists.infradead.org Fixes: d3c1cfcdb43e ("um: pass siginfo to guest process") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-16 21:12:31 +00:00
/* Is the signal layout for the signal known?
* Signal data must be scrubbed to prevent information leaks.
*/
code = si->si_code;
err = si->si_errno;
if ((err == 0) && (siginfo_layout(sig, code) == SIL_FAULT)) {
struct faultinfo *fi = UPT_FAULTINFO(regs);
current->thread.arch.faultinfo = *fi;
force_sig_fault(sig, code, (void __user *)FAULT_ADDRESS(*fi));
signal/um: More carefully relay signals in relay_signal. There is a bug in relay signal. It assumes that when a signal is relayed the signal never uses a signal independent si_code, such as SI_USER, SI_KERNEL, SI_QUEUE, ... SI_SIGIO etc. In practice siginfo was assuming it was relaying a signal with the SIL_FAULT layout. As that is the common cases for the signals it supported that is a reasonable assumption. Further user mode linux must be very careful when relaying different kinds of signals to prevent an information leak. This means simply increasing the kinds of signals that are handled in relay_signal is non-trivial. Therefore use siginfo_layout and force_sig_fault to simplify the signal relaying in relay_signal. By taking advantage of the fact that user mode linux only works on x86 and x86_64 we can assume that si_trapno can be ignored, and that si_errno is always zero. For the signals SIGLL, SIGFPE, SIGSEGV, SIGBUS, and SIGTRAP the only fault handler I know of that sets si_errno is SIGTRAP TRAP_HWBKPT on a few oddball architectures. Those architectures have been modified to use force_sig_ptrace_errno_trap. Similarly only a few architectures set __ARCH_SI_TRAPNO. At the point uml supports those architectures again these additional cases can be examined and supported if desired in relay_signal. Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Anton Ivanov <anton.ivanov@kot-begemot.co.uk> Cc: Martin Pärtel <martin.partel@gmail.com> Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-um@lists.infradead.org Fixes: d3c1cfcdb43e ("um: pass siginfo to guest process") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-16 21:12:31 +00:00
} else {
printk(KERN_ERR "Attempted to relay unknown signal %d (si_code = %d) with errno %d\n",
sig, code, err);
force_sig(sig);
}
}
void bus_handler(int sig, struct siginfo *si, struct uml_pt_regs *regs)
{
if (current->thread.fault_catcher != NULL)
UML_LONGJMP(current->thread.fault_catcher, 1);
else
relay_signal(sig, si, regs);
}
void winch(int sig, struct siginfo *unused_si, struct uml_pt_regs *regs)
{
do_IRQ(WINCH_IRQ, regs);
}