linux/arch/mips/include/asm/processor.h

435 lines
12 KiB
C
Raw Normal View History

/*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*
* Copyright (C) 1994 Waldorf GMBH
* Copyright (C) 1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003 Ralf Baechle
* Copyright (C) 1996 Paul M. Antoine
* Copyright (C) 1999, 2000 Silicon Graphics, Inc.
*/
#ifndef _ASM_PROCESSOR_H
#define _ASM_PROCESSOR_H
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 10:06:19 +00:00
#include <linux/atomic.h>
#include <linux/cpumask.h>
MIPS: VDSO: Always map near top of user memory When using the legacy mmap layout, for example triggered using ulimit -s unlimited, get_unmapped_area() fills memory from bottom to top starting from a fairly low address near TASK_UNMAPPED_BASE. This placement is suboptimal if the user application wishes to allocate large amounts of heap memory using the brk syscall. With the VDSO being located low in the user's virtual address space, the amount of space available for access using brk is limited much more than it was prior to the introduction of the VDSO. For example: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00cc3000-00ce4000 rwxp 00000000 00:00 0 [heap] 2ab96000-2ab98000 r--p 00000000 00:00 0 [vvar] 2ab98000-2ab99000 r-xp 00000000 00:00 0 [vdso] 2ab99000-2ab9d000 rwxp 00000000 00:00 0 ... Resolve this by adjusting STACK_TOP to reserve space for the VDSO & providing an address hint to get_unmapped_area() causing it to use this space even when using the legacy mmap layout. We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64 bit systems respectively within which we randomize the VDSO base address. Previously this randomization was taken care of by the mmap base address randomization performed by arch_mmap_rnd(). The 1MB & 256MB sizes are somewhat arbitrary but chosen such that we have some randomization without taking up too much of the user's virtual address space, which is often in short supply for 32 bit systems. With this the VDSO is always mapped at a high address, leaving lots of space for statically linked programs to make use of brk: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00c28000-00c49000 rwxp 00000000 00:00 0 [heap] ... 7f67c000-7f69d000 rwxp 00000000 00:00 0 [stack] 7f7fc000-7f7fd000 rwxp 00000000 00:00 0 7fcf1000-7fcf3000 r--p 00000000 00:00 0 [vvar] 7fcf3000-7fcf4000 r-xp 00000000 00:00 0 [vdso] Signed-off-by: Paul Burton <paul.burton@mips.com> Reported-by: Huacai Chen <chenhc@lemote.com> Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.4+
2018-09-25 22:51:26 +00:00
#include <linux/sizes.h>
#include <linux/threads.h>
#include <asm/cachectl.h>
#include <asm/cpu.h>
#include <asm/cpu-info.h>
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 10:06:19 +00:00
#include <asm/dsemul.h>
#include <asm/mipsregs.h>
#include <asm/prefetch.h>
/*
* System setup and hardware flags..
*/
extern unsigned int vced_count, vcei_count;
/*
* MIPS does have an arch_pick_mmap_layout()
*/
#define HAVE_ARCH_PICK_MMAP_LAYOUT 1
#ifdef CONFIG_32BIT
#ifdef CONFIG_KVM_GUEST
/* User space process size is limited to 1GB in KVM Guest Mode */
#define TASK_SIZE 0x3fff8000UL
#else
/*
* User space process size: 2GB. This is hardcoded into a few places,
* so don't change it unless you know what you are doing.
*/
#define TASK_SIZE 0x80000000UL
#endif
#define STACK_TOP_MAX TASK_SIZE
#define TASK_IS_32BIT_ADDR 1
#endif
#ifdef CONFIG_64BIT
/*
* User space process size: 1TB. This is hardcoded into a few places,
* so don't change it unless you know what you are doing. TASK_SIZE
* is limited to 1TB by the R4000 architecture; R10000 and better can
* support 16TB; the architectural reserve for future expansion is
* 8192EB ...
*/
#define TASK_SIZE32 0x7fff8000UL
#ifdef CONFIG_MIPS_VA_BITS_48
#define TASK_SIZE64 (0x1UL << ((cpu_data[0].vmbits>48)?48:cpu_data[0].vmbits))
#else
#define TASK_SIZE64 0x10000000000UL
#endif
#define TASK_SIZE (test_thread_flag(TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
#define STACK_TOP_MAX TASK_SIZE64
#define TASK_SIZE_OF(tsk) \
(test_tsk_thread_flag(tsk, TIF_32BIT_ADDR) ? TASK_SIZE32 : TASK_SIZE64)
#define TASK_IS_32BIT_ADDR test_thread_flag(TIF_32BIT_ADDR)
#endif
#define VDSO_RANDOMIZE_SIZE (TASK_IS_32BIT_ADDR ? SZ_1M : SZ_64M)
MIPS: VDSO: Always map near top of user memory When using the legacy mmap layout, for example triggered using ulimit -s unlimited, get_unmapped_area() fills memory from bottom to top starting from a fairly low address near TASK_UNMAPPED_BASE. This placement is suboptimal if the user application wishes to allocate large amounts of heap memory using the brk syscall. With the VDSO being located low in the user's virtual address space, the amount of space available for access using brk is limited much more than it was prior to the introduction of the VDSO. For example: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00cc3000-00ce4000 rwxp 00000000 00:00 0 [heap] 2ab96000-2ab98000 r--p 00000000 00:00 0 [vvar] 2ab98000-2ab99000 r-xp 00000000 00:00 0 [vdso] 2ab99000-2ab9d000 rwxp 00000000 00:00 0 ... Resolve this by adjusting STACK_TOP to reserve space for the VDSO & providing an address hint to get_unmapped_area() causing it to use this space even when using the legacy mmap layout. We reserve enough space for the VDSO, plus 1MB or 256MB for 32 bit & 64 bit systems respectively within which we randomize the VDSO base address. Previously this randomization was taken care of by the mmap base address randomization performed by arch_mmap_rnd(). The 1MB & 256MB sizes are somewhat arbitrary but chosen such that we have some randomization without taking up too much of the user's virtual address space, which is often in short supply for 32 bit systems. With this the VDSO is always mapped at a high address, leaving lots of space for statically linked programs to make use of brk: # ulimit -s unlimited; cat /proc/self/maps 00400000-004ec000 r-xp 00000000 08:00 71436 /usr/bin/coreutils 004fc000-004fd000 rwxp 000ec000 08:00 71436 /usr/bin/coreutils 004fd000-0050f000 rwxp 00000000 00:00 0 00c28000-00c49000 rwxp 00000000 00:00 0 [heap] ... 7f67c000-7f69d000 rwxp 00000000 00:00 0 [stack] 7f7fc000-7f7fd000 rwxp 00000000 00:00 0 7fcf1000-7fcf3000 r--p 00000000 00:00 0 [vvar] 7fcf3000-7fcf4000 r-xp 00000000 00:00 0 [vdso] Signed-off-by: Paul Burton <paul.burton@mips.com> Reported-by: Huacai Chen <chenhc@lemote.com> Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Cc: Huacai Chen <chenhc@lemote.com> Cc: linux-mips@linux-mips.org Cc: stable@vger.kernel.org # v4.4+
2018-09-25 22:51:26 +00:00
extern unsigned long mips_stack_top(void);
#define STACK_TOP mips_stack_top()
/*
* This decides where the kernel will search for a free chunk of vm
* space during mmap's.
*/
#define TASK_UNMAPPED_BASE PAGE_ALIGN(TASK_SIZE / 3)
#define NUM_FPU_REGS 32
#ifdef CONFIG_CPU_HAS_MSA
# define FPU_REG_WIDTH 128
#else
# define FPU_REG_WIDTH 64
#endif
union fpureg {
__u32 val32[FPU_REG_WIDTH / 32];
__u64 val64[FPU_REG_WIDTH / 64];
};
#ifdef CONFIG_CPU_LITTLE_ENDIAN
# define FPR_IDX(width, idx) (idx)
#else
# define FPR_IDX(width, idx) ((idx) ^ ((64 / (width)) - 1))
#endif
#define BUILD_FPR_ACCESS(width) \
static inline u##width get_fpr##width(union fpureg *fpr, unsigned idx) \
{ \
return fpr->val##width[FPR_IDX(width, idx)]; \
} \
\
static inline void set_fpr##width(union fpureg *fpr, unsigned idx, \
u##width val) \
{ \
fpr->val##width[FPR_IDX(width, idx)] = val; \
}
BUILD_FPR_ACCESS(32)
BUILD_FPR_ACCESS(64)
/*
* It would be nice to add some more fields for emulator statistics,
* the additional information is private to the FPU emulator for now.
* See arch/mips/include/asm/fpu_emulator.h.
*/
struct mips_fpu_struct {
union fpureg fpr[NUM_FPU_REGS];
unsigned int fcr31;
unsigned int msacsr;
};
#define NUM_DSP_REGS 6
typedef unsigned long dspreg_t;
struct mips_dsp_state {
dspreg_t dspr[NUM_DSP_REGS];
unsigned int dspcontrol;
};
#define INIT_CPUMASK { \
{0,} \
}
struct mips3264_watch_reg_state {
/* The width of watchlo is 32 in a 32 bit kernel and 64 in a
64 bit kernel. We use unsigned long as it has the same
property. */
unsigned long watchlo[NUM_WATCH_REGS];
/* Only the mask and IRW bits from watchhi. */
u16 watchhi[NUM_WATCH_REGS];
};
union mips_watch_reg_state {
struct mips3264_watch_reg_state mips3264;
};
#if defined(CONFIG_CPU_CAVIUM_OCTEON)
struct octeon_cop2_state {
/* DMFC2 rt, 0x0201 */
unsigned long cop2_crc_iv;
/* DMFC2 rt, 0x0202 (Set with DMTC2 rt, 0x1202) */
unsigned long cop2_crc_length;
/* DMFC2 rt, 0x0200 (set with DMTC2 rt, 0x4200) */
unsigned long cop2_crc_poly;
/* DMFC2 rt, 0x0402; DMFC2 rt, 0x040A */
unsigned long cop2_llm_dat[2];
/* DMFC2 rt, 0x0084 */
unsigned long cop2_3des_iv;
/* DMFC2 rt, 0x0080; DMFC2 rt, 0x0081; DMFC2 rt, 0x0082 */
unsigned long cop2_3des_key[3];
/* DMFC2 rt, 0x0088 (Set with DMTC2 rt, 0x0098) */
unsigned long cop2_3des_result;
/* DMFC2 rt, 0x0111 (FIXME: Read Pass1 Errata) */
unsigned long cop2_aes_inp0;
/* DMFC2 rt, 0x0102; DMFC2 rt, 0x0103 */
unsigned long cop2_aes_iv[2];
/* DMFC2 rt, 0x0104; DMFC2 rt, 0x0105; DMFC2 rt, 0x0106; DMFC2
* rt, 0x0107 */
unsigned long cop2_aes_key[4];
/* DMFC2 rt, 0x0110 */
unsigned long cop2_aes_keylen;
/* DMFC2 rt, 0x0100; DMFC2 rt, 0x0101 */
unsigned long cop2_aes_result[2];
/* DMFC2 rt, 0x0240; DMFC2 rt, 0x0241; DMFC2 rt, 0x0242; DMFC2
* rt, 0x0243; DMFC2 rt, 0x0244; DMFC2 rt, 0x0245; DMFC2 rt,
* 0x0246; DMFC2 rt, 0x0247; DMFC2 rt, 0x0248; DMFC2 rt,
* 0x0249; DMFC2 rt, 0x024A; DMFC2 rt, 0x024B; DMFC2 rt,
* 0x024C; DMFC2 rt, 0x024D; DMFC2 rt, 0x024E - Pass2 */
unsigned long cop2_hsh_datw[15];
/* DMFC2 rt, 0x0250; DMFC2 rt, 0x0251; DMFC2 rt, 0x0252; DMFC2
* rt, 0x0253; DMFC2 rt, 0x0254; DMFC2 rt, 0x0255; DMFC2 rt,
* 0x0256; DMFC2 rt, 0x0257 - Pass2 */
unsigned long cop2_hsh_ivw[8];
/* DMFC2 rt, 0x0258; DMFC2 rt, 0x0259 - Pass2 */
unsigned long cop2_gfm_mult[2];
/* DMFC2 rt, 0x025E - Pass2 */
unsigned long cop2_gfm_poly;
/* DMFC2 rt, 0x025A; DMFC2 rt, 0x025B - Pass2 */
unsigned long cop2_gfm_result[2];
/* DMFC2 rt, 0x24F, DMFC2 rt, 0x50, OCTEON III */
unsigned long cop2_sha3[2];
};
#define COP2_INIT \
.cp2 = {0,},
struct octeon_cvmseg_state {
unsigned long cvmseg[CONFIG_CAVIUM_OCTEON_CVMSEG_SIZE]
[cpu_dcache_line_size() / sizeof(unsigned long)];
};
#elif defined(CONFIG_CPU_XLP)
struct nlm_cop2_state {
u64 rx[4];
u64 tx[4];
u32 tx_msg_status;
u32 rx_msg_status;
};
#define COP2_INIT \
.cp2 = {{0}, {0}, 0, 0},
#else
#define COP2_INIT
#endif
typedef struct {
unsigned long seg;
} mm_segment_t;
#ifdef CONFIG_CPU_HAS_MSA
# define ARCH_MIN_TASKALIGN 16
# define FPU_ALIGN __aligned(16)
#else
# define ARCH_MIN_TASKALIGN 8
# define FPU_ALIGN
#endif
struct mips_abi;
/*
* If you change thread_struct remember to change the #defines below too!
*/
struct thread_struct {
/* Saved main processor registers. */
unsigned long reg16;
unsigned long reg17, reg18, reg19, reg20, reg21, reg22, reg23;
unsigned long reg29, reg30, reg31;
/* Saved cp0 stuff. */
unsigned long cp0_status;
/* Saved fpu/fpu emulator stuff. */
struct mips_fpu_struct fpu FPU_ALIGN;
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 10:06:19 +00:00
/* Assigned branch delay slot 'emulation' frame */
atomic_t bd_emu_frame;
/* PC of the branch from a branch delay slot 'emulation' */
unsigned long bd_emu_branch_pc;
/* PC to continue from following a branch delay slot 'emulation' */
unsigned long bd_emu_cont_pc;
#ifdef CONFIG_MIPS_MT_FPAFF
/* Emulated instruction count */
unsigned long emulated_fp;
/* Saved per-thread scheduler affinity mask */
cpumask_t user_cpus_allowed;
#endif /* CONFIG_MIPS_MT_FPAFF */
/* Saved state of the DSP ASE, if available. */
struct mips_dsp_state dsp;
/* Saved watch register state, if available. */
union mips_watch_reg_state watch;
/* Other stuff associated with the thread. */
unsigned long cp0_badvaddr; /* Last user fault */
unsigned long cp0_baduaddr; /* Last kernel fault accessing USEG */
unsigned long error_code;
unsigned long trap_nr;
#ifdef CONFIG_CPU_CAVIUM_OCTEON
struct octeon_cop2_state cp2 __attribute__ ((__aligned__(128)));
struct octeon_cvmseg_state cvmseg __attribute__ ((__aligned__(128)));
#endif
#ifdef CONFIG_CPU_XLP
struct nlm_cop2_state cp2;
#endif
struct mips_abi *abi;
};
#ifdef CONFIG_MIPS_MT_FPAFF
#define FPAFF_INIT \
.emulated_fp = 0, \
.user_cpus_allowed = INIT_CPUMASK,
#else
#define FPAFF_INIT
#endif /* CONFIG_MIPS_MT_FPAFF */
#define INIT_THREAD { \
/* \
* Saved main processor registers \
*/ \
.reg16 = 0, \
.reg17 = 0, \
.reg18 = 0, \
.reg19 = 0, \
.reg20 = 0, \
.reg21 = 0, \
.reg22 = 0, \
.reg23 = 0, \
.reg29 = 0, \
.reg30 = 0, \
.reg31 = 0, \
/* \
* Saved cp0 stuff \
*/ \
.cp0_status = 0, \
/* \
* Saved FPU/FPU emulator stuff \
*/ \
.fpu = { \
.fpr = {{{0,},},}, \
.fcr31 = 0, \
.msacsr = 0, \
}, \
/* \
* FPU affinity state (null if not FPAFF) \
*/ \
FPAFF_INIT \
MIPS: Use per-mm page to execute branch delay slot instructions In some cases the kernel needs to execute an instruction from the delay slot of an emulated branch instruction. These cases include: - Emulated floating point branch instructions (bc1[ft]l?) for systems which don't include an FPU, or upon which the kernel is run with the "nofpu" parameter. - MIPSr6 systems running binaries targeting older revisions of the architecture, which may include branch instructions whose encodings are no longer valid in MIPSr6. Executing instructions from such delay slots is done by writing the instruction to memory followed by a trap, as part of an "emuframe", and executing it. This avoids the requirement of an emulator for the entire MIPS instruction set. Prior to this patch such emuframes are written to the user stack and executed from there. This patch moves FP branch delay emuframes off of the user stack and into a per-mm page. Allocating a page per-mm leaves userland with access to only what it had access to previously, and compared to other solutions is relatively simple. When a thread requires a delay slot emulation, it is allocated a frame. A thread may only have one frame allocated at any one time, since it may only ever be executing one instruction at any one time. In order to ensure that we can free up allocated frame later, its index is recorded in struct thread_struct. In the typical case, after executing the delay slot instruction we'll execute a break instruction with the BRK_MEMU code. This traps back to the kernel & leads to a call to do_dsemulret which frees the allocated frame & moves the user PC back to the instruction that would have executed following the emulated branch. In some cases the delay slot instruction may be invalid, such as a branch, or may trigger an exception. In these cases the BRK_MEMU break instruction will not be hit. In order to ensure that frames are freed this patch introduces dsemul_thread_cleanup() and calls it to free any allocated frame upon thread exit. If the instruction generated an exception & leads to a signal being delivered to the thread, or indeed if a signal simply happens to be delivered to the thread whilst it is executing from the struct emuframe, then we need to take care to exit the frame appropriately. This is done by either rolling back the user PC to the branch or advancing it to the continuation PC prior to signal delivery, using dsemul_thread_rollback(). If this were not done then a sigreturn would return to the struct emuframe, and if that frame had meanwhile been used in response to an emulated branch instruction within the signal handler then we would execute the wrong user code. Whilst a user could theoretically place something like a compact branch to self in a delay slot and cause their thread to become stuck in an infinite loop with the frame never being deallocated, this would: - Only affect the users single process. - Be architecturally invalid since there would be a branch in the delay slot, which is forbidden. - Be extremely unlikely to happen by mistake, and provide a program with no more ability to harm the system than a simple infinite loop would. If a thread requires a delay slot emulation & no frame is available to it (ie. the process has enough other threads that all frames are currently in use) then the thread joins a waitqueue. It will sleep until a frame is freed by another thread in the process. Since we now know whether a thread has an allocated frame due to our tracking of its index, the cookie field of struct emuframe is removed as we can be more certain whether we have a valid frame. Since a thread may only ever have a single frame at any given time, the epc field of struct emuframe is also removed & the PC to continue from is instead stored in struct thread_struct. Together these changes simplify & shrink struct emuframe somewhat, allowing twice as many frames to fit into the page allocated for them. The primary benefit of this patch is that we are now free to mark the user stack non-executable where that is possible. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com> Cc: Maciej Rozycki <maciej.rozycki@imgtec.com> Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: Matthew Fortune <matthew.fortune@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/13764/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-08 10:06:19 +00:00
/* Delay slot emulation */ \
.bd_emu_frame = ATOMIC_INIT(BD_EMUFRAME_NONE), \
.bd_emu_branch_pc = 0, \
.bd_emu_cont_pc = 0, \
/* \
* Saved DSP stuff \
*/ \
.dsp = { \
.dspr = {0, }, \
.dspcontrol = 0, \
}, \
/* \
* saved watch register stuff \
*/ \
.watch = {{{0,},},}, \
/* \
* Other stuff associated with the process \
*/ \
.cp0_badvaddr = 0, \
.cp0_baduaddr = 0, \
.error_code = 0, \
.trap_nr = 0, \
/* \
* Platform specific cop2 registers(null if no COP2) \
*/ \
COP2_INIT \
}
struct task_struct;
/* Free all resources held by a thread. */
#define release_thread(thread) do { } while(0)
/*
* Do necessary setup to start up a newly executed thread.
*/
extern void start_thread(struct pt_regs * regs, unsigned long pc, unsigned long sp);
static inline void flush_thread(void)
{
}
unsigned long get_wchan(struct task_struct *p);
#define __KSTK_TOS(tsk) ((unsigned long)task_stack_page(tsk) + \
THREAD_SIZE - 32 - sizeof(struct pt_regs))
#define task_pt_regs(tsk) ((struct pt_regs *)__KSTK_TOS(tsk))
#define KSTK_EIP(tsk) (task_pt_regs(tsk)->cp0_epc)
#define KSTK_ESP(tsk) (task_pt_regs(tsk)->regs[29])
#define KSTK_STATUS(tsk) (task_pt_regs(tsk)->cp0_status)
MIPS: Change definition of cpu_relax() for Loongson-3 Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system. Loongson 3 CPUs include a Store Fill Buffer (SFB) which sits between a core & its L1 data cache, queueing memory accesses & allowing for faster forwarding of data from pending stores to younger loads from the core. Unfortunately the SFB prioritizes loads such that a continuous stream of loads may cause a pending write to be buffered indefinitely. This is problematic if we end up with 2 CPUs which each perform a store that the other polls for - one or both CPUs may end up with their stores buffered in the SFB, never reaching cache due to the continuous reads from the poll loop. Such a deadlock condition has been observed whilst running qspinlock code. This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flush of the SFB on SMP systems which will cause any pending writes to make it as far as the L1 caches where they will become visible to other CPUs. If the kernel is not compiled for SMP support, this will expand to a barrier() as before. This workaround matches that currently implemented for ARM when CONFIG_ARM_ERRATA_754327=y, which was introduced by commit 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore"). Although the workaround is only required when the Loongson 3 SFB functionality is enabled, and we only began explicitly enabling that functionality in v4.7 with commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT"), existing or future firmware may enable the SFB which means we may need the workaround backported to earlier kernels too. [paul.burton@mips.com: - Reword commit message & comment. - Limit stable backport to v3.15+ where we support Loongson 3 CPUs.] Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Paul Burton <paul.burton@mips.com> References: 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore") References: 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT") Patchwork: https://patchwork.linux-mips.org/patch/19830/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhuacai@gmail.com> Cc: stable@vger.kernel.org # v3.15+
2018-07-13 07:37:57 +00:00
#ifdef CONFIG_CPU_LOONGSON3
/*
* Loongson-3's SFB (Store-Fill-Buffer) may buffer writes indefinitely when a
* tight read loop is executed, because reads take priority over writes & the
* hardware (incorrectly) doesn't ensure that writes will eventually occur.
*
* Since spin loops of any kind should have a cpu_relax() in them, force an SFB
* flush from cpu_relax() such that any pending writes will become visible as
* expected.
*/
#define cpu_relax() smp_mb()
#else
#define cpu_relax() barrier()
MIPS: Change definition of cpu_relax() for Loongson-3 Linux expects that if a CPU modifies a memory location, then that modification will eventually become visible to other CPUs in the system. Loongson 3 CPUs include a Store Fill Buffer (SFB) which sits between a core & its L1 data cache, queueing memory accesses & allowing for faster forwarding of data from pending stores to younger loads from the core. Unfortunately the SFB prioritizes loads such that a continuous stream of loads may cause a pending write to be buffered indefinitely. This is problematic if we end up with 2 CPUs which each perform a store that the other polls for - one or both CPUs may end up with their stores buffered in the SFB, never reaching cache due to the continuous reads from the poll loop. Such a deadlock condition has been observed whilst running qspinlock code. This patch changes the definition of cpu_relax() to smp_mb() for Loongson-3, forcing a flush of the SFB on SMP systems which will cause any pending writes to make it as far as the L1 caches where they will become visible to other CPUs. If the kernel is not compiled for SMP support, this will expand to a barrier() as before. This workaround matches that currently implemented for ARM when CONFIG_ARM_ERRATA_754327=y, which was introduced by commit 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore"). Although the workaround is only required when the Loongson 3 SFB functionality is enabled, and we only began explicitly enabling that functionality in v4.7 with commit 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT"), existing or future firmware may enable the SFB which means we may need the workaround backported to earlier kernels too. [paul.burton@mips.com: - Reword commit message & comment. - Limit stable backport to v3.15+ where we support Loongson 3 CPUs.] Signed-off-by: Huacai Chen <chenhc@lemote.com> Signed-off-by: Paul Burton <paul.burton@mips.com> References: 534be1d5a2da ("ARM: 6194/1: change definition of cpu_relax() for ARM11MPCore") References: 1e820da3c9af ("MIPS: Loongson-3: Introduce CONFIG_LOONGSON3_ENHANCEMENT") Patchwork: https://patchwork.linux-mips.org/patch/19830/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com> Cc: Huacai Chen <chenhuacai@gmail.com> Cc: stable@vger.kernel.org # v3.15+
2018-07-13 07:37:57 +00:00
#endif
/*
* Return_address is a replacement for __builtin_return_address(count)
* which on certain architectures cannot reasonably be implemented in GCC
* (MIPS, Alpha) or is unusable with -fomit-frame-pointer (i386).
* Note that __builtin_return_address(x>=1) is forbidden because GCC
* aborts compilation on some CPUs. It's simply not possible to unwind
* some CPU's stackframes.
*
* __builtin_return_address works only for non-leaf functions. We avoid the
* overhead of a function call by forcing the compiler to save the return
* address register on the stack.
*/
#define return_address() ({__asm__ __volatile__("":::"$31");__builtin_return_address(0);})
#ifdef CONFIG_CPU_HAS_PREFETCH
#define ARCH_HAS_PREFETCH
#define prefetch(x) __builtin_prefetch((x), 0, 1)
#define ARCH_HAS_PREFETCHW
#define prefetchw(x) __builtin_prefetch((x), 1, 1)
#endif
2015-01-08 12:17:37 +00:00
/*
* Functions & macros implementing the PR_GET_FP_MODE & PR_SET_FP_MODE options
* to the prctl syscall.
*/
extern int mips_get_process_fp_mode(struct task_struct *task);
extern int mips_set_process_fp_mode(struct task_struct *task,
unsigned int value);
#define GET_FP_MODE(task) mips_get_process_fp_mode(task)
#define SET_FP_MODE(task,value) mips_set_process_fp_mode(task, value)
#endif /* _ASM_PROCESSOR_H */