linux/arch/mips/include/asm/barrier.h
Linus Torvalds d9862cfbe2 Here's the main MIPS pull request for v5.1:
- Support for the MIPSr6 MemoryMapID register & Global INValidate TLB
   (GINVT) instructions, allowing for more efficient TLB maintenance when
   running on a CPU such as the I6500 that supports these.
 
 - Enable huge page support for MIPS64r6.
 
 - Optimize post-DMA cache sync by removing that code entirely for kernel
   configurations in which we know it won't be needed.
 
 - The number of pages allocated for interrupt stacks is now calculated
   correctly, where before we would wastefully allocate too much memory
   in some configurations.
 
 - The ath79 platform migrates to devicetree.
 
 - The bcm47xx platform sees fixes for the Buffalo WHR-G54S board.
 
 - The ingenic/jz4740 platform gains support for appended devicetrees.
 
 - The cavium_octeon, lantiq, loongson32 & sgi-ip27 platforms all see
   cleanups as do various pieces of core architecture code.

Merge tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS updates from Paul Burton:

* tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (66 commits)
  MIPS: lantiq: Remove separate GPHY Firmware loader
  MIPS: ingenic: Add support for appended devicetree
  MIPS: SGI-IP27: rework HUB interrupts
  MIPS: SGI-IP27: do boot CPU init later
  MIPS: SGI-IP27: do xtalk scanning later
  MIPS: SGI-IP27: use pr_info/pr_emerg and pr_cont to fix output
  MIPS: SGI-IP27: clean up bridge access and header files
  MIPS: SGI-IP27: get rid of volatile and hubreg_t
  MIPS: irq: Allocate accurate order pages for irq stack
  MIPS: dma-noncoherent: Remove bogus condition in dma_sync_phys()
  MIPS: eBPF: Remove REG_32BIT_ZERO_EX
  MIPS: eBPF: Always return sign extended 32b values
  MIPS: CM: Fix indentation
  MIPS: BCM47XX: Fix/improve Buffalo WHR-G54S support
  MIPS: OCTEON: program rx/tx-delay always from DT
  MIPS: OCTEON: delete board-specific link status
  MIPS: OCTEON: don't lie about interface type of CN3005 board
  MIPS: OCTEON: warn if deprecated link status is being used
  MIPS: OCTEON: add fixed-link nodes to in-kernel device tree
  MIPS: Delete unused flush_cache_sigtramp()
  ...
2019-03-05 11:28:25 -08:00

283 lines
10 KiB
C

/*
* This file is subject to the terms and conditions of the GNU General Public
* License. See the file "COPYING" in the main directory of this archive
* for more details.
*
* Copyright (C) 2006 by Ralf Baechle (ralf@linux-mips.org)
*/
#ifndef __ASM_BARRIER_H
#define __ASM_BARRIER_H
#include <asm/addrspace.h>
/*
* Sync types defined by the MIPS architecture (document MD00087 table 6.5)
* These values are used with the sync instruction to perform memory barriers.
* Types of ordering guarantees available through the SYNC instruction:
* - Completion Barriers
* - Ordering Barriers
* As compared to the completion barrier, the ordering barrier is a
* lighter-weight operation as it does not require the specified instructions
* before the SYNC to be already completed. Instead it only requires that those
* specified instructions which are subsequent to the SYNC in the instruction
* stream are never re-ordered for processing ahead of the specified
* instructions which are before the SYNC in the instruction stream.
* This potentially reduces how many cycles the barrier instruction must stall
* before it completes.
* Implementations that do not use any of the non-zero values of stype to define
* different barriers, such as ordering barriers, must make those stype values
* act the same as stype zero.
*/
/*
* Completion barriers:
* - Every synchronizable specified memory instruction (loads or stores or both)
* that occurs in the instruction stream before the SYNC instruction must be
* already globally performed before any synchronizable specified memory
* instructions that occur after the SYNC are allowed to be performed, with
* respect to any other processor or coherent I/O module.
*
* - The barrier does not guarantee the order in which instruction fetches are
* performed.
*
* - A stype value of zero will always be defined such that it performs the most
* complete set of synchronization operations that are defined. This means
* stype zero always does a completion barrier that affects both loads and
* stores preceding the SYNC instruction and both loads and stores that are
* subsequent to the SYNC instruction. Non-zero values of stype may be defined
* by the architecture or specific implementations to perform synchronization
* behaviors that are less complete than that of stype zero. If an
* implementation does not use one of these non-zero values to define a
* different synchronization behavior, then that non-zero value of stype must
* act the same as stype zero completion barrier. This allows software written
* for an implementation with a lighter-weight barrier to work on another
* implementation which only implements the stype zero completion barrier.
*
* - A completion barrier is required, potentially in conjunction with SSNOP (in
* Release 1 of the Architecture) or EHB (in Release 2 of the Architecture),
* to guarantee that memory reference results are visible across operating
* mode changes. For example, a completion barrier is required on some
* implementations on entry to and exit from Debug Mode to guarantee that
* memory effects are handled correctly.
*/
/*
* stype 0 - A completion barrier that affects preceding loads and stores and
* subsequent loads and stores.
* Older instructions which must reach the load/store ordering point before the
* SYNC instruction completes: Loads, Stores
* Younger instructions which must reach the load/store ordering point only
* after the SYNC instruction completes: Loads, Stores
* Older instructions which must be globally performed when the SYNC instruction
* completes: Loads, Stores
*/
#define STYPE_SYNC 0x0
/*
* Ordering barriers:
* - Every synchronizable specified memory instruction (loads or stores or both)
* that occurs in the instruction stream before the SYNC instruction must
* reach a stage in the load/store datapath after which no instruction
* re-ordering is possible before any synchronizable specified memory
* instruction which occurs after the SYNC instruction in the instruction
* stream reaches the same stage in the load/store datapath.
*
* - If any memory instruction before the SYNC instruction in program order
* generates a memory request to external memory and any memory
* instruction after the SYNC instruction in program order also generates a
* memory request to external memory, the memory request belonging to the
* older instruction must be globally performed before the time the memory
* request belonging to the younger instruction is globally performed.
*
* - The barrier does not guarantee the order in which instruction fetches are
* performed.
*/
/*
* stype 0x10 - An ordering barrier that affects preceding loads and stores and
* subsequent loads and stores.
* Older instructions which must reach the load/store ordering point before the
* SYNC instruction completes: Loads, Stores
* Younger instructions which must reach the load/store ordering point only
* after the SYNC instruction completes: Loads, Stores
* Older instructions which must be globally performed when the SYNC instruction
* completes: N/A
*/
#define STYPE_SYNC_MB 0x10
/*
* stype 0x14 - A completion barrier specific to global invalidations
*
* When a sync instruction of this type completes any preceding GINVI or GINVT
* operation has been globalized & completed on all coherent CPUs. Anything
* that the GINV* instruction should invalidate will have been invalidated on
* all coherent CPUs when this instruction completes. It is implementation
* specific whether the GINV* instructions themselves will ensure completion,
* or this sync type will.
*
* In systems implementing global invalidates (ie. with Config5.GI == 2 or 3)
* this sync type also requires that previous SYNCI operations have completed.
*/
#define STYPE_GINV 0x14
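/*
* Illustration only, not part of the upstream header. The stype values above
* are passed to SYNC as an immediate operand, in the same way sync_ginv() at
* the end of this file passes STYPE_GINV. A minimal sketch of a generic helper
* could look like the block below. demo_sync_stype(), demo_issue_barriers()
* and the DEMO_* names are hypothetical. The non-MIPS branch assumes GCC or
* Clang and uses a compiler builtin fence only so the sketch builds on other
* hosts; it does not model the difference between the stypes.
*/

```c
#include <assert.h>

/* Values copied from the STYPE_* definitions in this header. */
#define DEMO_STYPE_SYNC		0x0
#define DEMO_STYPE_SYNC_MB	0x10

#if defined(__mips__)
/* stype must be a compile-time constant because of the "i" constraint. */
# define demo_sync_stype(stype) \
	__asm__ __volatile__("sync\t%0" : : "i" (stype) : "memory")
#else
/* Assumption: a full compiler/CPU fence is an acceptable stand-in here. */
# define demo_sync_stype(stype) \
	do { (void)(stype); __atomic_thread_fence(__ATOMIC_SEQ_CST); } while (0)
#endif

/* Issue one completion barrier and one ordering barrier; return the count. */
static int demo_issue_barriers(void)
{
	int issued = 0;

	demo_sync_stype(DEMO_STYPE_SYNC);	/* completion barrier */
	issued++;
	demo_sync_stype(DEMO_STYPE_SYNC_MB);	/* lighter ordering barrier */
	issued++;

	return issued;
}
```

/*
* On an implementation that does not define stype 0x10, the second barrier
* behaves like stype zero, as the comment at the top of this file describes.
*/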
#ifdef CONFIG_CPU_HAS_SYNC
#define __sync() \
	__asm__ __volatile__( \
		".set push\n\t" \
		".set noreorder\n\t" \
		".set mips2\n\t" \
		"sync\n\t" \
		".set pop" \
		: /* no output */ \
		: /* no input */ \
		: "memory")
#else
#define __sync() do { } while (0)
#endif
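/*
* Load from an uncached (CKSEG1) address and discard the result in $0.
* The generic fast_iob() below issues __sync() first and then this load.
*/
#define __fast_iob() \
	__asm__ __volatile__( \
		".set push\n\t" \
		".set noreorder\n\t" \
		"lw $0,%0\n\t" \
		"nop\n\t" \
		".set pop" \
		: /* no output */ \
		: "m" (*(int *)CKSEG1) \
		: "memory")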
#ifdef CONFIG_CPU_CAVIUM_OCTEON
# define OCTEON_SYNCW_STR ".set push\n.set arch=octeon\nsyncw\nsyncw\n.set pop\n"
# define __syncw() __asm__ __volatile__(OCTEON_SYNCW_STR : : : "memory")
# define fast_wmb() __syncw()
# define fast_rmb() barrier()
# define fast_mb() __sync()
# define fast_iob() do { } while (0)
#else /* ! CONFIG_CPU_CAVIUM_OCTEON */
# define fast_wmb() __sync()
# define fast_rmb() __sync()
# define fast_mb() __sync()
# ifdef CONFIG_SGI_IP28
#  define fast_iob() \
	__asm__ __volatile__( \
		".set push\n\t" \
		".set noreorder\n\t" \
		"lw $0,%0\n\t" \
		"sync\n\t" \
		"lw $0,%0\n\t" \
		".set pop" \
		: /* no output */ \
		: "m" (*(int *)CKSEG1ADDR(0x1fa00004)) \
		: "memory")
# else
#  define fast_iob() \
	do { \
		__sync(); \
		__fast_iob(); \
	} while (0)
# endif
#endif /* CONFIG_CPU_CAVIUM_OCTEON */
#ifdef CONFIG_CPU_HAS_WB
#include <asm/wbflush.h>
#define mb() wbflush()
#define iob() wbflush()
#else /* !CONFIG_CPU_HAS_WB */
#define mb() fast_mb()
#define iob() fast_iob()
#endif /* !CONFIG_CPU_HAS_WB */
#define wmb() fast_wmb()
#define rmb() fast_rmb()
#if defined(CONFIG_WEAK_ORDERING)
# ifdef CONFIG_CPU_CAVIUM_OCTEON
# define __smp_mb() __sync()
# define __smp_rmb() barrier()
# define __smp_wmb() __syncw()
# else
# define __smp_mb() __asm__ __volatile__("sync" : : :"memory")
# define __smp_rmb() __asm__ __volatile__("sync" : : :"memory")
# define __smp_wmb() __asm__ __volatile__("sync" : : :"memory")
# endif
#else
#define __smp_mb() barrier()
#define __smp_rmb() barrier()
#define __smp_wmb() barrier()
#endif
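/*
* Illustration only, not part of the upstream header. The write and read
* barriers defined above are normally used in pairs: the writer orders its
* data store before a flag store, and the reader orders the flag load before
* the data load. A minimal sketch of that pairing follows. All demo_* names
* are hypothetical, and the GCC/Clang fence builtins are approximate stand-ins
* for smp_wmb() and smp_rmb() so that the sketch builds outside the kernel.
*/

```c
#include <assert.h>

/* Assumption: release/acquire fences approximate smp_wmb()/smp_rmb(). */
#define demo_smp_wmb()	__atomic_thread_fence(__ATOMIC_RELEASE)
#define demo_smp_rmb()	__atomic_thread_fence(__ATOMIC_ACQUIRE)

struct demo_msg {
	int payload;
	int ready;
};

/* Writer side: store the payload, then publish it by setting the flag. */
static void demo_publish(struct demo_msg *m, int value)
{
	m->payload = value;
	demo_smp_wmb();		/* payload store ordered before flag store */
	__atomic_store_n(&m->ready, 1, __ATOMIC_RELAXED);
}

/* Reader side: return 1 and fill *out if the flag was set, else return 0. */
static int demo_consume(struct demo_msg *m, int *out)
{
	if (!__atomic_load_n(&m->ready, __ATOMIC_RELAXED))
		return 0;
	demo_smp_rmb();		/* flag load ordered before payload load */
	*out = m->payload;
	return 1;
}
```

/*
* In kernel code the flag accesses would typically use WRITE_ONCE() and
* READ_ONCE() rather than compiler builtins.
*/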
#if defined(CONFIG_WEAK_REORDERING_BEYOND_LLSC) && defined(CONFIG_SMP)
#define __WEAK_LLSC_MB " sync \n"
#else
#define __WEAK_LLSC_MB " \n"
#endif
#define smp_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
#ifdef CONFIG_CPU_CAVIUM_OCTEON
#define smp_mb__before_llsc() smp_wmb()
#define __smp_mb__before_llsc() __smp_wmb()
/* Cause previous writes to become visible on all CPUs as soon as possible */
#define nudge_writes() __asm__ __volatile__(".set push\n\t" \
					".set arch=octeon\n\t" \
					"syncw\n\t" \
					".set pop" : : : "memory")
#else
#define smp_mb__before_llsc() smp_llsc_mb()
#define __smp_mb__before_llsc() smp_llsc_mb()
#define nudge_writes() mb()
#endif
#define __smp_mb__before_atomic() __smp_mb__before_llsc()
#define __smp_mb__after_atomic() smp_llsc_mb()
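/*
* Illustration only, not part of the upstream header. The two macros above
* are meant to bracket atomic read-modify-write operations that do not imply
* ordering by themselves. A minimal sketch of the usage pattern follows. The
* demo_* names are hypothetical, the relaxed builtin stands in for a
* non-value-returning atomic_dec(), and full fences stand in for the barrier
* macros so that the sketch builds outside the kernel (GCC or Clang assumed).
*/

```c
#include <assert.h>

/* Assumption: a full fence is a safe stand-in for both macros. */
#define demo_smp_mb__before_atomic()	__atomic_thread_fence(__ATOMIC_SEQ_CST)
#define demo_smp_mb__after_atomic()	__atomic_thread_fence(__ATOMIC_SEQ_CST)

struct demo_obj {
	int dead;
	int ref_count;
};

/* Relaxed decrement: atomic, but provides no ordering on its own. */
static void demo_atomic_dec(int *v)
{
	__atomic_fetch_sub(v, 1, __ATOMIC_RELAXED);
}

/* Mark the object dead, then drop a reference with ordering on both sides. */
static void demo_mark_dead_and_put(struct demo_obj *obj)
{
	obj->dead = 1;
	demo_smp_mb__before_atomic();	/* dead store visible before the dec */
	demo_atomic_dec(&obj->ref_count);
	demo_smp_mb__after_atomic();	/* dec ordered before later accesses */
}
```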
/*
* Some Loongson 3 CPUs have a bug wherein execution of a memory access (load,
* store or pref) in between an ll & sc can cause the sc instruction to
* erroneously succeed, breaking atomicity. Whilst it's unusual to write code
* containing such sequences, this bug bites harder than we might otherwise
* expect due to reordering & speculation:
*
* 1) A memory access appearing prior to the ll in program order may actually
* be executed after the ll - this is the reordering case.
*
* In order to avoid this we need to place a memory barrier (ie. a sync
* instruction) prior to every ll instruction, in between it & any earlier
* memory access instructions. Many of these cases are already covered by
* smp_mb__before_llsc() but for the remaining cases, typically ones in
* which multiple CPUs may operate on a memory location but ordering is not
* usually guaranteed, we use loongson_llsc_mb() below.
*
* This reordering case is fixed by 3A R2 CPUs, ie. 3A2000 models and later.
*
* 2) If a conditional branch exists between an ll & sc with a target outside
* of the ll-sc loop, for example an exit upon value mismatch in cmpxchg()
* or similar, then misprediction of the branch may allow speculative
* execution of memory accesses from outside of the ll-sc loop.
*
* In order to avoid this we need a memory barrier (ie. a sync instruction)
* at each affected branch target, for which we also use loongson_llsc_mb()
* defined below.
*
* This case affects all current Loongson 3 CPUs.
*/
#ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS /* Loongson-3's LLSC workaround */
#define loongson_llsc_mb() __asm__ __volatile__(__WEAK_LLSC_MB : : :"memory")
#else
#define loongson_llsc_mb() do { } while (0)
#endif
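/*
* Illustration only, not part of the upstream header. The comment above names
* two places where the workaround barrier is needed: before the ll, and at the
* target of any branch that leaves the ll/sc loop. The sketch below marks both
* places in a cmpxchg-style loop. It shows barrier placement only: the demo_*
* names are hypothetical, GCC/Clang builtins stand in for the ll and sc
* instructions, and a full fence stands in for loongson_llsc_mb().
*/

```c
#include <assert.h>

/* Assumption: stand-in for loongson_llsc_mb() so the sketch builds anywhere. */
#define demo_loongson_llsc_mb()	__atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Store new_val to *ptr if it holds old; return the value that was seen. */
static int demo_cmpxchg(int *ptr, int old, int new_val)
{
	int seen;

	demo_loongson_llsc_mb();	/* case 1: barrier before the ll */
	do {
		/* stands in for ll */
		seen = __atomic_load_n(ptr, __ATOMIC_RELAXED);
		if (seen != old)
			goto out;	/* branch leaving the ll/sc loop */
		/* the weak compare-exchange below stands in for sc */
	} while (!__atomic_compare_exchange_n(ptr, &seen, new_val, 1,
					      __ATOMIC_RELAXED,
					      __ATOMIC_RELAXED));
out:
	demo_loongson_llsc_mb();	/* case 2: barrier at the branch target */
	return seen;
}
```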
static inline void sync_ginv(void)
{
	asm volatile("sync\t%0" :: "i"(STYPE_GINV));
}
#include <asm-generic/barrier.h>
#endif /* __ASM_BARRIER_H */