linux/arch
Chen Jie 615eb603f4 MIPS: csum_partial: Improve instruction parallelism.
Computing sum introduces true data dependency. This patch removes some
true data depdendencies, hence increases instruction level parallelism.

This patch brings up to 50% csum performance gain on Loongson 3a.

One example about how this patch works is in CSUM_BIGCHUNK1:
// ** original **    vs    ** patch applied **
    ADDC(sum, t0)           ADDC(t0, t1)
    ADDC(sum, t1)           ADDC(t2, t3)
    ADDC(sum, t2)           ADDC(sum, t0)
    ADDC(sum, t3)           ADDC(sum, t2)

In the original implementation, each ADDC(sum, ...) depends on the sum
value updated by previous ADDC(as source operand).

With this patch applied, the first two ADDC operations are independent,
hence can be executed simultaneously if possible.

Another example is in the "copy and sum calculating chunk":
// ** original **    vs    ** patch applied **
    STORE(t0, UNIT(0) ...   STORE(t0, UNIT(0) ...
    ADDC(sum, t0)           ADDC(t0, t1)
    STORE(t1, UNIT(1) ...   STORE(t1, UNIT(1) ...
    ADDC(sum, t1)           ADDC(sum, t0)
    STORE(t2, UNIT(2) ...   STORE(t2, UNIT(2) ...
    ADDC(sum, t2)           ADDC(t2, t3)
    STORE(t3, UNIT(3) ...   STORE(t3, UNIT(3) ...
    ADDC(sum, t3)           ADDC(sum, t2)

With this patch applied, ADDC and the **next next** ADDC are independent.

Signed-off-by: chenj <chenj@lemote.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/9608/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-04-01 17:22:11 +02:00
..
alpha asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
arc ARC: Fix thread_saved_pc() 2015-02-27 10:59:34 +05:30
arm Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2015-03-21 10:03:22 -07:00
arm64 arm64 fixes: 2015-03-21 10:24:10 -07:00
avr32 asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
blackfin Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-02-21 12:59:04 -08:00
c6x arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU 2015-03-12 18:46:08 -07:00
cris CRIS changes for 3.20 2015-02-15 18:02:02 -08:00
frv mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines 2015-02-28 09:57:51 -08:00
hexagon all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
ia64 asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
m32r mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines 2015-02-28 09:57:51 -08:00
m68k mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines 2015-02-28 09:57:51 -08:00
metag metag: Fix KSTK_EIP() and KSTK_ESP() macros 2015-02-24 12:54:21 +00:00
microblaze microblaze: Fix syscall error recovery for invalid syscall IDs 2015-03-04 15:12:27 +01:00
mips MIPS: csum_partial: Improve instruction parallelism. 2015-04-01 17:22:11 +02:00
mn10300 mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines 2015-02-28 09:57:51 -08:00
nios2 nios2: mm: do not invoke OOM killer on kernel fault OOM 2015-03-16 15:35:25 +08:00
openrisc asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
parisc mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines 2015-02-28 09:57:51 -08:00
powerpc powerpc/iommu: Remove IOMMU device references via bus notifier 2015-03-04 13:19:33 +11:00
s390 kvm: move advertising of KVM_CAP_IRQFD to common code 2015-03-10 21:18:59 -03:00
score all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
sh asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
sparc sparc: Fix /proc/kcore 2015-03-18 19:15:28 -07:00
tile tile: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
um all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
unicore32 mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() 2015-02-13 21:21:42 -08:00
x86 Power management and ACPI fixes for v4.0-rc5 2015-03-21 12:51:36 -07:00
xtensa asm-generic: uaccess.h cleanup 2015-02-18 10:02:24 -08:00
.gitignore
Kconfig