linux/kernel/rcu
Shan Wei d860d40327 rcu: Use __this_cpu_read() instead of per_cpu_ptr()
The __this_cpu_read() function produces better code than does
per_cpu_ptr() on both ARM and x86.  For example, gcc (Ubuntu/Linaro
4.7.3-12ubuntu1) 4.7.3 produces the following:

ARMv7 per_cpu_ptr():

force_quiescent_state:
    mov    r3, sp    @,
    bic    r1, r3, #8128    @ tmp171,,
    ldr    r2, .L98    @ tmp169,
    bic    r1, r1, #63    @ tmp170, tmp171,
    ldr    r3, [r0, #220]    @ __ptr, rsp_6(D)->rda
    ldr    r1, [r1, #20]    @ D.35903_68->cpu, D.35903_68->cpu
    mov    r6, r0    @ rsp, rsp
    ldr    r2, [r2, r1, asl #2]    @ tmp173, __per_cpu_offset
    add    r3, r3, r2    @ tmp175, __ptr, tmp173
    ldr    r5, [r3, #12]    @ rnp_old, D.29162_13->mynode

ARMv7 __this_cpu_read():

force_quiescent_state:
    ldr    r3, [r0, #220]    @ rsp_7(D)->rda, rsp_7(D)->rda
    mov    r6, r0    @ rsp, rsp
    add    r3, r3, #12    @ __ptr, rsp_7(D)->rda,
    ldr    r5, [r2, r3]    @ rnp_old, *D.29176_13

Using gcc 4.8.2:

x86_64 per_cpu_ptr():

    movl    %gs:cpu_number, %edx    # cpu_number, pscr_ret__
    movslq    %edx, %rdx    # pscr_ret__, pscr_ret__
    movq    __per_cpu_offset(,%rdx,8), %rdx    # __per_cpu_offset, tmp93
    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, __ptr
    movq    24(%rdx,%rax), %r12    # _15->mynode, rnp_old

x86_64 __this_cpu_read():

    movq    %rdi, %r13    # rsp, rsp
    movq    1000(%rdi), %rax    # rsp_9(D)->rda, rsp_9(D)->rda
    movq    %gs:24(%rax), %r12    # _10->mynode, rnp_old

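Because __this_cpu_read() shortens the generated code on both of these
very different architectures, eliminating the load of the CPU number and
the __per_cpu_offset lookup, this commit replaces per_cpu_ptr() with
__this_cpu_read().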
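
To illustrate why the second form compiles to fewer instructions, here is
a minimal user-space sketch of the two access patterns. It is not kernel
code: every sim_* name is hypothetical, the struct layouts are cut down to
the two fields needed here, and sim_this_cpu_off merely stands in for the
per-CPU base that the x86_64 listing reaches through %gs.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4   /* hypothetical CPU count for the simulation */

/* Cut-down stand-ins for the kernel structures (assumed layouts). */
struct rcu_node { int id; };
struct rcu_data { int cpu; struct rcu_node *mynode; };

static struct rcu_node sim_nodes[SIM_NR_CPUS];
static struct rcu_data sim_area[SIM_NR_CPUS];        /* one copy per CPU */
static ptrdiff_t sim_per_cpu_offset[SIM_NR_CPUS];    /* offset table */
static int sim_cpu;                  /* simulated current CPU number */
static ptrdiff_t sim_this_cpu_off;   /* simulated per-CPU base register */

/* Give each simulated CPU its own rcu_data copy and leaf node. */
static void sim_init(void)
{
    for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
        sim_nodes[cpu].id = 100 + cpu;
        sim_area[cpu].cpu = cpu;
        sim_area[cpu].mynode = &sim_nodes[cpu];
        sim_per_cpu_offset[cpu] =
            (ptrdiff_t)cpu * (ptrdiff_t)sizeof(sim_area[0]);
    }
}

/* Pretend the caller is now running on the given CPU. */
static void sim_run_on(int cpu)
{
    sim_cpu = cpu;
    sim_this_cpu_off = sim_per_cpu_offset[cpu];
}

/* per_cpu_ptr() style: load CPU number, index the offset table,
 * add the offset to the base pointer, then load the field. */
static struct rcu_node *sim_mynode_via_per_cpu_ptr(struct rcu_data *rda)
{
    int cpu = sim_cpu;
    ptrdiff_t off = sim_per_cpu_offset[cpu];
    struct rcu_data *rdp = (struct rcu_data *)((char *)rda + off);

    return rdp->mynode;
}

/* __this_cpu_read() style: take the field's address in the base copy
 * and read it relative to the already-available per-CPU base. */
static struct rcu_node *sim_mynode_via_this_cpu_read(struct rcu_data *rda)
{
    char *field = (char *)&rda->mynode;

    return *(struct rcu_node **)(field + sim_this_cpu_off);
}
```

After sim_init() and sim_run_on(2), both helpers return &sim_nodes[2];
the second simply skips the CPU-number load and the table lookup. In
kernel source the two forms would look roughly like
per_cpu_ptr(rsp->rda, raw_smp_processor_id())->mynode and
__this_cpu_read(rsp->rda->mynode); that spelling is inferred from the
assembly annotations above, not quoted from the diff.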

Signed-off-by: Shan Wei <davidshan@tencent.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-09 09:15:21 -07:00
Name           Last commit                                                      Date
Makefile       rcutorture: Abstract rcu_torture_random()                        2014-02-23 09:00:58 -08:00
rcu.h          rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone     2014-02-26 06:35:18 -08:00
rcutorture.c   torture: Check for multiple concurrent torture tests             2014-05-14 09:46:29 -07:00
srcu.c         rcu: Eliminate read-modify-write ACCESS_ONCE() calls             2014-07-09 09:14:49 -07:00
tiny_plugin.h  rcu: Protect uses of ->jiffies_stall with ACCESS_ONCE()          2014-04-29 08:44:41 -07:00
tiny.c         rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone     2014-02-26 06:35:18 -08:00
tree_plugin.h  rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs           2014-07-09 09:15:02 -07:00
tree_trace.c   rcu: Stop tracking FSF's postal address                          2014-02-17 15:01:37 -08:00
tree.c         rcu: Use __this_cpu_read() instead of per_cpu_ptr()              2014-07-09 09:15:21 -07:00
tree.h         rcu: Simplify priority boosting by putting rt_mutex in rcu_node  2014-07-09 09:15:01 -07:00
update.c       rcu: Reduce overhead of cond_resched() checks for RCU            2014-06-23 11:19:32 -07:00