    ARM: Optimize multi-CPU tlb flushing a little more · 87067a93
    Russell King authored
    
    
    The compiler does not conditionalize the assembly instructions for
    the TLB operations; it emits a test and a branch around each one
    instead. This leads to sub-optimal code being generated when
    building a kernel for multiple CPUs.
    
    We can tweak things fairly simply, as the code fragment below shows:
    
        17f8:       e3120001        tst     r2, #1  ; 0x1
    ...
        1800:       0a000000        beq     1808 <handle_pte_fault+0x194>
        1804:       ee061f10        mcr     15, 0, r1, cr6, cr0, {0}
        1808:       e3120004        tst     r2, #4  ; 0x4
        180c:       0a000000        beq     1814 <handle_pte_fault+0x1a0>
        1810:       ee081f36        mcr     15, 0, r1, cr8, cr6, {1}
    becomes:
        17f0:       e3120001        tst     r2, #1  ; 0x1
        17f4:       1e063f10        mcrne   15, 0, r3, cr6, cr0, {0}
        17f8:       e3120004        tst     r2, #4  ; 0x4
        17fc:       1e083f36        mcrne   15, 0, r3, cr8, cr6, {1}
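
The pattern in the second listing can be sketched in C with inline assembly. This is an illustration only, not the kernel's code: the function name and the flag macros are hypothetical, and a harmless `addne` stands in for the privileged `mcr`, which cannot run in user space. The point is that the `tst` and the conditional instruction sit in the same asm statement, so no branch is needed. On targets other than 32-bit ARM in ARM state, the sketch assumes a plain C fallback with the same result.

```c
/* Hypothetical flag bits; the values mirror the tst immediates (#1, #4). */
#define SKETCH_FLAG_A 1u
#define SKETCH_FLAG_B 4u

/*
 * Count how many of the two flagged operations would run for a given
 * flags word. A harmless add stands in for the privileged mcr.
 */
static unsigned int run_flagged_ops(unsigned int flags)
{
    unsigned int done = 0;

#if defined(__arm__) && !defined(__thumb__)
    /*
     * Test and conditional instruction share one asm statement, so the
     * operation is predicated on the flags instead of being branched
     * around.
     */
    __asm__ volatile("tst   %1, %2\n\t"
                     "addne %0, %0, #1"
                     : "+r" (done)
                     : "r" (flags), "I" (SKETCH_FLAG_A)
                     : "cc");
    __asm__ volatile("tst   %1, %2\n\t"
                     "addne %0, %0, #1"
                     : "+r" (done)
                     : "r" (flags), "I" (SKETCH_FLAG_B)
                     : "cc");
#else
    /* Portable fallback: same observable result, compiler picks the code. */
    if (flags & SKETCH_FLAG_A)
        done++;
    if (flags & SKETCH_FLAG_B)
        done++;
#endif
    return done;
}
```

Calling `run_flagged_ops(5)` exercises both conditional operations, while `run_flagged_ops(0)` skips both.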
    
    Overall, for Realview with V6 and V7 CPUs configured:
    
       text    data     bss     dec     hex filename
    4153998  207340 5371036 9732374  948116 ../build/realview/vmlinux.before
    4153366  207332 5371036 9731734  947e96 ../build/realview/vmlinux.after
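
Subtracting the two rows gives the saving: text shrinks by 632 bytes and data by 8, for 640 bytes in total. A minimal sketch of that arithmetic follows; the struct and function names are illustrative only, and the figures are copied from the table above.

```c
/* Section sizes in bytes, copied from the size table above. */
struct section_sizes {
    long text;
    long data;
    long bss;
};

static const struct section_sizes size_before = { 4153998, 207340, 5371036 };
static const struct section_sizes size_after  = { 4153366, 207332, 5371036 };

/* Sum of all three sections, i.e. the "dec" column. */
static long total_size(struct section_sizes s)
{
    return s.text + s.data + s.bss;
}

/* Total bytes saved between two builds. */
static long bytes_saved(struct section_sizes before, struct section_sizes after)
{
    return total_size(before) - total_size(after);
}
```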
    
    Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>