1. 08 May, 2018 2 commits
    • Add another missing typecast · 06f253b5
      Vikram Narayanan authored
      06f253b5
    • awe_mapper: Fix typecasting, assert bugs · 5b2c87a8
      Vikram Narayanan authored
      
      
      test_async():Basic do...finish/async test
      _thc_nested_async():ASYNC 1: inside
      test_async():Done with do...finish
      test_async_yield():Basic do...finish/async yield test
      _thc_nested_async():ASYNC 1: Ready to yield
      _thc_nested_async():ASYNC 2: Ready to yield
      _thc_nested_async():ASYNC 1: Got control back
      _thc_nested_async():ASYNC 2: Got control back
      test_async_yield():Done with do...finish
      test_basic_do_finish_create():Average time per do{ i++ }finish(): 11 cycles (Passed)
      test_basic_nonblocking_async_create():Average time per non blocking do{async{i++}}finish(): 55 cycles (Passed)
      test_basic_N_nonblocking_asyncs_create():Average time per 10 non blocking asyncs inside one do{ }finish(): 541 cycles (Passed)
      test_basic_1_blocking_asyncs_create():Average time per 1 blocking asyncs inside one do{ }finish(): 153 cycles (Passed)
      test_basic_N_blocking_asyncs_create():Average time per 10 blocking asyncs inside one do{ }finish(): 1300 cycles (Passed)
      test_basic_N_blocking_asyncs_create_pts():Average time per 10 blocking asyncs inside one do{ }finish() (thread pts): 1374 cycles (Passed)
      test_basic_N_blocking_id_asyncs():Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1454 cycles (Passed)
      test_basic_N_blocking_id_asyncs_pts():Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper, thread pts): 1473 cycles (Passed)
      test_basic_N_blocking_id_asyncs_and_N_yields_back():Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 1227 cycles (Passed)
      test_basic_N_blocking_id_asyncs_and_N_yields_back_extrnl_ids():Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper, external): 1055 cycles (Passed)
      test_do_finish_yield():Average time per do .. finish and two blocking yields: 280 cycles (Passed)
      test_do_finish_yield_no_dispatch():Average time per do .. finish and two blocking yields (no dispatch): 292 cycles (Passed)
      test_ctx_switch_no_dispatch():Average time per context switch (no dispatch): 50 cycles
      test_ctx_switch_no_dispatch_direct():Average time per context switch (no dispatch, direct): 49 cycles
      test_ctx_switch_no_dispatch_direct_trusted():Average time per context switch (no dispatch, direct, trusted): 46 cycles
      test_ctx_switch_to_awe():Average time per context switch (direct awe): 42 cycles
      test_create_awe():Average time to create and remove 64 awe_ids: 1723 cycles (awe:0x18ecae0)
      Signed-off-by: Vikram Narayanan <vikram186@gmail.com>
      5b2c87a8
  2. 04 May, 2018 1 commit
  3. 28 Apr, 2018 3 commits
  4. 27 Apr, 2018 1 commit
  5. 16 Apr, 2018 1 commit
  6. 15 Apr, 2018 1 commit
  7. 04 Apr, 2018 1 commit
  8. 23 Mar, 2018 2 commits
    • Minor text comment update · dc762229
      Anton Burtsev authored
      dc762229
    • Implement list for pending free stacks · 3320617b
      Anton Burtsev authored
      -- The idea is to eliminate a heavyweight dispatch loop, and
         dispatch awes from the list directly
      
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 35 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 373 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1029 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (thread pts): 1043 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1238 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper, thread pts): 1245 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 935 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper, external): 850 cycles (Passed)
      Average time per do .. finish and two blocking yields: 241 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 208 cycles (Passed)
      Average time per context switch (no dispatch): 36 cycles
      Average time per context switch (no dispatch, direct): 36 cycles
      Average time per context switch (no dispatch, direct, trusted): 35 cycles
      Average time per context switch (direct awe): 29 cycles
      3320617b
  9. 22 Mar, 2018 6 commits
    • A couple of tests with explanations · b4419bc8
      Anton Burtsev authored
      b4419bc8
    • Ironically it's a bit slower... ok revert · 0ff47470
      Anton Burtsev authored
      0ff47470
    • Remove one unneeded pendingfree() · fc6d0424
      Anton Burtsev authored
      fc6d0424
    • Add "external" to the test description · a5b38e8e
      Anton Burtsev authored
      a5b38e8e
    • Direct invocation for self-managed AWEs · bfe8f4fa
      Anton Burtsev authored
      -- I.e., we manage the list of AWEs externally instead of using the
         AWE mapper. This is a realistic case for us: since we'll send
         multiple blocking IPCs, we can keep an array of AWEs around, i.e.,
         in PTS, and put IDs there one by one with a simple counter increment.
         Later on, when replies come, we only have to check that
         msg.id < counter and yield to the AWE pointed to by msg.id
      
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 37 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 352 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1054 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1132 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 985 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper, external): 883 cycles (Passed)
      Average time per do .. finish and two blocking yields: 228 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 241 cycles (Passed)
      Average time per context switch (no dispatch): 37 cycles
      Average time per context switch (no dispatch, direct): 37 cycles
      Average time per context switch (no dispatch, direct, trusted): 36 cycles
      Average time per context switch (direct awe): 31 cycles
      bfe8f4fa
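The self-managed scheme described in bfe8f4fa can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `awe_t`, `pending_awes_t`, `MAX_PENDING`, `pending_put` and `pending_lookup` are hypothetical names, and the real AWE structure and per-thread state (PTS) layout are not shown on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16  /* hypothetical capacity of the per-thread array */

/* Hypothetical stand-in for the real AWE structure. */
typedef struct awe { int resumed; } awe_t;

/* Hypothetical per-thread state: a flat array of AWEs plus a counter. */
typedef struct pending_awes {
    awe_t   *slot[MAX_PENDING];
    uint32_t counter;           /* number of ids handed out so far */
} pending_awes_t;

/* Hand out the next id with a plain counter increment.
 * Returns -1 when the array is full. */
static int pending_put(pending_awes_t *p, awe_t *awe)
{
    assert(awe != NULL);
    if (p->counter >= MAX_PENDING)
        return -1;
    p->slot[p->counter] = awe;
    return (int)p->counter++;
}

/* Map the id carried by a reply back to its AWE.
 * The only validation is the msg.id < counter check from the commit note. */
static awe_t *pending_lookup(const pending_awes_t *p, uint32_t msg_id)
{
    return (msg_id < p->counter) ? p->slot[msg_id] : NULL;
}
```

A caller would store the id returned by `pending_put` in the outgoing message and, on reply, yield to whatever `pending_lookup` returns if it is non-NULL.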
    • Replace calls with direct jumps (didn't save anything) · 5a5809a5
      Anton Burtsev authored
      -- The ASM is cleaner though ... I guess
      
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 37 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 342 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1007 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1086 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 956 cycles (Passed)
      Average time per do .. finish and two blocking yields: 220 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 221 cycles (Passed)
      Average time per context switch (no dispatch): 36 cycles
      Average time per context switch (no dispatch, direct): 36 cycles
      Average time per context switch (no dispatch, direct, trusted): 34 cycles
      Average time per context switch (direct awe): 29 cycles
      5a5809a5
  10. 21 Mar, 2018 3 commits
    • Thread the PTS pointer through all continuations · a8bd1896
      Anton Burtsev authored
      --Doesn't seem to help that much
      
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 37 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 337 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1006 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1105 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 945 cycles (Passed)
      Average time per do .. finish and two blocking yields: 217 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 222 cycles (Passed)
      Average time per context switch (no dispatch): 37 cycles
      Average time per context switch (no dispatch, direct): 36 cycles
      Average time per context switch (no dispatch, direct, trusted): 35 cycles
      Average time per context switch (direct awe): 29 cycles
      a8bd1896
    • Instead of keeping list of awe pointers, keep list of awes · af07d012
      Anton Burtsev authored
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 38 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 343 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1012 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1106 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 966 cycles (Passed)
      Average time per do .. finish and two blocking yields: 218 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 223 cycles (Passed)
      Average time per context switch (no dispatch): 39 cycles
      Average time per context switch (no dispatch, direct): 36 cycles
      Average time per context switch (no dispatch, direct, trusted): 35 cycles
      Average time per context switch (direct awe): 29 cycles
      af07d012
    • Re-write awe mapper to use bit arithmetic · 710625e7
      Anton Burtsev authored
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 6 cycles (Passed)
      Average time per non blocking do{async{i++}}finish(): 38 cycles (Passed)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 345 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish(): 998 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1193 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 1011 cycles (Passed)
      Average time per do .. finish and two blocking yields: 221 cycles (Passed)
      Average time per do .. finish and two blocking yields (no dispatch): 244 cycles (Passed)
      Average time per context switch (no dispatch): 53 cycles
      Average time per context switch (no dispatch, direct): 53 cycles
      Average time per context switch (no dispatch, direct, trusted): 53 cycles
      Average time per context switch (direct awe): 29 cycles
      710625e7
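The title of 710625e7 does not say which bit tricks the rewritten mapper uses. One common way to build an id allocator out of bit arithmetic is a single-word bitmap, sketched below under that assumption. `awe_map_t`, `awe_map_alloc`, `awe_map_free` and the 64-id capacity are illustrative choices, and `__builtin_ctzll` is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AWE_IDS 64  /* assumption: all ids fit in one 64-bit word */

typedef struct awe_map {
    uint64_t used;          /* bit i set => id i is allocated */
    void    *awe[AWE_IDS];  /* id -> AWE pointer */
} awe_map_t;

/* Allocate the lowest free id, or return -1 if every id is taken. */
static int awe_map_alloc(awe_map_t *m, void *awe)
{
    uint64_t free_bits = ~m->used;
    if (free_bits == 0)
        return -1;
    /* Index of the lowest set bit; free_bits is known to be non-zero. */
    int id = __builtin_ctzll(free_bits);
    m->used |= (uint64_t)1 << id;
    m->awe[id] = awe;
    return id;
}

/* Release an id so that a later allocation can reuse it. */
static void awe_map_free(awe_map_t *m, int id)
{
    assert(id >= 0 && id < AWE_IDS);
    m->used &= ~((uint64_t)1 << id);
    m->awe[id] = NULL;
}
```

With this layout, allocation and release are a handful of register operations plus one store each, with no loop over the table.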
  11. 19 Mar, 2018 8 commits
    • a0e212ef
    • More realistic tests · ddd7e515
      Anton Burtsev authored
      -- Note that yielding through the awe mapper

      "10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1274 cycles"

      is slower than a blind yield via the dispatch loop

      "10 blocking asyncs inside one do{ }finish(): 1019 cycles"

      yet for whatever reason the 5 blocking asyncs inside one do{ }finish() and 5 yield backs is faster

      "5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 1037 cycles"
      
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 7 cycles (res:1000000 ?= 1000000)
      Average time per non blocking do{async{i++}}finish(): 38 cycles (res:1000000 ?= 1000000)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 341 cycles (res:10000000 ?= 1000000)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1019 cycles (Passed)
      Average time per 10 blocking asyncs inside one do{ }finish() (yield via awe mapper): 1274 cycles (Passed)
      Average time per 5 blocking asyncs inside one do{ }finish() and 5 yield backs (yield via awe mapper): 1037 cycles (Passed)
      Average time per do...finish and two blocking yields: 229 cycles (res:2000000 ?= 2000000)
      Average time per do .. finish and two blocking yields (no dispatch): 245 cycles (res:2000000 ?= 2000000)
      Average time per context switch (no dispatch): 50 cycles
      Average time per context switch (no dispatch, direct): 53 cycles
      Average time per context switch (no dispatch, direct, trusted): 53 cycles
      Average time per context switch (direct awe): 29 cycles
      ddd7e515
    • Add basic blocking tests · 4fb0cb28
      Anton Burtsev authored
      Current results:
      Basic do...finish/async test
      ASYNC 1: inside
      Done with do...finish
      Basic do...finish/async yield test
      ASYNC 1: Ready to yield
      ASYNC 2: Ready to yield
      ASYNC 1: Got control back
      ASYNC 2: Got control back
      Done with do...finish
      Average time per do{ i++ }finish(): 7 cycles (res:1000000 ?= 1000000)
      Average time per non blocking do{async{i++}}finish(): 37 cycles (res:1000000 ?= 1000000)
      Average time per 10 non blocking asyncs inside one do{ }finish(): 348 cycles (res:10000000 ?= 1000000)
      Average time per 10 blocking asyncs inside one do{ }finish(): 1025 cycles (res:10000000 ?= 1000000)
      Average time per do...finish and two blocking yields: 230 cycles (res:2000000 ?= 2000000)
      Average time per do .. finish and two blocking yields (no dispatch): 247 cycles (res:2000000 ?= 2000000)
      Average time per context switch (no dispatch): 51 cycles
      Average time per context switch (no dispatch, direct): 50 cycles
      Average time per context switch (no dispatch, direct, trusted): 53 cycles
      Average time per context switch (direct awe): 29 cycles
      4fb0cb28
    • Introduce PTS()->direct_awe optimization · 78947ded
      Anton Burtsev authored
      -- Try not to exit into the dispatch loop as long as direct
         continuation is known.
      
         I.e., for ASYNC() the direct continuation is its calling
         context (it is created with the SCHEDULE_CONT(&_awe) function
         from inside the ASYNC macro).
      78947ded
    • Fix bug in Barrelfish ASYNC macro · f5d5e862
      Anton Burtsev authored
      -- Barrelfish assumes that rax will not be equal to 0 when an ASYNC AWE
         is resumed, and hence that _swizzle() will be called only on the
         original path. This is wrong: eax can be 0 on the AWE path as well,
         which triggers a second _swizzle() and, as a result, an exit into
         the idle function
      f5d5e862
    • 64d1f68b
    • Minor fixes · 5bc024fb
      Anton Burtsev authored
      5bc024fb
    • Execute finish awe directly instead of scheduling it · 05d36bcd
      Anton Burtsev authored
      When the last async exits the do_finish block we know that we have to execute
      the finish awe. There is no reason to schedule it and then jump to the dispatch
      loop; we can execute it directly.
      05d36bcd
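The idea in 05d36bcd can be illustrated with a small counter-based sketch. It is an assumption-laden model, not the library's code: `finish_block_t`, `async_exit`, `cont_fn` and `count_call` are hypothetical names, and a plain function pointer stands in for the finish AWE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical continuation type standing in for the finish AWE. */
typedef void (*cont_fn)(void *arg);

/* Hypothetical bookkeeping for one do{ }finish() block. */
typedef struct finish_block {
    int     pending;     /* asyncs that have not exited yet */
    cont_fn finish;      /* code that runs after the finish point */
    void   *finish_arg;
} finish_block_t;

/* Called when an async exits the block.
 * Returns 1 if this call ran the finish continuation, 0 otherwise. */
static int async_exit(finish_block_t *fb)
{
    assert(fb->pending > 0);
    if (--fb->pending == 0) {
        /* Last async: run the continuation right here instead of
         * enqueueing it and returning to a dispatch loop. */
        fb->finish(fb->finish_arg);
        return 1;
    }
    return 0;
}

/* Example continuation: counts how many times it was invoked. */
static void count_call(void *arg)
{
    ++*(int *)arg;
}
```

The design point is that the last exiting async already knows what runs next, so a round trip through a run queue adds work without adding information.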
  12. 18 Mar, 2018 11 commits
    • Basic timing for do...finish, and async · 66346d0b
      Anton Burtsev authored
      -- Average time per do{ i++ }finish(): 7 cycles
      -- Average time per non blocking do{async{i++}}finish(): 47 cycles
      -- Average time per 10 non blocking asyncs inside one do{ }finish(): 464 cycles
      66346d0b
    • Direct context switch between two awe (29-31 cycles) · f54dda6f
      Anton Burtsev authored
      /********************************************************************************/
      /* Direct path from AWE to AWE: 29-31 cycles
      /* 16 memory accesses with push/pop/mov x 2 = 31 cycles
      /* Specifically:
           5x2 = 10 -- save and restore callee registers (r15, r14, r13, r12, rbx)
           save and restore eip, rbp, rsp (do we need rbp?)
      /* 1 jump, 1 call, 1 return, 1 add = 4
      /* Total is in 31 range...
      /********************************************************************************/
      
      Dump of assembler code for function THCYieldToAwe:
         0x00000000004011d0 <+0>:     push   %r15
         0x00000000004011d2 <+2>:     push   %r14
         0x00000000004011d4 <+4>:     push   %r13
         0x00000000004011d6 <+6>:     push   %r12
         0x00000000004011d8 <+8>:     push   %rbx
         0x00000000004011d9 <+9>:     callq  0x400ae0 <_thc_exec_awe_direct>
         0x00000000004011de <+14>:    pop    %rbx
         0x00000000004011df <+15>:    pop    %r12
         0x00000000004011e1 <+17>:    pop    %r13
         0x00000000004011e3 <+19>:    pop    %r14
         0x00000000004011e5 <+21>:    pop    %r15
         0x00000000004011e7 <+23>:    retq
      End of assembler dump.
      (gdb) disas _thc_exec_awe_direct
      Dump of assembler code for function _thc_exec_awe_direct:
         0x0000000000400ae0 <+0>:     mov    (%rsp),%rax            # save return eip of the awe_from (it's on the stack) into rax
         0x0000000000400ae4 <+4>:     mov    %rax,(%rdi)            # save eip into awe (it's in rdi)
         0x0000000000400ae7 <+7>:     mov    %rbp,0x8(%rdi)         # save rbp into awe->rbp
         0x0000000000400aeb <+11>:    mov    %rsp,0x10(%rdi)        # save rsp into awe->rsp
         0x0000000000400aef <+15>:    addq   $0x8,0x10(%rdi)
         0x0000000000400af4 <+20>:    mov    0x8(%rsi),%rbp         # restore rbp from awe_to (it's in rsi)
         0x0000000000400af8 <+24>:    mov    0x10(%rsi),%rsp        # restore rsp from awe_to
         0x0000000000400afc <+28>:    jmpq   *(%rsi)
         0x0000000000400afe <+30>:    int3
         0x0000000000400aff <+31>:    nop
      f54dda6f
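The offsets used in the `_thc_exec_awe_direct` listing above (`(%rdi)`, `0x8(%rdi)`, `0x10(%rdi)`) imply a three-word saved context. The sketch below writes that layout down in C and models the save half of the switch. The struct and field names are illustrative; only the offsets and the `+8` adjustment come from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; the offsets match the disassembly. */
typedef struct awe_ctx {
    uint64_t eip;  /* 0x00: resume address, stored by mov %rax,(%rdi) */
    uint64_t rbp;  /* 0x08: stored by mov %rbp,0x8(%rdi) */
    uint64_t rsp;  /* 0x10: stored by mov %rsp,0x10(%rdi) */
} awe_ctx_t;

_Static_assert(offsetof(awe_ctx_t, eip) == 0x00, "eip at offset 0x0");
_Static_assert(offsetof(awe_ctx_t, rbp) == 0x08, "rbp at offset 0x8");
_Static_assert(offsetof(awe_ctx_t, rsp) == 0x10, "rsp at offset 0x10");

/* Models the save half of the switch. On entry the return address sits
 * on top of the stack, so the saved rsp is bumped by 8 to skip it
 * (the addq $0x8,0x10(%rdi) instruction). */
static void awe_save(awe_ctx_t *from, uint64_t ret_addr,
                     uint64_t rbp, uint64_t rsp_at_entry)
{
    assert(from != NULL);
    from->eip = ret_addr;
    from->rbp = rbp;
    from->rsp = rsp_at_entry + 8;
}
```

Resuming is the mirror image: load `rbp` and `rsp` from the target context and jump to its `eip`, which is what the last three instructions before `int3` do.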
    • Implement a trusted version for AWE mapper · 37ac9132
      Anton Burtsev authored
      -- Trusted version assumes that awe ids are in range and the code
      that provides them is trusted
      
      -- Direct path without checks for awe id in range
         -- 21 memory accesses with push/pop/mov x 2 = 42
         -- 10 register moves, adds, and so on... x1 = 10
         -- Total is in the 52 range... of course some execute
            in parallel, but in the end we're blocked on something
      37ac9132
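The difference between a checked and a trusted lookup, as described in 37ac9132, might look like the sketch below. `awe_table_t`, `awe_lookup_checked`, `awe_lookup_trusted` and the table size are hypothetical; the commit only states that the trusted version skips the in-range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AWE_TABLE_SIZE 64  /* assumption: illustrative table size */

typedef struct awe_table {
    void *awe[AWE_TABLE_SIZE];  /* id -> AWE pointer */
} awe_table_t;

/* Untrusted path: validate the id before indexing the table. */
static void *awe_lookup_checked(const awe_table_t *t, uint32_t id)
{
    if (id >= AWE_TABLE_SIZE)
        return NULL;
    return t->awe[id];
}

/* Trusted path: the caller guarantees the id is in range, so the
 * release build has no compare-and-branch on the hot path. The
 * assert documents the contract and disappears under NDEBUG. */
static void *awe_lookup_trusted(const awe_table_t *t, uint32_t id)
{
    assert(id < AWE_TABLE_SIZE);
    return t->awe[id];
}
```

The trusted variant is only safe when ids never cross a trust boundary, for example when they are generated and consumed by the same component.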
    • Re-write the call continuation path in ASM · d5e0da04
      Anton Burtsev authored
       -- Replace the whole _thc_call_cont and _thc_call_cont_c business
          with a simple direct jump to awe_to->eip
      
       -- Surprisingly... while the instruction sequence is much shorter
          it only saves 1 cycle, which probably means that we bottleneck
          on the caches or what else can it be?
      d5e0da04
    • Updated asm ctx switch path (53 cycles on my laptop) · 3b7387e2
      Anton Burtsev authored
      Instruction stats:
        -- 28 memory instructions
        -- 5 arithms
        -- 5 jumps and 2 calls
      3b7387e2
    • c1598150
    • 9bb64fa9
    • Add asm for context switch path · 27f969c4
      Anton Burtsev authored
      27f969c4
    • Context switch path · 6e53471c
      Anton Burtsev authored
        -- path.c is just a context switch path as a collection of C
           functions back to back, so it's easy to understand what is
           going on
      6e53471c
    • Stupid bugs here and there... -O2 · eb24acd5
      Anton Burtsev authored
      eb24acd5
    • More minor cleanup · a3a77dee
      Anton Burtsev authored
      a3a77dee