    Leigh B. Stoller authored · 5c02231f
    I noticed in the 12 node tests that CPU was running at 5-6% now. I
    also noticed that the slower machines were getting very far behind the
    faster machines (the faster machines request chunks faster), and
    actually dropping chunks because they have no room for them
    (chunkbufs at 32). I increased the timeout on the client (if no blocks
    are received for this long, request something) from 30ms to 90ms. This
    helped a bit, but the real help was increasing chunkbufs to 64.
    Now the clients run at pretty much single node speed (152/174), and
    the CPU usage on boss went back down to 2-3% during the run. The stats
    show far less data loss and resending of blocks. In fact, we were
    resending upwards of 300MB of data because of client loss. That went
    down to about 14MB for the 12 node test.
    Then I ran a 24 node test. Very sweet. All 24 nodes ran in 155-180
    seconds. CPU peaked at 6%, and dropped off to a steady state of 4%.
    None of the nodes saw any duplicate chunks. Note that the client is
    probably going to need some backoff code in case the server dies, to
    prevent swamping the boss with unanswerable packets. Next step is to
    have Matt run a test when he swaps in his 40 nodes.