I noticed in the 12-node tests that CPU was now running at 5-6%. I also noticed that the slower machines were falling far behind the faster machines (the faster machines request chunks faster) and were actually dropping chunks because they had no room for them (chunkbufs at 32).

I increased the timeout on the client (if no blocks are received for this long, request something) from 30ms to 90ms. This helped a bit, but the real help was increasing chunkbufs to 64. Now the clients run at pretty much single-node speed (152/174), and CPU usage on boss went back down to 2-3% during the run. The stats show far less data loss and resending of blocks. In fact, we were resending upwards of 300MB of data because of client loss; that went down to about 14MB for the 12-node test.

Then I ran a 24-node test. Very sweet. All 24 nodes ran in 155-180 seconds. CPU peaked at 6% and dropped off to a steady state of 4%. None of the nodes saw any duplicate chunks.

Note that the client is probably going to need some backoff code in case the server dies, to prevent swamping the boss with unanswerable packets. Next step is to have Matt run a test when he swaps in his 40 nodes.
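
For the backoff idea, here is a rough sketch of one way the client's idle timeout could grow while the server stays silent. It is not the code in this change: the class name, the method names, and the 5000ms cap are all made up for illustration, and only the 90ms starting value comes from the numbers above. The timeout doubles each time it expires with no blocks received, and snaps back to the base as soon as a block arrives.

```python
class RequestBackoff:
    """Hypothetical idle-timeout tracker with exponential backoff."""

    def __init__(self, base_ms=90, max_ms=5000):
        # base_ms matches the 90ms client timeout; max_ms is an assumed cap.
        self.base_ms = base_ms
        self.max_ms = max_ms
        self.current_ms = base_ms

    def on_block_received(self):
        # The server is alive again, so return to the normal timeout.
        self.current_ms = self.base_ms

    def on_timeout(self):
        # No blocks arrived within current_ms: double the wait before the
        # next request, but never exceed the cap.
        self.current_ms = min(self.current_ms * 2, self.max_ms)
        return self.current_ms


if __name__ == "__main__":
    backoff = RequestBackoff()
    for _ in range(8):
        print("next request in", backoff.on_timeout(), "ms")
    backoff.on_block_received()
    print("after a block arrives:", backoff.current_ms, "ms")
```

With a dead server, each client would send fewer and fewer requests instead of one every 90ms, which is what should keep boss from being swamped. A real version would probably also want some random jitter so the clients do not all retry in lockstep.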
Showing with 6 additions and 5 deletions