    Rework how we store the sliver/slice status from the clusters: · e5d36e0d
    Leigh B Stoller authored
    In the beginning, the number and size of experiments were small, so
    storing the entire slice/sliver status blob as JSON in the web task was
    fine, even though we had to lock tables to prevent races between the
    event updates and the local polling.
    But lately those JSON blobs have grown huge and the lock is bogging
    things down. In particular, we cannot keep up with the number of events
    coming from all the clusters, and we fall really far behind.
    So I have moved the status blobs out of the per-instance web task and
    into new tables, one per slice and one per node (sliver). This keeps
    the blobs very small and thus the lock time very short, so now we can
    keep up with the event stream.
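    A minimal sketch of the per-slice / per-sliver layout described above.
    Everything named here is an assumption for illustration: the table
    names, the columns, and the helpers (open_db, update_sliver,
    sliver_statuses) are hypothetical, and SQLite only stands in for the
    real database, so it does not model the table locking discussed in
    this message. The point is just that each event rewrites one small
    row rather than one large per-instance blob.

```python
import json
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual tables added by this commit.
SCHEMA = """
CREATE TABLE slice_status (
    instance_uuid TEXT NOT NULL,
    cluster       TEXT NOT NULL,
    status        TEXT NOT NULL,   -- small JSON blob for the whole slice
    PRIMARY KEY (instance_uuid, cluster)
);
CREATE TABLE sliver_status (
    instance_uuid TEXT NOT NULL,
    cluster       TEXT NOT NULL,
    node_id       TEXT NOT NULL,
    status        TEXT NOT NULL,   -- small JSON blob for a single node
    PRIMARY KEY (instance_uuid, cluster, node_id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def update_sliver(db, instance_uuid, cluster, node_id, status):
    """Store one node's status; only that node's row is written."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO sliver_status VALUES (?, ?, ?, ?)",
            (instance_uuid, cluster, node_id, json.dumps(status)),
        )


def sliver_statuses(db, instance_uuid):
    """Reassemble the per-node statuses for one instance."""
    rows = db.execute(
        "SELECT node_id, status FROM sliver_status "
        "WHERE instance_uuid = ? ORDER BY node_id",
        (instance_uuid,),
    )
    return {node_id: json.loads(blob) for node_id, blob in rows}
```

    An event for one node then becomes a single call such as
    update_sliver(db, "exp1", "clusterA", "node1", {"state": "ready"}),
    and a status poll rebuilds the full picture with
    sliver_statuses(db, "exp1").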
    If we grow big enough that this problem comes back, we can switch
    to InnoDB for the per-sliver table and do row locking instead of table
    locking, but I do not think that will happen.