Time synchronization at Cloudlab clusters
A recent question on the users list asked about time synchronization between the clusters, which got me thinking about this again.
All of the nodes at a cluster use a local NTP time server (ntp1), which by convention is ops. We also stash away the "drift" value from each node (via the watchdog) and use the latest saved value to initialize the drift file when a node is imaged. The various cluster NTP servers use a range of upstream servers and NTP pools, but they are not directly connected to each other (i.e., they are not configured as NTP "peers"). We seem to keep reasonable time between the cluster NTP servers at least, generally within about 1-5 ms of each other.
Some questions:
- Is saving/restoring the drift value still a good thing to do?
- Should we be using PTP?
- Any chance of getting a GPS receiver at the main clusters?
- Should we use chrony, which is aimed at ordinary computers that are unstable, go into sleep mode, or have intermittent Internet connections, and is also designed for virtual machines, a much more unstable environment? I think current Ubuntu images already use it.
At the very least, we should probably move the ntp1 alias off of ops, which is a VM at every cluster except Emulab, and onto the control node instead, which should have a more stable clock.