System time sync issue, causing relay wreck

Greetings all,

I hope this finds you well. Exciting times with the release of 1.23.0, wishing everyone the best! This topic is for those who, like me, have run into (or may run into) issues with their nodes following a system reboot.

I run cardano-node as a systemd service, and have never had issues starting/stopping or restarting the software. This ease-of-use also applied to the occasional system reboot. Until Wednesday.

On reboot, I got the relay up and running in no time, as usual. Seconds later it dropped all of its outgoing connections and was dropped by all of its incoming peers. At first glance, the messages in the log pointed to a time sync issue, as in the entry below (note the InFutureExceedsClockSkew reason):

{"at":"2020-11-27T01:23:55.33Z","env":"1.21.1:9577e","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceAddBlockEvent.IgnoreInvalidBlock","block":{"hash":"5956c15","kind":"Point","slot":14873950},"reason":"InFutureExceedsClockSkew (RealPoint (SlotNo 14873950) 5956c15ab55ef0f89d6d43d4e6e03a328ee158f28168471f15320dcceb6ae5c0)"},"app":[],"msg":"","pid":"696","loc":null,"host":"ip-172-3","sev":"Info","thread":"39"}
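To see how far off the clock was, the slot number in such an entry can be converted to wall-clock time and compared with the log timestamp. The following is a minimal Python sketch, not an official tool. It assumes mainnet parameters that are not stated in this thread (Shelley era starting at slot 4492800 on 2020-07-29T21:44:51Z, one-second slots), and the helper names `slot_to_time` and `seconds_block_is_ahead` are made up for illustration.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed mainnet parameters (not taken from the post): the Shelley era
# started at slot 4492800 on 2020-07-29T21:44:51Z with one-second slots.
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_TIME = datetime(2020, 7, 29, 21, 44, 51, tzinfo=timezone.utc)

# Trimmed copy of the log entry quoted above (only the fields used here).
LOG_LINE = (
    '{"at":"2020-11-27T01:23:55.33Z",'
    '"data":{"kind":"TraceAddBlockEvent.IgnoreInvalidBlock",'
    '"block":{"hash":"5956c15","kind":"Point","slot":14873950},'
    '"reason":"InFutureExceedsClockSkew (RealPoint (SlotNo 14873950) 5956c15)"},'
    '"sev":"Info"}'
)


def slot_to_time(slot):
    """Wall-clock start of a Shelley-era slot, under the assumptions above."""
    return SHELLEY_START_TIME + timedelta(seconds=slot - SHELLEY_START_SLOT)


def seconds_block_is_ahead(line):
    """Seconds between the block's slot time and the local log timestamp.

    Returns None for entries that are not InFutureExceedsClockSkew events.
    Assumes the "at" field carries fractional seconds, as in the sample.
    """
    entry = json.loads(line)
    data = entry.get("data", {})
    if "InFutureExceedsClockSkew" not in data.get("reason", ""):
        return None
    logged_at = datetime.strptime(
        entry["at"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return (slot_to_time(data["block"]["slot"]) - logged_at).total_seconds()
```

Under these assumptions, `seconds_block_is_ahead(LOG_LINE)` comes out at roughly 5.67, meaning the block's slot started several seconds after the time the local clock reported. That is consistent with a local clock running behind after the reboot.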

I have since sent a ticket to IOHK, and am waiting for an answer. In the meantime, I have found two things that may be of use to anyone experiencing similar issues:

  1. The problem can be solved by resetting the database (not ideal, since resyncing wastes many hours), but it will happen again on the next system reboot.

  2. The issue does not occur if your machine (or VM) is stopped and started, rather than rebooted.

After scratching my head for a while and fearing the worst, I resolved to install chrony. This solved the problem and I have been able to reboot the system consistently without it recurring.

One last comment: if this is happening to you on a cloud service, please follow the provider's directions for time sync.

I welcome any comments or experiences. Once I have information from IOHK, I will post it here.

Cheers,

Adrem [RABIT]

Thanks for pointing this out. Does the Cardano consensus protocol depend on NTP to work at all? Can an attack on NTP bring down the Cardano chain?

My understanding is that the external time dependency and related attacks will be addressed with the move to Ouroboros Chronos. At the moment chrony is recommended, and there are some configs out there to help. We use the Google servers with chrony, as they are Stratum 1 and therefore very accurate (keeping the local clock within the µs range).
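For anyone who wants to check how far their clock is from an NTP server before or after installing chrony, here is a rough Python sketch of a single SNTP query using only the standard library. It is an illustration, not a replacement for chrony: the hostname `time.google.com` is an assumption about which Google servers are meant above, the function names are hypothetical, and one UDP round trip gives only a coarse estimate.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def ntp_to_unix(seconds, fraction):
    """Convert a 64-bit NTP timestamp (two 32-bit halves) to Unix time."""
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(packet, t1, t4):
    """Estimate the local clock offset from an NTP reply.

    t1 is the local send time and t4 the local receive time (Unix seconds).
    A positive result means the local clock is behind the server.
    """
    if len(packet) < 48:
        raise ValueError("NTP reply is shorter than 48 bytes")
    # Bytes 32-39: server receive timestamp; bytes 40-47: server transmit.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", packet[32:48])
    t2 = ntp_to_unix(rx_s, rx_f)
    t3 = ntp_to_unix(tx_s, tx_f)
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server="time.google.com", timeout=5.0):
    """Send one SNTP client request (version 3, mode 3) and return the offset."""
    request = b"\x1b" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t4 = time.time()
    return clock_offset(packet, t1, t4)
```

Calling `query_offset()` on a relay right after a reboot and printing the result would show whether the clock is off by seconds, which is the kind of drift the log entry at the top of this thread suggests. It needs outbound UDP access to port 123.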

Is there a reason to use chrony as opposed to the default NTP client?

hi @cyberruss and @waldmops,

thank you both for taking the time to reply. Thank you for linking the paper, it seems like there is something in the works. I also found this information:

I have no intention of steering anyone toward installing chrony rather than using the default NTP client. For me, however, the use of chrony has solved the issue above and also resulted in more consistent reporting of propagation times.

I hope this helps and thanks again for taking the time to read.

Cheers,

Adrem [RABIT]


hi all,

I just wanted to update this thread (and consider it solved) by posting the exchange I had with MrBliss on the GitHub page:

I want to take the opportunity to thank MrBliss and all that have contributed their thoughts.

Cheers,

Adrem [RABIT]
