System time sync issue, causing relay wreck

Adrem · 28 November 2020 08:53

Greetings all,

I hope this finds you well. Exciting times with the release of 1.23.0, wishing everyone the best! This topic is for those (like me) who have (or may) run into issues with their nodes following a system reboot.

I run cardano-node as a systemd service, and have never had issues starting/stopping or restarting the software. This ease-of-use also applied to the occasional system reboot. Until Wednesday.

On reboot, I got the rebooted relay up and running in no time, as usual. Seconds later it dropped all of its outgoing connections and was booted by all its incoming peers. At first glance, the messages in the log pointed to a time sync issue (note the InFutureExceedsClockSkew reason in the entry below):

{"at":"2020-11-27T01:23:55.33Z","env":"1.21.1:9577e","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceAddBlockEvent.IgnoreInvalidBlock","block":{"hash":"5956c15","kind":"Point","slot":14873950},"reason":"InFutureExceedsClockSkew (RealPoint (SlotNo 14873950) 5956c15ab55ef0f89d6d43d4e6e03a328ee158f28168471f15320dcceb6ae5c0)"},"app":[],"msg":"","pid":"696","loc":null,"host":"ip-172-3","sev":"Info","thread":"39"}
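For anyone who wants to check their own logs for the same symptom, here is a minimal Python sketch. It assumes the node writes one JSON object per line with plain ASCII quotes, and it assumes the mainnet Shelley start point (slot 4492800 at 2020-07-29 21:44:51 UTC, one-second slots after that); those constants and all function names are mine, not something from the node itself.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed mainnet constants (not taken from this thread): the first Shelley
# slot and its wall-clock time. Slots are one second long from then on.
SHELLEY_START_SLOT = 4492800
SHELLEY_START_TIME = datetime(2020, 7, 29, 21, 44, 51, tzinfo=timezone.utc)


def slot_to_time(slot):
    """Wall-clock start of a Shelley-era mainnet slot."""
    return SHELLEY_START_TIME + timedelta(seconds=slot - SHELLEY_START_SLOT)


def parse_at(at):
    """Parse the log's "at" field, e.g. 2020-11-27T01:23:55.33Z."""
    stamp = datetime.strptime(at, "%Y-%m-%dT%H:%M:%S.%fZ")
    return stamp.replace(tzinfo=timezone.utc)


def future_block_events(lines):
    """Yield (logged_at, slot, seconds_ahead) for each clock-skew rejection."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        data = entry.get("data")
        if not isinstance(data, dict):
            continue
        reason = str(data.get("reason", ""))
        if not reason.startswith("InFutureExceedsClockSkew"):
            continue
        slot = data["block"]["slot"]
        logged_at = parse_at(entry["at"])
        # How far the block's slot lies ahead of the local clock reading.
        ahead = (slot_to_time(slot) - logged_at).total_seconds()
        yield logged_at, slot, ahead
```

Feeding it an open log file (`future_block_events(open(path))`, with `path` being wherever your node logs) lists every rejected block. Under the assumptions above, the entry quoted here comes out at roughly 5.67 seconds ahead of the local timestamp, which is consistent with the relay's clock running several seconds slow.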

I have since sent a ticket to IOHK, and am waiting for an answer. In the meantime, I have found two things that may be of use to anyone experiencing similar issues:

the problem can be solved by resetting the database (not ideal, given the hours of resyncing it costs), but it will happen again on the next system reboot;
the issue does not occur if your machine (or VM) is stopped and started (rather than rebooted).

After scratching my head for a while and fearing the worst, I resolved to install chrony. This solved the problem and I have been able to reboot the system consistently without it recurring.
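If you want to confirm that chrony is actually keeping the clock in line after a reboot, a small sketch like the following may help. It assumes chrony is installed and that `chronyc tracking` prints its usual "System time : ... seconds slow/fast of NTP time" line; the function names are hypothetical.

```python
import re
import subprocess

# Matches e.g. "System time     : 0.000006523 seconds slow of NTP time"
_SYSTEM_TIME = re.compile(
    r"^System time\s*:\s*([0-9.]+) seconds (slow|fast) of NTP time",
    re.MULTILINE,
)


def system_time_offset(tracking_output):
    """Offset in seconds; positive means the local clock is ahead of NTP time."""
    match = _SYSTEM_TIME.search(tracking_output)
    if match is None:
        raise ValueError("no 'System time' line found in chronyc output")
    seconds = float(match.group(1))
    return seconds if match.group(2) == "fast" else -seconds


def read_tracking():
    """Run `chronyc tracking` and return its text output."""
    result = subprocess.run(
        ["chronyc", "tracking"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `system_time_offset(read_tracking())` shortly after boot, before starting cardano-node, gives a quick sanity check; what threshold counts as "good enough" is a judgment call I leave to you.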

One last comment: if this is happening to you on a cloud service, please follow the provider's directions for time sync.

I welcome any comments/experiences. Once I have information from IOHK, I will post it here.

Cheers,

Adrem [RABIT]

waldmops · 28 November 2020 09:57

Thanks for pointing this out. Does the Cardano consensus protocol depend on NTP to work at all? Can an attack on NTP bring down the Cardano chain?

cyberruss · 28 November 2020 15:19

My understanding is that the external time dependency and the related attacks will be addressed with the move to Ouroboros Chronos. At the moment chrony is recommended, and there are some configs out there to help. We use the Google servers with chrony as they are Stratum 1 and therefore very accurate (keeping the local clock within the µs range).
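For a rough, independent look at how far a machine's clock is from a public time server, something like the sketch below can be used. It is a bare-bones SNTP query, not a replacement for chrony; the default host `time.google.com` is my assumption for "the Google servers", and the offset is only a crude estimate because it ignores most of the NTP algorithm.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3, rest zero."""
    return b"\x1b" + 47 * b"\0"


def transmit_time(packet):
    """Server transmit timestamp (bytes 40-47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def rough_offset(host="time.google.com", timeout=2.0):
    """Server time minus local time, in seconds (positive: local clock is behind)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Compare the server's timestamp with the midpoint of the round trip.
    return transmit_time(packet) - (sent + received) / 2
```

`rough_offset()` needs outbound UDP port 123 to be open. Its resolution is limited by network latency, so it is useful for spotting offsets of seconds, not for judging microsecond accuracy.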

waldmops · 28 November 2020 18:38

Is there a reason to use chrony as opposed to the default NTP client?

Adrem · 28 November 2020 23:54

hi @cyberruss and @waldmops,

thank you both for taking the time to reply. Thank you for linking the paper, it seems like there is something in the works. I also found this information:

I have no intention of steering anyone toward installing chrony over using the default NTP client. For me, however, the use of chrony has solved the issue above and also resulted in more consistent reporting of propagation times.

I hope this helps and thanks again for taking the time to read.

Cheers,

Adrem [RABIT]

Adrem · 30 November 2020 09:16

hi all,

I just wanted to update this thread (and consider it solved) by posting the exchange I had with MrBliss on the GitHub page:

I want to take the opportunity to thank MrBliss and all that have contributed their thoughts.

Cheers,

Adrem [RABIT]
