Nodes slower to restart following 1.25.1 update, anyone?

Adrem · 2 February 2021 10:25

Hi all,

I wanted to gauge opinions on the topic in the title. I run the nodes as a systemd service and am used to them coming back within roughly 15 seconds of a systemctl restart command. Since the update, a restart takes a minute or more, and one node in particular takes 5+ minutes.

I have gone through the logs and found nothing sinister. One message is unusual, though, and seems to point at the overall number of connected peers on one of the relays (15 in this case). Another observation is that the "slower" node is the one responsible for running sendmytip. Lastly, I found that topologyUpdater is no longer updating the topology file it points to.

Has anyone else experienced similar issues?

Cheers,

A

Alexd1985 · 2 February 2021 10:46

Hello,

I hadn't noticed that; tbh I never cared how long it takes, because I have 3 relays and they are set to restart 4 hours apart… for me 1 minute wouldn't be a real problem…

I looked in Grafana, and one of my nodes seems to show the same behavior.

Regarding the topologyUpdater issue, check whether the service is still active, whether it has pushed the config to the topology files, etc.

You can also check the logs with journalctl -e -f -u cnode-tu-fetch.service (cnode-tu-fetch.service fetches a fresh topology file before cnode.service is started/restarted).

cheers,

bjorn · 3 February 2021 06:51

Probably needless to say, but does the service use the SIGINT kill signal?
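For reference, a minimal sketch of how the kill signal could be set, assuming the node runs under a unit named cnode.service (the name used earlier in this thread) and that you override it with a systemd drop-in created via systemctl edit cnode.service. The TimeoutStopSec value is purely illustrative, not a recommendation:

```ini
# Drop-in override for cnode.service (assumed unit name)
[Service]
# Stop the node with SIGINT (the Ctrl-C equivalent) instead of systemd's default SIGTERM
KillSignal=SIGINT
# Illustrative value: how long systemd waits for a clean exit before sending SIGKILL
TimeoutStopSec=300
```

A long stop timeout only matters if the node is slow to shut down; it doesn't explain a slow start.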

Adrem · 3 February 2021 07:23

Yes, it uses SIGINT as it always has. I have changed nothing except for the node software update.

Seeing as topologyUpdater also stopped working, I imagine it has to do with some of the metrics changes in the config. I have not had a chance to test this, but as soon as I do I'll post my findings here.

Thank you all for your contribution so far, keep it coming! The more the merrier!

Cheers A

Adrem · 3 February 2021 10:21

Thank you @Alexd1985 and @bjorn,

I have checked, and topologyUpdater is still running and reporting a happy message in the logs. I'll have to do some further digging to find out why its output is not updating.

Cheers,

A

PS on a side note @Alexd1985, I appreciate what you say about the relays’ delay (pun intended), I have three also.

But take the hypothetical where your BP goes down for whatever reason (power surge, kernel reboot, space-critters invasion, etc.): I like to be reasonably sure that it will bounce back like a basketball, not after 5 minutes.

Alexd1985 · 3 February 2021 10:31

You are right, maybe they are aware of this behavior and it will be fixed in the next update…

I remember seeing this behavior on an older version too, but they fixed it in 1.24.2 or 1.23.x (I don't remember exactly).

You said you have a node which takes 5+ minutes… does it have a different HW configuration?

Adrem · 3 February 2021 11:15

Mmm, OK, that's interesting. This is my first experience of it. I remember there were some issues with kill signals, but since SIGINT was accepted as the standard I have never had nodes take longer than 45 seconds to come back after a full reboot.

Anyway, your input is, as usual, greatly appreciated, Alex. I'm sure we'll get to the bottom of it.

Cheers, A

MachTwo · 3 February 2021 12:19

Just going to jump in here to say a big thank you to all commenting here and on other threads

To see fellow pool operators and the Cardano community as a whole, helping and supporting is such a beautiful thing.

There are a number of things that seem a 'bit odd' with the world at the moment, so to have a space to come to and witness collaboration, support and goodwill towards others is truly heartwarming.

Adrem · 6 February 2021 11:36

Hi all,

thank you for your contributions

As discussed, I finally got around to checking what was going on, and it was just a config file issue. All sorted. Fixing it also fixed the topologyUpdater problem.

All the best,

A

Adrem · 6 February 2021 11:40

Hey @Alexd1985, nah, same HW everywhere. It seems I had left some unwanted metrics in the new config file. This was causing topologyUpdater to have a hissy fit, and some confusion in the peer handling.

It's all good now, back to 15-second bounces as usual.

Thank you for your time,

A

Juraj_Spindor · 8 February 2021 07:23

Hi, do I need to update each time there is a new version? Can it be done automatically? I have version 1.24.2. I am a stake pool operator.

Alexd1985 · 8 February 2021 07:30

Hello,

Nope, as a pool operator you need to prepare/do these updates manually each time an update is required.

Cheers,

Adrem · 8 February 2021 08:36

+1 to what @Alexd1985 said. Check out the "building your node" section of the stake pool school. It is now up to date (albeit not all of it) and gives instructions on how to update your software. It's a rather quick process, particularly if you parallelize it over all your infrastructure.

I’m on mobile but will post links later.

Cheers A

PS Installing Cardano-node - Stake pool course

The section you're interested in is down at the bottom. Check out version 1.25.1.
