I just wanted to gauge opinions on the topic in the title. I run the nodes as a systemd service and am used to them responding within roughly 15 seconds to a systemctl restart command. Since the update, a restart takes a minute or more, and one node in particular takes 5+ minutes.
I have explored the logs and found nothing sinister. One message is unusual though, and seems to point at the overall number of connected peers on one of the relays (15 in this case). Another observation is that the "slower" node is the one responsible for running sendmytip. Lastly, I have also found that topologyUpdater is no longer updating the topology file it points to.
I didn't observe that; tbh I didn't care how long it takes because I have 3 relays and they are set to restart 4 hours apart... for me 1 minute will not be a real problem...
I looked in Grafana and it seems one of my nodes has the same behavior.
Regarding the topologyUpdater issue, check if the service is still active, if it pushed the config to the topology files, etc.
You can also check the logs with journalctl -e -f -u cnode-tu-fetch.service (cnode-tu-fetch.service fetches a fresh topology file before cnode.service is started/restarted).
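For anyone who wants a quick check that the topology file is actually being rewritten, here is a minimal Python sketch. It only looks at the file's modification time; the function names and the 3-hour threshold are assumptions for illustration, not anything topologyUpdater itself provides, so adjust them to your own schedule.

```python
import os
import time


def topology_age_seconds(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_stale(path, max_age_seconds=3 * 3600, now=None):
    """Return True if the file was not rewritten within `max_age_seconds`.

    The default of 3 hours is an assumed threshold, not a documented value.
    """
    return topology_age_seconds(path, now) > max_age_seconds
```

Calling is_stale() with the path to your own topology file returns True when the file has not been touched within the threshold, which would match the symptom of the service reporting success while its output stays unchanged.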
Yes, it uses SIGINT as it always has. I have changed nothing except for the node software update.
Seeing as topologyUpdater also stopped working, I imagine it has to do with some of the metrics changes in the config. I have not had a chance to test this, but as soon as I do I'll post my findings here.
Thank you all for your contribution so far, keep it coming! The more the merrier!
I have checked and topologyUpdater is still running and sending back a happy message in the logs. I'll have to do some further digging to find out why its output is not updating.
Cheers,
A
PS on a side note @Alexd1985, I appreciate what you say about the relays' delay (pun intended), I have three also.
But let's take the hypothetical where your BP goes down for whatever reason (power surge, kernel reboot, space-critters invasion, etc). I like to be reasonably sure that it will bounce back like a basketball, not after 5 minutes.
Mmm, ok, that's interesting. This is my first experience of it. I remember there were some issues with kill signals, but since SIGINT was accepted as standard I have never had nodes take longer than 45 seconds to come back after a full reboot.
Anyway, your input is, as usual, greatly appreciated Alex. I'm sure we'll get to the bottom of it.
Just going to jump in here to say a big thank you to all commenting here and on other threads
To see fellow pool operators and the Cardano community as a whole, helping and supporting is such a beautiful thing.
There are a number of things that seem a "bit odd" at the moment with the world, so to have a space to come to and witness collaboration, support and goodwill to others is truly heart warming
As discussed, I finally got around to checking what was going on, and it was just a -config file issue. All sorted. It also fixed the topologyUpdater thing.
hey @Alexd1985, nah, same HW everywhere. It seems like I had left some unwanted metrics in the new config file. This was causing topologyUpdater to have a hissy fit, and some confusion in the peer handling.
It's all good now, back to 15-second bounces as usual
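In case it helps anyone hunting for leftover entries after an update, a rough Python sketch for comparing a reference config against the one you are running is below. It assumes both configs are JSON files and only compares top-level keys; the function names and the sample keys in the usage note are hypothetical and not taken from the actual node config.

```python
import json


def load_config(path):
    """Load a JSON config file into a dict."""
    with open(path) as fh:
        return json.load(fh)


def diff_keys(reference, candidate):
    """Compare the top-level keys of two config dicts.

    Returns three sorted lists: keys only in `candidate`, keys only in
    `reference`, and keys present in both whose values differ.
    """
    ref_keys, cand_keys = set(reference), set(candidate)
    only_in_candidate = sorted(cand_keys - ref_keys)
    only_in_reference = sorted(ref_keys - cand_keys)
    changed = sorted(k for k in ref_keys & cand_keys
                     if reference[k] != candidate[k])
    return only_in_candidate, only_in_reference, changed
```

For example, diff_keys({"a": 1, "b": 2}, {"a": 1, "b": 3, "oldMetric": True}) reports "oldMetric" as present only in the candidate and "b" as changed, which is the kind of stray entry worth looking at first.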
+1 to what @Alexd1985 said. Check out the "building your node" section of the stake pool school. It is now up to date (albeit not all of it) and gives instructions on how to update your software. It's a rather quick process, particularly if you parallelize it over all your infrastructure.