The information at Issuing a New Operational Certificate - CoinCashew is worth reading carefully. Prior to Vasil, the value of node.counter for an operational certificate could be any value you chose, as long as it was greater than the highest value you had used previously. Issuing a new operational certificate automatically increments node.counter by one (1) in anticipation of issuing the next certificate. After Vasil, the node.counter value must be consistent with the counter value currently registered on the blockchain for your stake pool. If the node.counter value of your operational certificate does not match the counter value that the protocol maintains, then the network will not accept blocks that your stake pool mints, because the operational certificate is invalid. If your stake pool mints at least one block using each operational certificate that you issue, then your pool is less affected by this change in how operational certificates work after the Vasil hard fork. If your pool mints blocks less frequently, then after Vasil you need to get into the habit of setting the node.counter value manually when issuing a new operational certificate.
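As a rough sketch of what setting the counter manually can look like with cardano-cli 1.35.x (file names, the counter value 5, and the KES period 417 are placeholders for your own values; check the on-chain counter first and set the new one accordingly):

```shell
# On an online node: inspect the current on-chain counter for your opcert.
cardano-cli query kes-period-info \
  --mainnet \
  --op-cert-file node.opcert

# On the air-gapped machine: set the issue counter file manually so the new
# opcert counter lines up with what the chain expects (placeholder value).
cardano-cli node new-counter \
  --cold-verification-key-file cold.vkey \
  --counter-value 5 \
  --operational-certificate-issue-counter-file cold.counter

# Then issue the new certificate as usual using the corrected counter file.
cardano-cli node issue-op-cert \
  --kes-verification-key-file kes.vkey \
  --cold-signing-key-file cold.skey \
  --operational-certificate-issue-counter-file cold.counter \
  --kes-period 417 \
  --out-file node.opcert
```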
Also, Cardano Node 1.35.3 seems to require the libsecp256k1 library to be installed on the offline, air-gapped computer as well. There are some approaches for installing the libsecp256k1 library without connecting to the Internet in the post at Updating Offline Nodes. The same information is scheduled to be included in the next release of the CoinCashew guide as well.
Without it, I get an error that libsecp256k1.so cannot be found. I see that the CoinCashew guide solves this differently by making a link to that file in the /usr/lib/ folder (I don't know why their file ends in .23 instead of .0, though).
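One way to do this is to build the library on an online machine with the same OS and architecture, then carry the built files over. A sketch, assuming the secp256k1 ref pinned by your cardano-node release (the checkout ref below is a placeholder; use the one from your release's build instructions):

```shell
# On the online machine: build libsecp256k1 with the modules cardano-node needs.
git clone https://github.com/bitcoin-core/secp256k1
cd secp256k1
git checkout ac83be33   # placeholder: use the ref pinned by your cardano-node release
./autogen.sh
./configure --enable-module-schnorrsig --enable-experimental
make

# Copy .libs/libsecp256k1.so* to the air-gapped machine (e.g. via USB), then:
sudo cp libsecp256k1.so.0.0.0 /usr/local/lib/
sudo ln -s /usr/local/lib/libsecp256k1.so.0.0.0 /usr/local/lib/libsecp256k1.so.0
sudo ldconfig
```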
I had the same problem of wget 502 errors. To resolve it, I moved the configuration files back from .bak to .json, and then continued the rest of the steps with the original config files (I didn't update the config files).
I also had a problem updating gLiveView that early in the process, so I did that after I completed the upgrade instead.
I don't understand this direction, actually…
I've run my systems from Jan 28 to Aug 27 without restarting any service until today, when I'm upgrading, and they continued to produce blocks the entire time… so I'm not sure why this recommendation for topologyUpdater.
On the one hand, for getting your produced blocks distributed, the incoming connections are more important, since those peers pull the blocks from you. And for incoming connections, it is only important that you ping the topology updater service every hour.
On the other hand, some of the pools that you got from the topology updater service half a year ago might still be there. So, it will somehow work even without restarting. You will still have some outgoing connections.
But if everybody did it that way, new pools would have a hard time getting incoming connections at all. They register with the topology updater, which puts them in its database, but if nobody restarts regularly and uses that information, nobody will connect to them.
That being said, things will hopefully move to P2P mode soon, eliminating the need for topology updater entirely.
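For reference, the hourly ping is usually handled by a cron job; a minimal sketch, assuming the guild-operators topologyUpdater.sh script at a typical install path (the path and minute offset are placeholders):

```shell
# Ping the topology updater service once an hour; the script also fetches
# a fresh peer list that is only picked up when the relay restarts.
33 * * * * /opt/cardano/cnode/scripts/topologyUpdater.sh
```

The fetched peer list only takes effect on a relay restart, which is why the guides recommend restarting periodically rather than running for months on a stale topology.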
Thanks. Great guide. All nodes are running!
1 - I did not update Libsodium
2 - I had to revert to the original configuration files so that Grafana reporting wasn't affected.
Just a heads up: missed slots have increased DRAMATICALLY with this update. A way you can resolve this is by adding "SnapshotInterval": 86400 (core) and "SnapshotInterval": 43200 (relay) to your mainnet-config.json file and restarting your node.
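For anyone unsure where the setting goes: it is a top-level key merged into the existing node configuration file. A minimal fragment, using the block-producer value from above:

```json
{
  "SnapshotInterval": 86400
}
```

Merge the key into your existing mainnet-config.json alongside the other settings rather than replacing the file.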
Do you know what these default to if they are left unset?
I assume by "core" you mean the block producer node. I don't understand how changing something on your relay would affect missed slot checks on your block producer?
My understanding of missed slot checks centres on the processor being tied up doing memory allocation and garbage collection, and hence being unavailable for the leadership check right at the precise tick required. When a snapshot occurs, it involves significant allocation of new memory and garbage collection, so it ties up the processor.
Snapshots still need to happen though, so how does changing the timing interval fix the problem as opposed to just changing its timing?
So it is 1 hour by default, I believe. And yes, core is the block producer. Changing the snapshot interval makes snapshots happen on a less frequent cadence. The only effect I have noticed is that the node takes about one more minute to start up because the snapshot is a bit older. I have run this fix for a whole epoch now and have seen my missed slot percentage drop from 0.23% → 0.03%.
Again, this is just a quick fix suggested by a member of our xSPO group, and a majority of us are running it now with no issues. If you have a better solution, I would love to hear it!
I found that running with the RTS settings "-N --nonmoving-gc" enabled my block producer to have 0 missed slot leadership checks. However, this also means that I have to restart cardano-node every day, otherwise it gradually uses more memory and eventually gets killed by the OS for causing an out-of-memory condition. This is because the --nonmoving-gc in GHC 8.10.7 doesn't release memory back to the OS. Apparently this is fixed in GHC 9.2.x, but it is not yet possible to compile cardano-node with that version.
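A sketch of what this setup can look like (topology, database, and config paths are placeholders; RTS flags are passed between +RTS and -RTS on the command line):

```shell
# Run the block producer with the concurrent non-moving garbage collector.
cardano-node run \
  --topology topology.json \
  --database-path db \
  --socket-path db/node.socket \
  --config mainnet-config.json \
  +RTS -N --nonmoving-gc -RTS
```

The daily restart can then be automated, e.g. with a root crontab entry such as `0 4 * * * systemctl restart cardano-node`, until the memory-release issue is fixed upstream.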
So, I just keep restarting my node and wait, hopefully, for the Cardano devs to refactor the code so that it will compile with GHC 9.2.x one day.
Maybe the code could be refactored so that the slot leadership check runs in a separate thread, not dependent on waiting for the garbage collector???