Block Producer does not start after 1.34.1 Update

Hello Cardano community, I need some advice here, please.
I updated my block producer to 1.34.1 and it remains in the Starting state.

When I check the cnode service, it is up and running:

● cnode.service - Cardano Node
     Loaded: loaded (/etc/systemd/system/cnode.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-05-25 20:13:11 UTC; 17h ago
   Main PID: 82863 (bash)
      Tasks: 35 (limit: 19035)
     Memory: 11.9G
     CGroup: /system.slice/cnode.service
             ├─82863 bash /opt/cardano/cnode/scripts/cnode.sh
             └─83508 /home/kaadmin/.cabal/bin/cardano-node +RTS -N8 -RTS run --topology /opt/cardano/cnode/files/topology.json --config /opt/cardano/cnode/files/config.json --database-path /opt/cardano/cnode/db --socket-path /opt/cardan>

May 25 20:13:11 spadabp01 systemd[1]: Started Cardano Node.
May 25 20:13:13 spadabp01 cnode[83508]: Listening on http://0.0.0.0:12798

The output of journalctl -e -f -u cnode is the following:

-- Reboot --

May 25 19:42:48 spadabp01 systemd[1]: Started Cardano Node.
May 25 19:42:51 spadabp01 cnode[1575]: Listening on http://0.0.0.0:12798
May 25 20:05:02 spadabp01 systemd[1]: Stopping Cardano Node...
May 25 20:05:02 spadabp01 systemd[1]: cnode.service: Control process exited, code=killed, status=2/INT
May 25 20:05:02 spadabp01 cnode[1575]: Shutting down..
May 25 20:05:02 spadabp01 cnode[1575]: Node configuration: NodeConfiguration {ncSocketConfig = SocketConfig {ncNodeIPv4Addr = Last {getLast = Just 0.0.0.0}, ncNodeIPv6Addr = Last {getLast = Nothing}, ncNodePortNumber = Last {getLast = Just 6000}, ncSocketPath = Last {getLast = Just "/opt/cardano/cnode/sockets/node0.socket"}}, ncConfigFile = "/opt/cardano/cnode/files/config.json", ncTopologyFile = "/opt/cardano/cnode/files/topology.json", ncDatabaseFile = "/opt/cardano/cnode/db", ncProtocolFiles = ProtocolFilepaths {byronCertFile = Nothing, byronKeyFile = Nothing, shelleyKESFile = Just "/opt/cardano/cnode/priv/pool/Kairos1/hot.skey", shelleyVRFFile = Just "/opt/cardano/cnode/priv/pool/Kairos1/vrf.skey", shelleyCertFile = Just "/opt/cardano/cnode/priv/pool/Kairos1/op.cert", shelleyBulkCredsFile = Nothing}, ncValidateDB = False, ncShutdownConfig = ShutdownConfig {scIPC = Nothing, scOnSlotSynced = Just NoMaxSlotNo}, ncProtocolConfig = NodeProtocolConfigurationCardano (NodeByronProtocolConfiguration {npcByronGenesisFile = "/opt/cardano/cnode/files/byron-genesis.json", npcByronGenesisFileHash = Nothing, npcByronReqNetworkMagic = RequiresNoMagic, npcByronPbftSignatureThresh = Nothing, npcByronApplicationName = ApplicationName {unApplicationName = "cardano-sl"}, npcByronApplicationVersion = 1, npcByronSupportedProtocolVersionMajor = 3, npcByronSupportedProtocolVersionMinor = 0, npcByronSupportedProtocolVersionAlt = 0}) (NodeShelleyProtocolConfiguration {npcShelleyGenesisFile = "/opt/cardano/cnode/files/genesis.json", npcShelleyGenesisFileHash = Nothing}) (NodeAlonzoProtocolConfiguration {npcAlonzoGenesisFile = "/opt/cardano/cnode/files/alonzo-genesis.json", npcAlonzoGenesisFileHash = Just "7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874"}) (NodeHardForkProtocolConfiguration {npcTestEnableDevelopmentHardForkEras = False, npcTestShelleyHardForkAtEpoch = Nothing, npcTestShelleyHardForkAtVersion = Nothing, npcTestAllegraHardForkAtEpoch = Nothing, 
npcTestAllegraHardForkAtVersion = Nothing, npcTestMaryHardForkAtEpoch = Nothing, npcTestMaryHardForkAtVersion = Nothing, npcTestAlonzoHardForkAtEpoch = Nothing, npcTestAlonzoHardForkAtVersion = Nothing}), ncDiffusionMode = InitiatorAndResponderDiffusionMode, ncSnapshotInterval = DefaultSnapshotInterval, ncTestEnableDevelopmentNetworkProtocols = False, ncMaxConcurrencyBulkSync = Nothing, ncMaxConcurrencyDeadline = Just 2, ncLoggingSwitch = True, ncLogMetrics = True, ncTraceConfig = TracingOnLegacy (TraceSelection {traceVerbosity = NormalVerbosity, traceAcceptPolicy = OnOff {isOn = False}, traceBlockFetchClient = OnOff {isOn = True}, traceBlockFetchDecisions = OnOff {isOn = True}, traceBlockFetchProtocol = OnOff {isOn = True}, traceBlockFetchProtocolSerialised = OnOff {isOn = True}, traceBlockFetchServer = OnOff {isOn = True}, traceBlockchainTime = OnOff {isOn = False}, traceChainDB = OnOff {isOn = True}, traceChainSyncBlockServer = OnOff {isOn = True}, traceChainSyncClient = OnOff {isOn = True}, traceChainSyncHeaderServer = OnOff {isOn = True}, traceChainSyncProtocol = OnOff {isOn = True}, traceConnectionManager = OnOff {isOn = True}, traceConnectionManagerCounters = OnOff {isOn = True}, traceConnectionManagerTransitions = OnOff {isOn = False}, traceDebugPeerSelectionInitiatorTracer = OnOff {isOn = False}, traceDebugPeerSelectionInitiatorResponderTracer = OnOff {isOn = False}, traceDiffusionInitialization = OnOff {isOn = False}, traceDnsResolver = OnOff {isOn = False}, traceDnsSubscription = OnOff {isOn = True}, traceErrorPolicy = OnOff {isOn = True}, traceForge = OnOff {isOn = True}, traceForgeStateInfo = OnOff {isOn = True}, traceHandshake = OnOff {isOn = False}, traceInboundGovernor = OnOff {isOn = True}, traceInboundGovernorCounters = OnOff {isOn = True}, traceInboundGovernorTransitions = OnOff {isOn = True}, traceIpSubscription = OnOff {isOn = True}, traceKeepAliveClient = OnOff {isOn = False}, traceLedgerPeers = OnOff {isOn = False}, 
traceLocalChainSyncProtocol = OnOff {isOn = True}, traceLocalConnectionManager = OnOff {isOn = False}, traceLocalErrorPolicy = OnOff {isOn = True}, traceLocalHandshake = OnOff {isOn = False}, traceLocalInboundGovernor = OnOff {isOn = False}, traceLocalMux = OnOff {isOn = False}, traceLocalRootPeers = OnOff {isOn = False}, traceLocalServer = OnOff {isOn = False}, traceLocalStateQueryProtocol = OnOff {isOn = False}, traceLocalTxMonitorProtocol = OnOff {isOn = False}, traceLocalTxSubmissionProtocol = OnOff {isOn = True}, traceLocalTxSubmissionServer = OnOff {isOn = True}, traceMempool = OnOff {isOn = True}, traceMux = OnOff {isOn = False}, tracePeerSelection = OnOff {isOn = True}, tracePeerSelectionCounters = OnOff {isOn = True}, tracePeerSelectionActions = OnOff {isOn = True}, tracePublicRootPeers = OnOff {isOn = False}, traceServer = OnOff {isOn = False}, traceTxInbound = OnOff {isOn = False}, traceTxOutbound = OnOff {isOn = False}, traceTxSubmissionProtocol = OnOff {isOn = False}, traceTxSubmission2Protocol = OnOff {isOn = False}}), ncMaybeMempoolCapacityOverride = Nothing, ncProtocolIdleTimeout = 5s, ncTimeWaitTimeout = 60s, ncAcceptedConnectionsLimit = AcceptedConnectionsLimit {acceptedConnectionsHardLimit = 512, acceptedConnectionsSoftLimit = 384, acceptedConnectionsDelay = 5s}, ncTargetNumberOfRootPeers = 100, ncTargetNumberOfKnownPeers = 100, ncTargetNumberOfEstablishedPeers = 50, ncTargetNumberOfActivePeers = 20, ncEnableP2P = DisabledP2PMode}
May 25 20:05:02 spadabp01 systemd[1]: cnode.service: Failed with result 'signal'.
May 25 20:05:02 spadabp01 systemd[1]: Stopped Cardano Node.
May 25 20:13:11 spadabp01 systemd[1]: Started Cardano Node.
May 25 20:13:13 spadabp01 cnode[83508]: Listening on http://0.0.0.0:12798

This is where I am not sure what is going on. If you have come across this issue, please share your experience.
Thank you in advance!

Type free -m and check the RAM: is there enough? Did you also update the env file or any other files?
It looks like you don’t have any outgoing peers… Can you check whether the relay is accessible from the BP?

From the BP, try:

telnet Relay_IP Relay_cnode_port, e.g. telnet 192.168.1.10 6000

You should see a "Connected" message.
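If telnet is not installed on the BP, the same reachability check can be scripted with bash's built-in /dev/tcp redirection. This is a minimal sketch, assuming bash and coreutils `timeout` are available; `check_relay` is a hypothetical helper name, and 192.168.1.10:6000 is only the example address from the post above.

```shell
#!/usr/bin/env bash
# check_relay is a hypothetical helper, not part of cntools.
# It tries to open a TCP connection via bash's /dev/tcp redirection and
# gives up after a timeout (default 3 seconds).
check_relay() {
  local host="$1" port="$2" secs="${3:-3}"
  # The inner bash opens (and immediately closes) a socket to host:port;
  # its error messages (e.g. "Connection refused") are discarded.
  if timeout "$secs" bash -c ": </dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "connected"
  else
    echo "unreachable"
    return 1
  fi
}

# Example, run on the BP once per relay (placeholder address):
#   check_relay 192.168.1.10 6000
```

An "unreachable" result can point at a firewall rule, a wrong IP/port in topology.json, or a relay that is not running.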

Hey Alex, thanks for the tip.
Memory: 5 GB free, and the BP is able to connect to both relays. What I found is that the topology.json on my block producer was completely off, as if it were not my file, with unknown IPs. Very strange. So I deleted and recreated it, and now I am up and running. But now I wonder how on earth that file got changed.


The problem still exists: it worked for 10 min and then went back to the Starting state.
journalctl -e -f -u cnode shows the following:

May 26 15:44:34 spadabp01 systemd[1]: cnode.service: Failed with result 'signal'.
May 26 15:44:34 spadabp01 systemd[1]: Stopped Cardano Node.
-- Reboot --
May 26 15:44:41 spadabp01 systemd[1]: Started Cardano Node.
May 26 15:44:44 spadabp01 cnode[1572]: Listening on http://0.0.0.0:12798

This time it does not recognize that it is a block producer.

I also noticed in other posts that you have to rotate KES keys after the 1.34.1 update, so I tried to do so and got this:

 >> POOL >> ROTATE KES

Select pool to rotate KES keys on

Selected pool: Kairos1
Command failed: node issue-op-cert Error: /opt/cardano/cnode/priv/pool/Kairos1/cold.skey: /opt/cardano/cnode/priv/pool/Kairos1/cold.skey: openBinaryFile: permission denied (Permission denied)

press any key to proceed …

I changed the ownership of cold.skey using [Chown], and after that I was able to rotate my keys. I rebooted my BP node and it is in the Starting state. I guess I need to wait a bit longer. Will update shortly.
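The `openBinaryFile: permission denied` error means the user running cntools could not read cold.skey, so it can help to list who owns each pool key file and with which mode. A minimal sketch, assuming GNU `stat` and the pool directory from the error message; `show_key_perms` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# show_key_perms is a hypothetical helper: print owner:group, octal mode and
# path of every *.skey file in a pool directory (GNU stat format syntax).
show_key_perms() {
  local dir="${1:-/opt/cardano/cnode/priv/pool/Kairos1}"
  local f
  for f in "$dir"/*.skey; do
    # If the glob matched nothing, $f is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no .skey files in $dir" >&2; return 1; }
    stat -c '%U:%G %a %n' "$f"
  done
}

# Example:
#   show_key_perms /opt/cardano/cnode/priv/pool/Kairos1
```

If a key turns out to be owned by another user (for example root), `sudo chown "$USER" <file>` is one command-line way to take ownership; keeping the mode restrictive (such as 400) afterwards is a sensible precaution for signing keys.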

journalctl -e -f -u cnode shows the following:

May 26 16:39:30 spadabp01 systemd[1]: cnode.service: Failed with result 'signal'.
May 26 16:39:30 spadabp01 systemd[1]: Stopped Cardano Node.
-- Reboot --
May 26 16:39:37 spadabp01 systemd[1]: Started Cardano Node.
May 26 16:39:38 spadabp01 cnode[754]: ERROR: You specified 12788 as your EKG port, but it looks like the cardano-node (PID: 1216 ) is not listening on this port. Please update the config or kill the conflicting process first.
May 26 16:39:40 spadabp01 cnode[1412]: Listening on http://0.0.0.0:12798
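The EKG error above mentions port 12788 and a cardano-node process with PID 1216, so it is worth checking whether a second (leftover) cardano-node is running and which process holds that port. The sketch below is a small filter over `ss -ltnp` output, assuming iproute2's `ss` and procps's `pgrep` are installed; `port_listeners` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# port_listeners is a hypothetical helper: read `ss -ltnp` output on stdin and
# print only the listening sockets whose local address ends in :<port>.
port_listeners() {
  local port="$1"
  # Column 4 of `ss -ltn` output is "Local Address:Port",
  # e.g. 127.0.0.1:12788 or [::]:12788; the header line never matches.
  awk -v port="$port" '$4 ~ (":" port "$")'
}

# Typical use on the BP (sudo lets ss show the owning process):
#   sudo ss -ltnp | port_listeners 12788
#   pgrep -a cardano-node   # more than one line means more than one node is running
```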

Hmm, if the KES keys are still valid, you don’t need to rotate them. Now, try rebooting the server with sudo reboot

and let me know whether the node starts this time.
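Whether the KES key is still valid can be worked out from the current slot: the current KES period is the slot number divided (integer division) by slotsPerKESPeriod, and an operational certificate is valid from its start KES period until start + maxKESEvolutions. The sketch below uses the mainnet Shelley genesis values (129600 slots per KES period, 62 evolutions) as defaults; check them against your own genesis.json. `kes_period` and `kes_expiry_period` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for KES arithmetic. Defaults are the mainnet Shelley
# genesis values; override them with the values from your genesis.json.
SLOTS_PER_KES_PERIOD="${SLOTS_PER_KES_PERIOD:-129600}"
MAX_KES_EVOLUTIONS="${MAX_KES_EVOLUTIONS:-62}"

# Current KES period for an absolute slot number (integer division).
kes_period() {
  local slot="$1"
  echo $(( slot / SLOTS_PER_KES_PERIOD ))
}

# First KES period in which an op cert issued at start period $1 is no longer valid.
kes_expiry_period() {
  local start="$1"
  echo $(( start + MAX_KES_EVOLUTIONS ))
}

# Example (assumes cardano-cli can reach the node socket and jq is installed):
#   slot=$(cardano-cli query tip --mainnet | jq -r .slot)
#   kes_period "$slot"
```

For example, `kes_period 62000000` gives 478 (62000000 / 129600 is about 478.4), and a certificate issued with start period 478 would stop being valid at period 540.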

I guess you didn’t check when the file was modified, right?

Rebooted.
It’s been 30 min and it is still starting as a relay.

OK, you will need to perform a workaround (WA):

CoinCashew or CNTools?

cntools… wait 1 min

Sometimes, when you stop/start/restart the node often, you will hit this issue and will need to perform these steps:

  • stop the node
sudo systemctl stop cnode
  • rename the ledger, immutable and volatile folders
cd $CNODE_HOME/db
ls -l
mv immutable imm
mv ledger led
mv volatile vol
ls -l
  • start the node, then stop it again
sudo systemctl start cnode
wait about 10 seconds, then stop the node
sudo systemctl stop cnode
  • you should now also see the new ledger, immutable and volatile folders;
    delete these newly created folders (not the old, renamed ones)
ls -l
rm -R ledger
rm -R immutable
rm -R volatile
ls -l
  • rename the original folders back
mv imm immutable
mv led ledger
mv vol volatile
ls -l
  • start the node and check gLiveView (you should now see Mem RSS slowly increasing)
sudo systemctl start cnode
cd ..
cd scripts
./gLiveView.sh
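The workaround above can also be wrapped in a small bash function so the folders cannot be mixed up. This is a hypothetical sketch, not an official cntools script: it assumes the cntools layout ($CNODE_HOME/db containing immutable, ledger and volatile), a systemd unit named cnode, and it uses a .bak suffix instead of imm/led/vol. It refuses to run if a folder is missing or a backup already exists.

```shell
#!/usr/bin/env bash
# reset_db_folders is a hypothetical helper automating the manual steps above.
# The body runs in a subshell ( ... ) so `cd` and `exit` do not affect the caller.
reset_db_folders() (
  db="${CNODE_HOME:-/opt/cardano/cnode}/db"
  wait_secs="${WAIT_SECS:-10}"
  cd "$db" || exit 1

  # Safety checks: all three folders must exist and no backup may be left over.
  for d in immutable ledger volatile; do
    [ -d "$d" ] || { echo "missing $db/$d, aborting" >&2; exit 1; }
    [ -e "$d.bak" ] && { echo "$db/$d.bak already exists, aborting" >&2; exit 1; }
  done

  sudo systemctl stop cnode || exit 1

  # 1. Move the real chain data aside.
  for d in immutable ledger volatile; do mv "$d" "$d.bak" || exit 1; done

  # 2. Start the node briefly so it creates fresh folders, then stop it again.
  sudo systemctl start cnode || exit 1
  sleep "$wait_secs"
  sudo systemctl stop cnode || exit 1

  # 3. Delete only the freshly created folders and restore the originals.
  for d in immutable ledger volatile; do
    rm -rf -- "$d"
    mv "$d.bak" "$d" || exit 1
  done

  # 4. Start the node normally; watch Mem RSS in gLiveView afterwards.
  sudo systemctl start cnode
)
```

The subshell body is a deliberate choice: the `cd` into the db folder and any early `exit` stay contained, so calling the function from an interactive shell leaves the current directory unchanged.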

Question: are you using (or have you used) the topology updater on the BP? I know the topology updater can modify the topology.json file.
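Two things worth checking here are when topology.json was last modified and whether something on the BP, such as a cron job running topologyUpdater.sh, rewrites it. A minimal sketch, assuming GNU `stat`, the legacy (non-P2P) topology format with "Producers"/"addr" entries (the node configuration above shows DisabledP2PMode), and that the updater, if used, is scheduled via the user's crontab; if it was set up another way (for example as a systemd service), the crontab check will not find it. `inspect_topology` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# inspect_topology is a hypothetical helper: show when topology.json changed,
# which peer addresses it contains, and any crontab lines mentioning topologyUpdater.
inspect_topology() {
  local topo="${1:-${CNODE_HOME:-/opt/cardano/cnode}/files/topology.json}"
  [ -f "$topo" ] || { echo "not found: $topo" >&2; return 1; }
  echo "Last modified: $(stat -c '%y' "$topo")"
  echo "Peers:"
  # Extract the value of each "addr" key (legacy topology format).
  grep -o '"addr"[[:space:]]*:[[:space:]]*"[^"]*"' "$topo" | cut -d'"' -f4
  echo "Cron entries mentioning topologyUpdater:"
  crontab -l 2>/dev/null | grep -i topologyupdater || echo "  (none found)"
}

# Example, run on the BP:
#   inspect_topology /opt/cardano/cnode/files/topology.json
```

On a BP, the peer list would normally contain only your own relays, so any unknown address is a sign the file was rewritten.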


Hey Alex, I did as you advised. The BP started within 10 min and has been up and running so far; I will keep monitoring.


So it started and worked for 15 min, then my RAM usage spiked to 100%, as did swap, and then dropped again.
That is odd, since I have 16 GB of RAM configured for the BP.
It is sitting in the Starting state again.

type free -m

              total        used        free      shared  buff/cache   available
Mem:          15951         535         169           0       15246       15105
Swap:          4095          48        4047

Right now it is all free, as the node is not doing anything, just sitting there.

Try downloading the new/latest scripts:

cd ~/tmp
./prereqs.sh

Then stop/start the node.

Also check whether the cncli service is running:

sudo systemctl status | grep cncli
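A slightly more targeted variant lists cncli processes directly and, where systemd is available, any unit whose name contains "cncli". A minimal sketch, assuming procps's `pgrep`; unit names depend on how cncli was installed, and `cncli_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cncli_status is a hypothetical helper: list running cncli processes and any
# systemd service units whose name contains "cncli".
cncli_status() {
  echo "cncli processes:"
  pgrep -a cncli || echo "  none running"
  if command -v systemctl >/dev/null 2>&1; then
    echo "cncli units:"
    systemctl list-units --all --type=service --no-pager 2>/dev/null | grep -i cncli \
      || echo "  none found"
  fi
}

# Example:
#   cncli_status
```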

You can type top and check what is consuming the RAM.
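In the free -m output above, most memory sits in buff/cache, which the kernel can reclaim, so the "available" column is the more useful figure. For a non-interactive snapshot of the biggest consumers (handy to paste into a thread), a minimal sketch assuming procps `ps` with `--sort` support; `top_rss` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# top_rss is a hypothetical helper: a non-interactive alternative to `top`
# listing the N processes with the largest resident memory (RSS, in KiB).
top_rss() {
  local n="${1:-5}"
  # One header line plus N process lines.
  ps -eo pid,rss,comm --sort=-rss | head -n "$((n + 1))"
}

# Example: overall picture plus the biggest consumers.
#   free -m
#   top_rss 5
```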

Are you trying to run cncli while the node is up and running?

I am not able to check it now as it does not start.