Relay stuck on starting after upgrading to 1.34.1

On the old version the node was syncing fine. Since I updated to 1.34.1 it has been stuck on starting, and it has been like that for about 24 hours.

Share the gLiveView output.
Did you set the node up with CNTools or CoinCashew?

Run sudo systemctl status cnode (or sudo systemctl status cardano-node, depending on your service name) and then journalctl -e -f -u cnode (or journalctl -e -f -u cardano-node). Do you see any errors?
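
If the journal is long, the check above can be narrowed down with a small filter. This is only a rough sketch: the function name show_problems is made up for illustration, the unit name cnode is an assumption (use cardano-node if that is your service), and it assumes log lines carry a severity field such as :Warning: or :Error: between colons.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only log lines whose severity field looks like a problem.
# Reads log text on stdin and prints the matching lines.
# "|| true" keeps the exit status at 0 when nothing matches.
show_problems() {
  grep -E ':(Warning|Error|Critical|Alert|Emergency):' || true
}

# Usage on the node (unit name "cnode" is an assumption, adjust as needed):
#   sudo systemctl status cnode --no-pager
#   journalctl -u cnode -n 500 --no-pager | show_problems
```

Dropping -f and adding -n 500 --no-pager makes journalctl print the last 500 lines and exit instead of following the log, which is easier to paste into a reply.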

Cheers,

cntools

here is the glive output:
image

systemctl status cnode:
image

journalctl -e -f -u cnode
image

I don’t think I see any errors.

thanks, @Alexd1985

Ok, now go to the db folder and rename ledger, immutable and volatile:

mv immutable immutable_bkp
mv ledger ledger_db
mv volatile volatile_db

Restart the node and check gLiveView; it should start to sync again.
Then stop the node and go back to the db folder.

You should now see new ledger, immutable and volatile folders next to the renamed ones.
Delete the new folders (NOT the backups):

rm -R ledger
rm -R immutable
rm -R volatile

Now rename the backup folders back to their original names:

mv ledger_db ledger
mv immutable_bkp immutable
mv volatile_db volatile

Restart the node and check gLiveView again.
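
The rename and restore steps above could be scripted roughly like this. It is a sketch, not a tested tool: the function names are invented, it uses one _bkp suffix for all three folders (a simplification of the mixed _bkp/_db names in the post), and it assumes the node is stopped while folders are moved.

```shell
#!/usr/bin/env bash
# Sketch of the rename / restore steps. The db path is passed as an argument.
# Assumption: the node is stopped while these functions run.

# Move immutable, ledger and volatile aside so the node starts with an empty db.
set_aside_db() {
  db_dir="$1"
  for d in immutable ledger volatile; do
    if [ ! -d "$db_dir/$d" ]; then
      echo "missing: $db_dir/$d" >&2
      return 1
    fi
    if [ -e "$db_dir/${d}_bkp" ]; then
      echo "backup already exists: $db_dir/${d}_bkp" >&2
      return 1
    fi
  done
  for d in immutable ledger volatile; do
    mv "$db_dir/$d" "$db_dir/${d}_bkp"
  done
}

# Delete the freshly created folders and put the backups back in place.
restore_db() {
  db_dir="$1"
  for d in immutable ledger volatile; do
    if [ ! -d "$db_dir/${d}_bkp" ]; then
      echo "missing backup: $db_dir/${d}_bkp" >&2
      return 1
    fi
  done
  for d in immutable ledger volatile; do
    rm -rf "$db_dir/$d"
    mv "$db_dir/${d}_bkp" "$db_dir/$d"
  done
}

# Usage (path and unit name are placeholders):
#   sudo systemctl stop cnode && set_aside_db /path/to/db && sudo systemctl start cnode
#   ...check gLiveView, then...
#   sudo systemctl stop cnode && restore_db /path/to/db && sudo systemctl start cnode
```

Both functions check all three folders before touching anything, so a typo in the path fails early instead of leaving the db half renamed.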

I followed those steps, and the node was syncing once I had renamed the folders inside the /db/ folder.

But once I renamed the old folders back, it got stuck on starting again. It has been 3 hours already.

Could I copy the /db/ folder from a node that is working on the same version? Would that help?

Then perhaps it needs more time to start. Which version did you upgrade to 1.34 from?

I had a similar issue, and after hours of debugging without finding any reasonable cause I also opted to copy the /db/ folder from another node that was running 1.34.1 and in sync.

  1. First, on the healthy, synced node that I wanted to take the db from, I stopped the node so that the db would not get corrupted while copying.
  2. Then, on the problematic node, I ran:
rm -rf db/
mkdir db && cd db
rsync -chavzP -e "ssh -p 22" <your-ssh-user>@<node-ip-or-host>:/full/path/to/db/ .
  3. After starting both nodes again, it took only a few minutes until both were syncing.
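
Before restarting after a copy like this, it may be worth confirming that the copied folder contains the three subfolders mentioned earlier in the thread (immutable, ledger, volatile). A minimal sketch follows; the function name is hypothetical and it only checks that the folders exist, not that their contents are valid.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: does the db folder contain the expected subfolders?
# Returns 0 if all three exist, 1 otherwise, and names each missing one on stderr.
db_has_expected_dirs() {
  db_dir="$1"
  status=0
  for d in immutable ledger volatile; do
    if [ ! -d "$db_dir/$d" ]; then
      echo "missing: $db_dir/$d" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (path is a placeholder):
#   db_has_expected_dirs /path/to/db && echo "db folder looks complete"
```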

I was upgrading from 1.33. I already gave it 20+ hours before, but I’ll leave it overnight again.

thank you

I am going to give this a try and keep you posted. Thank you.

If you run top, do you see the CPU at around 100%?
Also run free -m and df -h to check memory and disk space.
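
For pasting into a reply, those checks can be collected in one go. This is just a sketch: resource_snapshot is an invented name, and it reads the load average from /proc/loadavg as a non-interactive stand-in for top, which is an assumption about what is useful here rather than the same measurement.

```shell
#!/usr/bin/env bash
# Hypothetical one-shot snapshot of load, memory and disk usage.
# Argument: a path on the filesystem to report on (defaults to the current directory).
resource_snapshot() {
  target="${1:-.}"
  echo "== load average =="
  if [ -r /proc/loadavg ]; then
    cat /proc/loadavg
  else
    uptime
  fi
  echo "== memory (MB) =="
  if command -v free >/dev/null 2>&1; then
    free -m
  else
    echo "free is not installed"
  fi
  echo "== disk for $target =="
  df -h "$target"
}

# Usage (path is a placeholder for the folder holding the db):
#   resource_snapshot /path/to/db
```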

Hey,

Thank you, this resolved my issue. I copied the db files from my working node and it is working normally now.

thank you for this!

Hi,

I am having a similar issue.

The node syncs fine until around epoch 200, but then the CPU starts running at 200% and the synchronisation slows down massively.

I let it run for more than 72 hours the first time, then I stopped the node, deleted the db and restarted the synchronisation from the beginning. But here I am 36 hours later with the same issue.

Unfortunately, I don’t have a full db that I can copy over from another source.

I checked the log and I can see a few of the following errors:

[vmi60218:cardano.node.ChainDB:Notice:36] [2022-07-14 08:27:10.28 UTC] Chain extended, new tip: 7f0d5fe98c7d1857793fac2e6c33802c8960c99132b9dfc4525f5e5ee9c9b3eb at slot 8765028
[vmi60218:cardano.node.ErrorPolicy:Warning:79] [2022-07-14 08:27:10.56 UTC] IP 99.18.45.153:23812 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Connection reset by peer)) "(sendAll errored)"))) 20s 20s
[vmi60218:cardano.node.ErrorPolicy:Warning:79] [2022-07-14 08:27:11.10 UTC] IP 45.32.187.141:33251 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Connection reset by peer)) "(sendAll errored)"))) 20s 20s
[vmi60218:cardano.node.IpSubscription:Error:982035] [2022-07-14 08:27:12.07 UTC] IPs: 0.0.0.0:0 [199.247.26.225:4002,94.104.125.117:3001,65.109.3.248:6000,161.97.84.25:6000,89.47.161.139:6006,15.222.244.219:3001,178.128.232.212:3001,3.20.137.144:3001,144.126.158.237:22378,164.90.189.93:6000,66.94.103.66:6000,118.140.168.102:3001,186.32.202.127:3003] Application Exception: 65.109.3.248:6000 ExceededTimeLimit (KeepAlive) (ServerAgency TokServer)
[vmi60218:cardano.node.IpSubscription:Info:982035] [2022-07-14 08:27:12.07 UTC] IPs: 0.0.0.0:0 [199.247.26.225:4002,94.104.125.117:3001,65.109.3.248:6000,161.97.84.25:6000,89.47.161.139:6006,15.222.244.219:3001,178.128.232.212:3001,3.20.137.144:3001,144.126.158.237:22378,164.90.189.93:6000,66.94.103.66:6000,118.140.168.102:3001,186.32.202.127:3003] Closed socket to 65.109.3.248:6000
[vmi60218:cardano.node.ErrorPolicy:Notice:52] [2022-07-14 08:27:12.07 UTC] IP 65.109.3.248:6000 ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTimeLimit (KeepAlive) (ServerAgency TokServer))) 20s
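
With a longer log, a per-severity count gives a quicker overview than reading line by line. The sketch below assumes every line starts with a bracketed [host:component:Severity:number] prefix like the excerpt above; the function name tally_severity is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines per severity.
# Expects lines that begin with [host:component:Severity:number], as in the excerpt.
# Lines that do not match that prefix are ignored.
tally_severity() {
  sed -n 's/^\[[^]]*:\([A-Za-z]*\):[0-9]*].*/\1/p' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage (unit name "cnode" is an assumption):
#   journalctl -u cnode -n 5000 --no-pager -o cat | tally_severity
```

The -o cat option makes journalctl print only the message text, so the bracketed prefix is at the start of each line as the sed pattern expects.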

The server specs are fine (6 cores, 16 GB RAM) and the nodes have been running for more than 18 months.

I managed to mint my first block about 1 month ago but after upgrading from 1.33 to 1.34.1 the node stopped working.

gLiveView stats seem to be as usual and the Grafana dashboard doesn’t flag anything anomalous.

At this point I am running out of options.

Does anyone have an idea of what is happening?

Thanks

The db probably needs to resync, and that will take several more days. If you have another node that is already synced, you can copy the db from it.

Hi @Alexd1985 Thanks for the quick reply. I guess the only thing left to do is to wait a few days then.
Unfortunately, I don’t have a full db anywhere after I deleted all the copies a few days back.
doh!

Then, if you deleted the db, it will take several more days to download it again.

Hi,

After literally weeks of waiting for my node to synchronise, I stumbled across this comment, which explained the cause of my issue.

I hope this will help
