Relay stuck on starting after upgrading to 1.34.1

poonasor · 29 March 2022 02:07

On the old version the node was syncing fine. Only after updating to 1.34.1 has it been stuck on "starting", and it's been like that for about 24 hours now.

Alexd1985 · 29 March 2022 03:34

Share the glive output.
cntools or coincashew?

Type sudo systemctl status cnode (or sudo systemctl status cardano-node, depending on the service name) and then journalctl -e -f -u cnode (or journalctl -e -f -u cardano-node). Do u see any errors?

Cheers,

poonasor · 29 March 2022 04:44

cntools

here is the glive output:

systemctl status cnode:

journalctl -e -f -u cnode

I don’t think i see any errors

thanks, @Alexd1985

Alexd1985 · 29 March 2022 06:36

Ok, now go to db folder and rename ledger, immutable and volatile

mv immutable immutable_bkp
mv ledger ledger_db
mv volatile volatile_db

restart the node and check glive, it should start to sync again
then stop the node and go back to db folder

now u should have more folders in there
delete the newly created ledger, immutable and volatile folders (NOT the renamed backups)

rm -R ledger
rm -R immutable
rm -R volatile

now rename the bkp files

mv ledger_db ledger
mv immutable_bkp immutable
mv volatile_db volatile

restart the node

Check glive again
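
For anyone following along later, the set-aside and restore steps above can be sketched as two small shell functions. This is only an illustration under assumptions: the function names are made up, a uniform _bkp suffix is used instead of the mixed _db/_bkp names in the post, and the node is assumed to be stopped before either function is called.

```shell
# Sketch only: function names and the _bkp suffix are assumptions, not part of cntools.

# Move the existing chain data aside so the node starts with an empty db.
set_aside_db() {
    db_dir="$1"
    for d in ledger immutable volatile; do
        if [ -e "$db_dir/$d" ]; then
            mv "$db_dir/$d" "$db_dir/${d}_bkp"
        fi
    done
}

# Delete the freshly created folders and move the backups back in place.
# A folder is only deleted when a matching backup exists to replace it.
restore_db() {
    db_dir="$1"
    for d in ledger immutable volatile; do
        if [ -e "$db_dir/${d}_bkp" ]; then
            rm -rf "${db_dir:?}/$d"
            mv "$db_dir/${d}_bkp" "$db_dir/$d"
        fi
    done
}
```

Usage would be something like set_aside_db /path/to/db, then start the node and confirm it syncs, stop it, and run restore_db /path/to/db before the final restart.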

poonasor · 29 March 2022 23:40

so i’ve followed those steps and it was syncing once I’ve renamed the folders inside the /db/ folders

but once I renamed the old folders back, it's stuck on starting again; it's been 3 hours already

Could I copy the /db/ folder from a node that is working on the same version - would that help?

Alexd1985 · 30 March 2022 04:45

Then perhaps it needs more time to start. From which version did u upgrade to 1.34?

mkungla · 30 March 2022 11:05

I had a similar issue, and after hours of debugging without finding any reasonable cause I also opted to copy the /db/ from another node running 1.34.1 and in sync.

First, on the healthy, synced node that I wanted to get the db from, I stopped the node so that the db would not get corrupted while copying.
Then on the problematic node I ran

rm -rf db/
mkdir db && cd db
rsync -chavzP -e "ssh -p 22" <your-ssh-user>@<node-ip-or-host>:/full/path/to/db/ .

After starting both nodes again, it took only a few minutes until both were syncing.
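
A hedged sketch of the same pull wrapped in a function, so the command can be reviewed before anything is transferred. The function name pull_db and the RUN switch are assumptions for illustration; by default it only prints the rsync command, and the printed form drops the quotes around the ssh option.

```shell
# Sketch only: pull_db and RUN are hypothetical names, all arguments are placeholders.
# Usage: pull_db <ssh-user> <host> <remote-db-path> <local-db-path> [ssh-port]
pull_db() {
    ssh_user="$1"; host="$2"; remote_db="$3"; local_db="$4"; port="${5:-22}"
    # The trailing slash on the source copies the contents of db/, not the folder itself.
    set -- rsync -chavzP -e "ssh -p $port" \
        "$ssh_user@$host:${remote_db%/}/" "${local_db%/}/"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"                  # actually run the transfer
    else
        printf '%s\n' "$*"    # dry mode: just show what would be run
    fi
}
```

For example, pull_db alice relay2.example /path/to/db ./db prints the command, and RUN=1 pull_db alice relay2.example /path/to/db ./db runs it. As in the post, the source node should be stopped first.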

poonasor · 30 March 2022 16:49

i was upgrading from 1.33 - i’ve given it 20+ hours before, I’ll leave it overnight again

thank you

poonasor · 30 March 2022 16:49

i am going to give this a try and keep you posted - thank you

Alexd1985 · 30 March 2022 16:54

If u type top, do u see the CPU at ~100%?
Also type free -m and df -h to check memory and disk space.
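
The memory and disk checks can be collected in one go with a small sketch like the one below. The function names are made up for illustration, and the free parser assumes a recent procps version where "available" is the seventh column of the Mem: line.

```shell
# Sketch only: function names are hypothetical.

# Read `df -P <path>` output on stdin and print the used percentage as a bare number.
parse_df_percent() {
    awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Read `free -m` output on stdin and print the available memory in MiB.
# Assumes the column layout: total used free shared buff/cache available.
parse_free_available() {
    awk '/^Mem:/ { print $7 }'
}

# Print both figures for the filesystem that holds the node db.
check_resources() {
    db_path="$1"
    echo "disk used: $(df -P "$db_path" | parse_df_percent)%"
    echo "mem available: $(free -m | parse_free_available) MiB"
}
```

Calling check_resources /path/to/db prints both values; a nearly full disk or very little available memory would be worth ruling out before waiting longer.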

poonasor · 31 March 2022 05:16

Hey,

thank you, this resolved my issue - I've copied the db files from my working node and it's working normally now

thank you for this!

Gilberto · 14 July 2022 08:31

Hi,

I am having a similar issue.

The node syncs fine until epoch 200ish, but then the CPU starts running at 200% and the synchronisation slows down massively.

I let it run for more than 72 hours the first time, then I stopped the node, deleted the db and restarted the synchronisation from the beginning, but here I am 36 hours later with the same issue.

Unfortunately, I don’t have a full db that I can copy over from another source.

I checked the log and I can see a few of the following errors:

[vmi60218:cardano.node.ChainDB:Notice:36] [2022-07-14 08:27:10.28 UTC] Chain extended, new tip: 7f0d5fe98c7d1857793fac2e6c33802c8960c99132b9dfc4525f5e5ee9c9b3eb at slot 8765028
[vmi60218:cardano.node.ErrorPolicy:Warning:79] [2022-07-14 08:27:10.56 UTC] IP 99.18.45.153:23812 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Connection reset by peer)) "(sendAll errored)"))) 20s 20s
[vmi60218:cardano.node.ErrorPolicy:Warning:79] [2022-07-14 08:27:11.10 UTC] IP 45.32.187.141:33251 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError (MuxIOException writev: resource vanished (Connection reset by peer)) "(sendAll errored)"))) 20s 20s
[vmi60218:cardano.node.IpSubscription:Error:982035] [2022-07-14 08:27:12.07 UTC] IPs: 0.0.0.0:0 [199.247.26.225:4002,94.104.125.117:3001,65.109.3.248:6000,161.97.84.25:6000,89.47.161.139:6006,15.222.244.219:3001,178.128.232.212:3001,3.20.137.144:3001,144.126.158.237:22378,164.90.189.93:6000,66.94.103.66:6000,118.140.168.102:3001,186.32.202.127:3003] Application Exception: 65.109.3.248:6000 ExceededTimeLimit (KeepAlive) (ServerAgency TokServer)
[vmi60218:cardano.node.IpSubscription:Info:982035] [2022-07-14 08:27:12.07 UTC] IPs: 0.0.0.0:0 [199.247.26.225:4002,94.104.125.117:3001,65.109.3.248:6000,161.97.84.25:6000,89.47.161.139:6006,15.222.244.219:3001,178.128.232.212:3001,3.20.137.144:3001,144.126.158.237:22378,164.90.189.93:6000,66.94.103.66:6000,118.140.168.102:3001,186.32.202.127:3003] Closed socket to 65.109.3.248:6000
[vmi60218:cardano.node.ErrorPolicy:Notice:52] [2022-07-14 08:27:12.07 UTC] IP 65.109.3.248:6000 ErrorPolicySuspendConsumer (Just (ApplicationExceptionTrace ExceededTimeLimit (KeepAlive) (ServerAgency TokServer))) 20s
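
To get an overview of which components and severities dominate in a log like the one above, a rough sketch could count lines by the second and third colon-separated fields of the leading [host:component:Severity:id] tag. The function name is an assumption, and it only handles this text log format, not JSON logs.

```shell
# Sketch only: count_by_severity is a hypothetical helper.
# Counts lines per "Severity component" pair, most frequent first.
# Reads the files given as arguments, or stdin when none are given.
count_by_severity() {
    awk -F: '/^\[/ { n[$3 " " $2]++ } END { for (k in n) print n[k], k }' "$@" |
        sort -k1,1nr -k2,2
}
```

It can be run as count_by_severity node.log on a saved excerpt (node.log is a placeholder name), or fed from the journal with journalctl -u cnode -o cat --no-pager | count_by_severity.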

The server specs are fine (6 cores, 16 GB RAM) and the nodes have been running for more than 18 months.

I managed to mint my first block about 1 month ago but after upgrading from 1.33 to 1.34.1 the node stopped working.

The gLiveView stats look the same as usual and the Grafana dashboard doesn’t flag anything anomalous.

At this point I am running out of options.

Does anyone have an idea of what is happening?

Thanks

Alexd1985 · 14 July 2022 10:30

Probably the db needs to resync and it will take more days… if u have another node already synced u can download the db

Gilberto · 14 July 2022 13:03

Hi @Alexd1985 Thanks for the quick reply. I guess the only thing left to do is to wait a few days then.
Unfortunately, I don’t have a full db anywhere after I deleted all the copies a few days back.
doh!

Alexd1985 · 14 July 2022 13:07

Then … if u deleted the db… it will take more days to download it again

Gilberto · 23 August 2022 14:00

Hi,

after literally weeks of waiting for my node to synchronise, I stumbled across this comment which was the cause of my issue

I hope this will help
