1.35.3 relay sync takes days, even with csnapshots.io

I’m scratching my head on this one. I have a relay I have upgraded to 1.35.3 and the sync is taking days. I deleted my cnode db folder (I use the cntools flow, compile the binaries from source), followed the download and extraction instructions from csnapshots and the node is still taking over 2d to sync.

To be clear, I compiled the new binaries, rebooted, stopped the node, deleted the db folder, downloaded and extracted the latest csnapshots release. I checked all the timestamps in the db folder - they were current - and the db dir size was 77GB. This all lines up with expectations. I restarted the node and after 5 mins it was about 40% synced, which is what I would expect. 11 hours later, it’s saying 52.3% synced and there are 2d+ remaining.
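For anyone wanting to repeat the folder check described above, here is a minimal Python sketch that reports the total size of a db folder and the newest file timestamp in it. The path `/opt/cardano/cnode/db` is an assumption based on a standard cntools layout, and the function name `summarize_db_dir` is made up for this example; adjust both to your setup.

```python
import os
import time


def summarize_db_dir(path):
    """Return (total_bytes, newest_mtime) over all regular files under path."""
    total_bytes = 0
    newest_mtime = 0.0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                st = os.stat(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            total_bytes += st.st_size
            newest_mtime = max(newest_mtime, st.st_mtime)
    return total_bytes, newest_mtime


if __name__ == "__main__":
    # Assumed location for a standard cntools install; change as needed.
    db_path = "/opt/cardano/cnode/db"
    size, newest = summarize_db_dir(db_path)
    print(f"db size: {size / 1e9:.2f} GB")
    if newest:
        print(f"newest file: {time.ctime(newest)}")
    else:
        print("no files found under", db_path)
```

A size in the expected range and a recent newest timestamp only suggest that the snapshot was extracted into the right place; they do not prove the node will accept it.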

So, I’m not sure what the problem is here. I see no warnings, no errors - everything looks fine. I have plenty of disk, cpu, and ram - I don’t think any of that is an issue. I have incoming and outgoing connects as expected, my latency is good, 95% of blocks propagated within 1s, the Tip is happy in the green, etc.

If anyone has any suggestions as to what could be happening, I’d appreciate it. Thanks.

Hi,

Mainnet or testnet?
Did u download the db into the correct folder, i.e. the db path set in the startup script?

Cheers,

Hi Alex,

Yah, I am pretty sure I did. For a cntools install the db folder is under /opt/cardano/cnode/ and I deleted the db folder completely; then I did the csnapshots curl and verified the db fully downloaded and extracted as expected.

This is for mainnet.

Ok, but u must wait… inside gLiveView u should see Mem RSS slowly increase, and the node should show starting, not syncing… if u see syncing then something is not ok and the db is re-syncing

The Mem RSS is definitely increasing… I’ve attached a screenshot.

[Screenshot: 1-35-3-relay]

I don’t see any issues… the node is up and running. U have only 2 OUT peers :thinking: wondering why…

Re: the out peers, I think another guy on our team locked down the relay to only feed one of our BPs and our other relay. I will check on that - yah, maybe it doesn’t make sense…

Re: the sync time, for some reason this is still an issue. I’m very hesitant to update our BPs if it takes almost a whole epoch to sync… I’ll have to keep digging I guess.

I followed the instructions on csnapshots and I think I saw other people say their node was synced this way in minutes, not hours or days. I did check the folder before I started the node, and like I said, all the files seemed to be in the correct places and 77GB db dir size seemed to be correct.

Thanks for your help and if I figure out why, I will post an update.

Ok, but upgrading to 1.35.3 will re-read the whole db. That’s why the first start takes more time (it also depends on the node’s HW configuration). So this time, if u wanna accelerate the process for the BP, u can copy the db from the Relay.

Cheers,

What’s the point then of loading in the db snapshot? My assumption was using the db from csnapshots will bring it very close and the sync would only take a few mins (depending on where the snapshot ends, of course).

I must be missing something with this whole process… :confused:

If you already have a fully synced BP node, why don’t you just use rsync to copy the db folder over to any new node. You can save a lot of time.

Indeed, if you have a fully synced and started 1.35.3 node, you can copy the DB to another node, and your upgrade / startup time should be around 4 mins on the other node.
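As a rough illustration of the rsync idea, here is a small Python sketch that only builds the rsync command line, so it can be reviewed before anything is run. The host name `relay2.example.com`, the db paths, and the helper name `build_rsync_command` are all assumptions for the example. It is generally safer to have the node stopped on both ends while copying, so the db is not changing underneath you.

```python
import shlex


def build_rsync_command(src_db, dest_host, dest_db):
    """Build an rsync argv that mirrors src_db into dest_db on dest_host.

    The trailing slash on the source makes rsync copy the folder's
    contents rather than nesting a second 'db' folder inside the target.
    """
    src = src_db.rstrip("/") + "/"
    dest = f"{dest_host}:{dest_db.rstrip('/')}/"
    # -a: archive mode (recursive, keeps timestamps and permissions)
    # --delete: remove files on the target that are not in the source
    # --partial: keep partially transferred files if the link drops
    return ["rsync", "-a", "--delete", "--partial", src, dest]


if __name__ == "__main__":
    # Hypothetical host and paths; replace with your own.
    cmd = build_rsync_command(
        "/opt/cardano/cnode/db",
        "relay2.example.com",
        "/opt/cardano/cnode/db",
    )
    print(shlex.join(cmd))
```

Once the printed command looks right, the same list can be passed to `subprocess.run(cmd, check=True)`, or the command can simply be pasted into a shell.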

I don’t have a fully synced BP yet - I always upgrade the relays first just in case.

My original question/issue was that it didn’t seem like csnapshots.io works. They provide a synced db up to some recent point in time, but I couldn’t get it to work as expected.

BP or Relay same thing… the DB is the same

Right, makes sense.

I think the discussion got a little off track. I tried csnapshots on a relay because if it didn’t work (which it does not for me), then I would only have a relay that needs to take the hit. If I did this for our BP (which has been producing blocks since mainnet went online), then we’d potentially lose all or most of an epoch.

My relay has a day left to sync (sigh). I will then take that db, copy it over to our second relay and see if it syncs faster.

We have a standard cntools install - we have from the beginning. Our db is in the standard place and we don’t have any special config. I’m just scratching my head as to why csnapshots did not work. It makes perfect sense, the files were all copied correctly, and it seemed like it did sync about 40% very quickly (within minutes). And that’s what I would have expected from the entire startup.

But once it hit 40%, it’s like it didn’t see anything after that and continued with the network sync. So, this is what I was trying to figure out.

Thanks again for taking the time to respond.

I started with a single server and installed the node using cntools, but I did not have any fully synced node at that time, so I started the node and waited. After almost 2 days it was still at 94%. I didn’t see any issue in gLiveView.sh, so I thought that was normal. But I had an almost 100% synced virtual machine, so I just used that to copy the db folder to the VPS server. I only have 10 Mbps of internet bandwidth at home, so it took me around an hour or so to get fully synced. :slight_smile:

I recently built (not upgraded) a relay on 1.35.3 and used csnapshots, and it worked fine.

I have a standard CNTOOLS install, and I remember tweaking the published commands on csnapshots to work for my cntools install.

Also, the screenshot you posted earlier looks like a node that is fully synced and running. I will post a reference screenshot of a fully started gLiveView, and a syncing gLiveView. I have circled the syncing in GREEN, and the Epoch Countdown in Blue. The Epoch Countdown is only showing how much time is left in the present epoch.
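To make the difference concrete, here is a small Python sketch of the two numbers. It assumes the mainnet Shelley-era parameters of 432,000 one-second slots per epoch (5 days). The function names and slot values are invented for the example, and this is not gLiveView's actual code, just the arithmetic behind the two readings.

```python
EPOCH_LENGTH_SLOTS = 432_000  # assumed: mainnet Shelley-era epoch, 5 days
SLOT_SECONDS = 1              # assumed: one-second slots


def epoch_countdown_seconds(slot_in_epoch):
    """Seconds until the current epoch ends. Says nothing about sync state."""
    return (EPOCH_LENGTH_SLOTS - slot_in_epoch) * SLOT_SECONDS


def sync_progress_percent(node_tip_slot, network_tip_slot):
    """How far the local node's tip is along the chain, as a percentage."""
    if network_tip_slot <= 0:
        return 0.0
    return min(100.0, 100.0 * node_tip_slot / network_tip_slot)


def format_duration(seconds):
    """Format a number of seconds as 'Nd HH:MM:SS'."""
    days, rem = divmod(seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical values: a fully synced node partway through an epoch.
    print("sync:", sync_progress_percent(68_000_000, 68_000_000), "%")
    print("epoch ends in:", format_duration(epoch_countdown_seconds(250_000)))
```

With these made-up values the node is 100% synced while the epoch countdown still reads a bit over 2 days, which is the same kind of picture as a node that is fully caught up but shows "2d+ remaining".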

I apologize in advance if I am misunderstanding you, but I’m only trying to help!

[Screenshot: syncvsepoch]

Wow… And there’s the problem. :man_facepalming: I knew I was missing something simple - none of this added up. I never had issues before like this - all the node operations (even compiling source for each release) pretty much always worked. So on this one I was thinking W…T…F…!!!

You are absolutely right - the sync did work. I was totally thinking the Epoch countdown was the node sync for some dumb reason. Tbh, I’m on another somewhat stressful proj that’s gone over delivery date, so my mind is not really here on the node stuff.

Thanks so much to everyone for sticking with this and helping out. It was operator error (of course). I will use the snapshots on our second relay today and if all goes well, probably upgrade our BPs tomorrow. Thanks again and have a good weekend! :pray:


Happy to have helped, and hopefully this takes a little stress off your plate! We are all humans and sometimes miss things. Have a great weekend!!


Fyi ours at plebpool.com took 8 hours to sync