Node sync suddenly extremely slow after half a day of syncing (testnet)

kitsunecrypto · 18 August 2021 13:10

Hello,

First post in this forum! I have successfully set up and started a block producing node on testnet, mostly following this guide: How to build a Cardano Stake Pool - CoinCashew. The only major difference is that I decided to dockerize everything, so the cardano node process is running inside a docker container, which in turn runs inside a Compute Engine VM running Ubuntu. (Hardware: 2 cores, 4 gb mem, 60 gb standard disk. Obviously I’d go higher in a prod/mainnet environment.)

After starting, I noticed the syncing was going quite fast (300-400 slots processed per second). After about 18 hours (reaching approx slot #30000000+), I’ve noticed a significant slowdown in syncing (now about 1 per second!). I have plenty of disk space left, CPU is only at 10% utilisation. At this rate it will almost never finish syncing!

Any idea what could be going on? I can provide more info as needed. Advice much appreciated, thank you.

DevJohn · 18 August 2021 15:14

4GB is low, you need to have at least 8GB, preferably 16GB.

My nodes are using 7.4GB RAM each, and all of them have 16GB.

kitsunecrypto · 18 August 2021 16:34

Thanks. I upgraded to 4 cores + 16gb on both block producer and relay and will update here tomorrow.

DevJohn · 18 August 2021 18:28

Perfect, things will run much better now. Let us know how it goes.

kitsunecrypto · 19 August 2021 03:41

Hi @DevJohn and all, I upgraded to 4 cores + 16gb RAM on both the relay node and the block producing node, and I did notice a real speed increase at the beginning of the blockchain sync. However, after reaching Epoch 151-ish, it’s slowing down tremendously again.

I attached a screenshot of gLiveView on the block producing node, and some infra metrics of the VM so you can see that we’re not constrained on resources.

Any other info I can provide? I’m really perplexed by why there would be a sudden slowdown.

DevJohn · 19 August 2021 09:15

I can see something wrong in your gLiveView. This is the image of your relay, and it only shows 1 in and 1 out, which is bad news! If you did everything correctly, you should have many more. For example, one of my relays has “23 Out / 17 In”.

That means there is potentially something wrong with your topologyUpdater. See section 14, “Configure your topology files”, and make sure you have everything right.

You need to do the above on all of your relays if you have more than one.

Let me know how it goes.

kitsunecrypto · 19 August 2021 12:11

First of all, thank you very much @DevJohn and all. The support in this community is amazing, and I hope to pay this back to others shortly!

So a few clarifications:

The gLiveView I showed is in fact the block producing node’s (even though it says relay at the top). I attached an image to show the nodes side by side (block producing is on the left and relay is on the right). I don’t know why it says “relay” on both of them. Even though I followed CoinCashew’s instructions exactly, is it possible they missed a step and my “env” file for gLiveView is configured incorrectly? I notice that the “#TOPOLOGY” entry in that file is commented out, for example.
I have not gotten to Step 14 yet. I’m still on Step 8, which if I read correctly, is indicating that I should be able to fully sync the blockchain at this point.
I checked and each node is running their expected entrypoint files (startBlockProducingNode.sh and startRelayNode1.sh, respectively).
Here are my topology files:
Block Producing testnet-topology.json

{
  "Producers": [
    {
      "addr": "35.X.X.X <relay node public IP>",
      "port": 6000,
      "valency": 1
    }
  ]
}

Relay testnet-topology.json

{
  "Producers": [
    {
      "addr": "104.X.X.X <block producing node public IP>",
      "port": 6000,
      "valency": 1
    },
    {
      "addr": "relays-new.cardano-testnet.iohkdev.io",
      "port": 3001,
      "valency": 2
    }
  ]
}
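
As a rough cross-check of the peer counts gLiveView reports, the valencies in each topology file can be added up. This is only a sketch: it assumes that, in this legacy topology format, `valency` is the maximum number of outgoing connections the node opens for an entry, and the function name `expected_outgoing` is made up for illustration. The IP placeholders are copied from the files above.

```python
import json

# Topology files from the post, with the redacted IPs left as placeholders.
BLOCK_PRODUCER_TOPOLOGY = """
{
  "Producers": [
    {"addr": "35.X.X.X", "port": 6000, "valency": 1}
  ]
}
"""

RELAY_TOPOLOGY = """
{
  "Producers": [
    {"addr": "104.X.X.X", "port": 6000, "valency": 1},
    {"addr": "relays-new.cardano-testnet.iohkdev.io", "port": 3001, "valency": 2}
  ]
}
"""


def expected_outgoing(topology_text: str) -> int:
    """Sum the valency of every entry under "Producers".

    Assumption: each valency is an upper bound on the outgoing
    connections opened for that entry.
    """
    topology = json.loads(topology_text)
    return sum(entry["valency"] for entry in topology["Producers"])


print(expected_outgoing(BLOCK_PRODUCER_TOPOLOGY))  # 1
print(expected_outgoing(RELAY_TOPOLOGY))           # 3
```

Under that assumption, a single outgoing peer on the block producing node is what this topology asks for, since it only lists the relay.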

When I look at journalctl logs, I get one line like this every 30 secs or so. This is extremely slow, as it was speeding through all of these slots in the first few hours. I should clarify that after your first feedback yesterday to increase the cpu/mem, I started from scratch, so this is the second time I’m observing this behaviour.

Aug 19 12:07:33 cardano-block-producing-node-01-testnet cardano-block-producing-node[19755]: [cardano-:cardano.node.ChainDB:Notice:150] [2021-08-19 12:07:33.29 UTC] Chain extended, new tip: cf6774e197abd36871cd159b06da8a3e491820172b3c6494833de342185c3a11 at slot 35005637
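
To track progress from these logs, the timestamp and slot can be pulled out of each "Chain extended" line. The sketch below is an illustration only: the regular expression is written against the single line quoted above (joined onto one line), and the name `parse_tip` is hypothetical. Other node versions may format the message differently.

```python
import re
from datetime import datetime, timezone

# Pattern fitted to the one log line quoted in the post.
TIP_PATTERN = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ UTC\] "
    r"Chain extended, new tip: ([0-9a-f]+) at slot (\d+)"
)


def parse_tip(line: str):
    """Return (logged_at, tip_hash, slot), or None if the line does not match."""
    match = TIP_PATTERN.search(line)
    if match is None:
        return None
    logged_at = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    logged_at = logged_at.replace(tzinfo=timezone.utc)
    return logged_at, match.group(2), int(match.group(3))


# The journalctl line from the post, joined onto one line.
LINE = (
    "Aug 19 12:07:33 cardano-block-producing-node-01-testnet "
    "cardano-block-producing-node[19755]: "
    "[cardano-:cardano.node.ChainDB:Notice:150] "
    "[2021-08-19 12:07:33.29 UTC] Chain extended, new tip: "
    "cf6774e197abd36871cd159b06da8a3e491820172b3c6494833de342185c3a11 "
    "at slot 35005637"
)

logged_at, tip_hash, slot = parse_tip(LINE)
print(logged_at.isoformat(), slot)  # 2021-08-19T12:07:33+00:00 35005637
```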

kitsunecrypto · 19 August 2021 12:51

OHHHH! Epoch 151 is the latest epoch on Testnet!
https://explorer.cardano-testnet.iohkdev.io/en.html
I was under the false impression that it should go all the way to 285!
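
The slot in the log line quoted earlier can be converted to an epoch and a wall-clock time to confirm this. The sketch below relies on parameters that are not stated anywhere in this thread and are assumed here for the legacy testnet: a system start of 2019-07-24 20:20:16 UTC, Byron epochs of 21,600 slots at 20 seconds per slot, and Shelley starting at epoch 74 with 432,000 one-second slots per epoch. Check them against your own genesis files before trusting the result.

```python
from datetime import datetime, timedelta, timezone

# Assumed legacy-testnet parameters (not taken from the thread).
SYSTEM_START = datetime(2019, 7, 24, 20, 20, 16, tzinfo=timezone.utc)
BYRON_SLOT_SECONDS = 20
BYRON_EPOCH_SLOTS = 21_600
SHELLEY_SLOT_SECONDS = 1
SHELLEY_EPOCH_SLOTS = 432_000
SHELLEY_FIRST_EPOCH = 74
SHELLEY_FIRST_SLOT = SHELLEY_FIRST_EPOCH * BYRON_EPOCH_SLOTS  # 1,598,400


def slot_to_epoch(slot: int) -> int:
    """Epoch number that contains the given absolute slot."""
    if slot < SHELLEY_FIRST_SLOT:
        return slot // BYRON_EPOCH_SLOTS
    return SHELLEY_FIRST_EPOCH + (slot - SHELLEY_FIRST_SLOT) // SHELLEY_EPOCH_SLOTS


def slot_to_time(slot: int) -> datetime:
    """UTC time at which the given absolute slot starts."""
    if slot < SHELLEY_FIRST_SLOT:
        return SYSTEM_START + timedelta(seconds=slot * BYRON_SLOT_SECONDS)
    shelley_start = SYSTEM_START + timedelta(
        seconds=SHELLEY_FIRST_SLOT * BYRON_SLOT_SECONDS
    )
    elapsed = (slot - SHELLEY_FIRST_SLOT) * SHELLEY_SLOT_SECONDS
    return shelley_start + timedelta(seconds=elapsed)


tip_slot = 35_005_637  # slot from the journalctl line above
print(slot_to_epoch(tip_slot))             # 151
print(slot_to_time(tip_slot).isoformat())  # 2021-08-19T12:07:33+00:00
```

With these assumed parameters the slot maps to the same second as the log timestamp (12:07:33 UTC), which is what a fully synced node looks like. A Shelley-era chain only advances one slot per second, so the "slowdown" to about 1 slot per second is the node keeping pace with the tip.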

Any idea why it says “Relay - Testnet” at the top when this is definitely the block producing node?
Thanks again.

ToTheMoonADA · 20 August 2021 17:53

You need to download the mainnet config files and switch over to them on your instances.

https://hydra.iohk.io/build/7191656/download/1/index.html
Or see step 3 of the CoinCashew guide you mentioned earlier:
wget -N https://hydra.iohk.io/build/${NODE_BUILD_NUM}/download/1/${NODE_CONFIG}-byron-genesis.json

wget -N https://hydra.iohk.io/build/${NODE_BUILD_NUM}/download/1/${NODE_CONFIG}-topology.json

wget -N https://hydra.iohk.io/build/${NODE_BUILD_NUM}/download/1/${NODE_CONFIG}-shelley-genesis.json

wget -N https://hydra.iohk.io/build/${NODE_BUILD_NUM}/download/1/${NODE_CONFIG}-config.json
