Is it safe to restart the Core node each day at a specific time, just like the Relay?

gmihaylov · 30 April 2021 18:16

Hello guys,
I have set up a scheduled restart of the Relay node every night at 3 A.M., just after the mainnet-topology.json update from https://api.clio.one/htopology/v1/fetch/. Since the 1.26.2 update the Core node seems to use a lot of CPU, and sometimes it hangs and can’t connect to the Relay. Is it safe to restart the Core node each day at a specific time, just like the Relay?

Thanks!

laplasz · 30 April 2021 18:50

It is safe only if the node is not scheduled to create a block around that time.
Is the mempool trace enabled in the config? Turning it off usually solves the high CPU usage.
In which state does it hang: while running, or while starting? During startup it can hang for up to ~5 min…

gmihaylov · 30 April 2021 20:22

@laplasz TraceMemPool is true on both Relay & Core Nodes

No, it doesn’t get stuck at startup. It loses the connection to the Relay node at some point, let’s say 10 hours after the last restart of the Core node. That started to happen when I migrated to 1.26.2. From the screenshot below, it seems that the relay is also losing connection to other relays.

laplasz · 30 April 2021 20:29

ok, so try running the nodes without mempool trace and see what happens
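
A minimal sketch of flipping that option, assuming the node config is a JSON file and that the option is a top-level boolean. The helper name `set_trace_option` is hypothetical, and the key spelling in the sample is an assumption, so check it against your own config file before using it.

```python
import json

def set_trace_option(config_text, option, enabled):
    """Return the JSON config text with one top-level option set to a boolean."""
    config = json.loads(config_text)
    config[option] = enabled
    return json.dumps(config, indent=2)

# Made-up sample config; the key spelling is an assumption
sample = '{"TraceMempool": true, "TraceBlockFetchClient": false}'
updated = set_trace_option(sample, "TraceMempool", False)
print(updated)
```

The node has to be restarted afterwards for the changed config to take effect.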

gmihaylov · 30 April 2021 20:34

Looking at RTView, the Relay node is showing that it can’t connect to the Core node (192.168.220.220:6000). The Core node (on the right side) is blinking the Relay node 192.168.220.220:3001. At the same time, Grafana shows both nodes synced and processing TXs.

laplasz · 30 April 2021 21:05

Somehow the core rejects the connection attempt… it would be good to know why, but right now I don’t have any idea. Do you have TraceBlockFetchClient enabled in the block producer config?

tomdx · 1 May 2021 05:42

It is probably best to obtain the leaderlog first, to be sure that your pool is not scheduled for a block before you do any maintenance.
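
A minimal sketch of that check, assuming you already have the pool's scheduled block times as a list of UTC datetimes (how you obtain the leaderlog is outside this sketch). The function name `restart_is_safe`, the 10-minute margin, and the sample times are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

def restart_is_safe(restart_at, scheduled_blocks, margin=timedelta(minutes=10)):
    """Return True if no scheduled block falls within `margin` of the restart time."""
    return all(abs(block_time - restart_at) > margin for block_time in scheduled_blocks)

# Hypothetical leaderlog entries, not real data
schedule = [
    datetime(2021, 5, 2, 2, 55, tzinfo=timezone.utc),
    datetime(2021, 5, 2, 14, 30, tzinfo=timezone.utc),
]
restart = datetime(2021, 5, 2, 3, 0, tzinfo=timezone.utc)
print(restart_is_safe(restart, schedule))  # False: a block is due 5 minutes before
```

The margin should be wider than the time the node needs to come back up, since a restart can take several minutes.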

gmihaylov · 1 May 2021 14:16

I set TraceMemPool = false on the BP node. There is a little improvement, but the BP node still loses the connection to the Relay node sometimes, and the Relay loses the connection to other relays.

This is the graph from the last three hours:

Both the relay and the core node are running on the same machine, and I think the problem is that there is not enough memory. Together the two nodes are using 100% of the 8 GB total. I think it’s time to upgrade to 16 GB of RAM.
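
A minimal sketch for putting a number on that, assuming a Linux host where `/proc/meminfo` reports `MemTotal` and `MemAvailable` in kB. The function name `memory_used_percent` and the sample values are hypothetical.

```python
def memory_used_percent(meminfo_text):
    """Compute the used-memory percentage from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # first token is the value in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total

# Made-up sample; on a real host pass open("/proc/meminfo").read() instead
sample = """MemTotal:        8000000 kB
MemFree:          100000 kB
MemAvailable:     400000 kB
"""
print(f"{memory_used_percent(sample):.1f}% used")  # 95.0% used
```

`MemAvailable` is used rather than `MemFree` because it accounts for cache that the kernel can reclaim, so it gives a less pessimistic picture of memory pressure.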

Psychomb · 1 May 2021 15:49

The 8 GB requirement is for one node on one machine. So yeah, I’d say running BP and relay on the same machine, on top of being bad practice, is also hard to do now.

gmihaylov · 1 May 2021 17:28

Yeah, I know that. At the beginning I started the two nodes on the same bare metal server only as a proof of concept. Now I think it’s time to move the core node to a new machine.
