I have set up a scheduled restart of the Relay node every night at 3 A.M., just after the mainnet-topology.json update from https://api.clio.one/htopology/v1/fetch/. Since the 1.26.2 update, the Core node seems to use a lot of CPU, and sometimes it hangs and can't connect to the Relay. Is it safe to restart the Core node every day at a specific time, just like the Relay?
It is safe only if the node is not scheduled to create a block around that time.
Is the mempool trace enabled in the config? Turning it off usually solves the high CPU usage.
In which state does it hang? While running, or while starting? During startup it can hang for up to ~5 min…
@lapasz TraceMemPool is true on both the Relay and Core nodes.
No, it doesn't get stuck at startup. It loses the connection to the Relay node at some point, say roughly 10 hours after the last restart of the Core node. This started when I migrated to 1.26.2. From the screenshot below, it seems that the relay is also losing its connections to other relays.
OK, so try running the nodes without the mempool trace and see what happens.
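If you want to flip the switch from a script rather than by hand, here is a minimal sketch that loads the node's JSON config, sets the mempool tracing key to false and writes the file back. It assumes the key sits at the top level of the config; because the thread spells it TraceMemPool and the shipped config files may spell it differently, the match is case-insensitive. The function names and the config path are hypothetical.

```python
import json
import sys
from pathlib import Path


def disable_mempool_trace(config: dict) -> list[str]:
    """Set every top-level key spelled 'tracemempool' (any case) to False.

    Returns the keys that were found, so the caller can tell whether
    the switch exists in this config at all.
    """
    found = []
    for key in list(config):
        if key.lower() == "tracemempool":
            config[key] = False
            found.append(key)
    return found


def patch_config_file(path: Path) -> list[str]:
    """Load a node config file, disable the mempool trace, write it back."""
    config = json.loads(path.read_text())
    found = disable_mempool_trace(config)
    if found:
        # json.dumps keeps key order but re-indents the whole file.
        path.write_text(json.dumps(config, indent=2) + "\n")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python disable_mempool_trace.py /path/to/mainnet-config.json
    for name in patch_config_file(Path(sys.argv[1])):
        print(f"{name} set to false")
```

Restart the node afterwards so it picks up the edited config, and keep a backup of the original file, since the rewrite changes its formatting.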
Looking at RTView, the Relay node shows that it can't connect to the Core node (192.168.220.220:6000). On the Core node (right side), the entry for the Relay node 192.168.220.220:3001 is blinking. At the same time, Grafana shows both nodes synced and processing TXs.
Somehow the core rejects the connection attempt. It would be good to know why, but right now I don't have any idea. In the block producer config, do you have TraceBlockFetchClient enabled?
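One quick way to separate a network problem from a node problem is to test whether the Core's port accepts TCP connections at all, run from the Relay side. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, and a successful connect only shows that the port is open, not that the node-to-node handshake succeeds.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and unreachable hosts.
        return False
```

For example, `can_connect("192.168.220.220", 6000)` checks the Core address from the RTView screenshot. If it returns True while RTView still shows the peer as down, the rejection is happening inside the node rather than at the network level.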
It is probably best to obtain the leaderlog first, to be sure that your pool is not scheduled for a block before you do any maintenance.
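Once you have the leaderlog, you could check the nightly 3 A.M. restart against it before each run. This is a minimal sketch under assumptions: the leaderlog has already been converted into a list of UTC datetimes (slot-to-time conversion is not shown), the node needs about 5 minutes to come back (the startup time mentioned earlier in the thread), and a 10-minute safety margin on each side is enough. Function names and default values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 3) -> datetime:
    """Next occurrence of hour:00 strictly after `now`, in now's timezone."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def restart_is_safe(
    restart_at: datetime,
    scheduled_blocks: list[datetime],
    downtime: timedelta = timedelta(minutes=5),
    margin: timedelta = timedelta(minutes=10),
) -> bool:
    """Return True if no scheduled block falls inside the restart window.

    The window runs from `margin` before the restart until the node is
    expected to be back up (`downtime`), plus another `margin`.
    """
    window_start = restart_at - margin
    window_end = restart_at + downtime + margin
    return not any(window_start <= t <= window_end for t in scheduled_blocks)


# Example with a made-up block time, not real leaderlog data.
now = datetime(2021, 5, 10, 12, 0, tzinfo=timezone.utc)
run = next_daily_run(now)
blocks = [datetime(2021, 5, 11, 3, 8, tzinfo=timezone.utc)]
print(run, "safe" if restart_is_safe(run, blocks) else "skip tonight")
```

A cron wrapper could call `restart_is_safe` and simply skip the restart on nights when it returns False.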
I set TraceMemPool = false on the BP node. There is a slight improvement, but the BP node still sometimes loses the connection to the Relay node, and the Relay loses its connections to other relays.
This is the graph from the last three hours:
Both the relay and the core node are running on the same machine, and I think the problem is a lack of memory: together the two nodes use 100% of the machine's 8 GB. It's probably time to upgrade to 16 GB of RAM.
The 8 GB requirement is for one node on one machine. So yeah, I'd say running the BP and relay on the same machine, on top of being bad practice, is also hard to do now.
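The arithmetic behind this is simple: the RAM budget scales with the number of nodes on the box. The sketch below, with hypothetical function names, reads the MemTotal line from Linux's /proc/meminfo (whose "kB" values are actually KiB) and compares it against the 8 GB-per-node figure quoted above. It uses decimal gigabytes so that a nominal 8 GB machine, whose MemTotal is a bit under 8 GiB, still counts as meeting the one-node guideline. It checks installed RAM only, not what the nodes actually consume.

```python
def mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def enough_ram(mem_kib: int, nodes: int, per_node_gb: float = 8.0) -> bool:
    """Compare total RAM with `nodes` x `per_node_gb` decimal gigabytes."""
    required_kib = nodes * per_node_gb * 1e9 / 1024
    return mem_kib >= required_kib
```

On Linux, `enough_ram(mem_total_kib(open("/proc/meminfo").read()), nodes=2)` returns False on an 8 GB machine and True on a 16 GB one, which matches the conclusion in this thread.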
Yeah, I know. I started both nodes on the same bare-metal server only as a proof of concept. Now I think it's time to move the core node to a new machine.