V1.25.1 relay node RAM usage very high - is that a bug?

Hello Cardanians,

I am setting up my nodes right now for the new tigr pool. I started first with a relay node on the testnet.
During this testing I saw that RAM usage is increasing steadily over time, which raises the question: is there a memory leak in V1.25.1?

After 32 mins: 2.5 GB RAM usage, 33 peers in topology.


After 14 h 40 mins: 3.83 GB RAM usage, 33 peers in topology.

I read in the forum that others see around 700 MB to 1000 MB RAM usage per relay, even on longer runtimes. 3.83 GB for serving 33 peers sounds very inefficient. Is something wrong with my node? Where does this high memory consumption come from?

Normally I deal with C# backend applications serving several hundred users with less than a GB of RAM usage. That's a huge difference here.

I’ve checked the logs:

I found:
[hn-thub1:cardano.node.DnsSubscription:Warning:84] [2021-02-06 21:40:48.82 UTC] Domain: "relay1.osterlindh.com" Failed to start all required subscriptions
[hn-thub1:cardano.node.DnsSubscription:Warning:77] [2021-02-06 21:40:48.82 UTC] Domain: "iodc.hopto.org" Failed to start all required subscriptions
… (9 more of those)

And

[hn-thub1:cardano.node.IpSubscription:Error:190] [2021-02-06 21:41:44.58 UTC] IPs: 0.0.0.0:0
[hn-thub1:cardano.node.IpSubscription:Error:715] [2021-02-06 22:15:39.99 UTC] IPs: 0.0.0.0:0 [192.168.16.1:50000,192.168.16.2:50000,192.168.16.3:50000,51.79.141.170:7900,95.216.178.106:3001,95.217.133.234:6000,95.179.169.157:6600,116.203.233.9:3002,185.173.235.164:5001,18.132.238.21:3001,3.9.209.70:6000,146.166.116.172:7172,146.166.116.170:7170,24.37.174.13:3005,157.245.131.60:6000,79.97.151.246:30000,54.241.77.32:3001,209.126.3.185:7031,104.198.217.123:3010,3.135.9.245:6001,3.14.16.248:3001,198.0.113.61:3001,40.76.58.6:6000] Connection Attempt Exception, destination 192.168.16.2:50000 exception: Network.Socket.connect: <socket: 36>: timeout (Connection timed out)
[hn-thub1:cardano.node.IpSubscription:Error:59] [2021-02-06 22:15:39.99 UTC] IPs: 0.0.0.0:0 [192.168.16.1:50000,192.168.16.2:50000,192.168.16.3:50000,51.79.141.170:7900,95.216.178.106:3001,95.217.133.234:6000,95.179.169.157:6600,116.203.233.9:3002,185.173.235.164:5001,18.132.238.21:3001,3.9.209.70:6000,146.166.116.172:7172,146.166.116.170:7170,24.37.174.13:3005,157.245.131.60:6000,79.97.151.246:30000,54.241.77.32:3001,209.126.3.185:7031,104.198.217.123:3010,3.135.9.245:6001,3.14.16.248:3001,198.0.113.61:3001,40.76.58.6:6000] Failed to start all required subscriptions

[2021-02-06 22:14:42.78 UTC] IPs: 0.0.0.0:0 [51.79.141.170:7900,95.216.178.106:3001,95.217.133.234:6000,95.179.169.157:6600,116.203.233.9:3002,185.173.235.164:5001,18.132.238.21:3001,3.9.209.70:6000,146.166.116.172:7172,146.166.116.170:7170,24.37.174.13:3005,157.245.131.60:6000,79.97.151.246:30000,54.241.77.32:3001,209.126.3.185:7031,104.198.217.123:3010,3.135.9.245:6001,3.14.16.248:3001,198.0.113.61:3001,40.76.58.6:6000] Application Exception: 104.198.217.123:3010 ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) (' *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) (' *))))))) (ServerAgency TokNext TokCanAwait)

→ I assume this is the culprit.

Any thoughts on that?

I am not sure I know the answer, but it seemed to me that when I added a bunch of relays, it also corresponded with more memory being consumed. So fewer relays should mean less memory. It also seems another parallel process is spun up, which may take up CPU as well. I actually ended up having to reboot when I had 70 relays in my topology.


70? The recommended number is ~20.

Cheers!


Indeed, I had not gotten around to reading that. So I learned via pain :stuck_out_tongue:


Indeed, there is a correlation: more peers equals more RAM usage.
Let's assume the node uses 500 MB in idle mode; that would leave 3330 MB for the 33 peers.
That's roughly 100 MB per connected peer, which seems way too much (?). I would say there is room to improve in the future. I might ask that question directly on the GitHub page.
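
As a rough back-of-the-envelope check (a minimal sketch in the shell; the 500 MB idle figure is just my assumption):

# (measured total MB - assumed idle MB) / number of peers ≈ MB per connected peer
echo $(( (3830 - 500) / 33 ))    # prints 100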

From the other responses I understand that it behaves the same on your side and that the solution you chose is to limit the peers. I will do further long-term tests in the next weeks, and if it stays at a certain level (does not rise over time), I will leave it as it is.
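
Probably something simple like this to log the RSS every 10 minutes over the test period (the process name cardano-node is an assumption based on a default setup):

while true; do echo "$(date -Is) $(ps -C cardano-node -o rss=)" >> node-ram.log; sleep 600; done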

Thanks for the feedback!

@tigrpool.com could you please mark one of the answers as the solution if it helped with your question?

@laplasz: Hello laplasz, the question is actually not answered. From the low response I just assume that people have the same issue and accept the high memory usage.
Hence, so far I would not like to call the topic "solved".

I will mark it as solved if someone else confirms my theory.


I also noticed more RAM usage at the turn of the last epoch with the new 1.25.1 version.
4 GB is not enough anymore. The RAM spike is not directly related to the number of peers, as I consistently have c. 14 to 15 peers on both relays and the RAM utilisation still spiked by about 20%.

I did not investigate further, but rather expect that utilisation will keep growing as more functionality gets added in the run-up to the Goguen go-live.

I am actually having the same issue. I suppose that because we are using topologyUpdater.sh, it's causing us to have more relays. Does anyone have any suggestions on how to remediate the issue?

How many peers do you have?
Also, the default setting is "TraceMempool": true.
Try "TraceMempool": false in your config file, then restart the node and keep it under monitoring.
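
For example (a minimal sketch; the config file name mainnet-config.json and the exact spelling/spacing of the setting are assumptions, adjust to your setup):

grep TraceMempool mainnet-config.json
sed -i 's/"TraceMempool": true/"TraceMempool": false/' mainnet-config.json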


That one really is good advice.

How many peers do you have?
Also, the default setting is "TraceMempool": true.
Try "TraceMempool": false in your config file, then restart the node and keep it under monitoring.

It lowered the memory consumption (not by that much, but it helped).

Thanks.

What is the command you're using to bring up those stats? I am curious to monitor my stuff myself.

Hello @Anti.biz,
I use standard Linux tools for that; they are available in all commonly known distributions.

For monitoring of CPU, RAM and processes, use htop:

apt-get -y install htop;
htop;

For monitoring of disk usage, use iotop:

apt-get -y install iotop;
iotop;

For monitoring of network bandwidth and usage, use iftop:

apt-get -y install iftop;
iftop;

To install all at once use:

apt-get -y install htop iotop iftop;
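
If you want to watch the cardano-node process memory specifically, something like this should also work (assuming your node process is named cardano-node):

watch -n 10 'ps -C cardano-node -o pid,rss,vsz,cmd'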

Good luck!


Hi @tigrpool.com, have you solved the issue with memory?
I have the same issue: memory slowly climbing to 100% every 24 hours, so I have to restart the cnode service manually. Both relays are behaving identically.
In my topologyUpdater, Max_peers is set to 14 and the nodes are running 17 out and 7 in.
When I disable TraceMempool I do not see processed transactions, and in that case it is hard to say whether the relay is processing transactions.
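
For reference, that peer cap is just a variable near the top of my copy of topologyUpdater.sh; the exact variable name may differ in your version:

MAX_PEERS=14    # assumed variable name; caps how many peers the fetched topology contains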

Have you tried to disable TraceMempool as Alex1985 suggested earlier? It did the job for my pool.

Hi @CryptoTorben, thank you for the reply. Yes I did, but I do not like the idea of not being able to see processed transactions, and I am just wondering if this will be fixed in the near future.

As long as you are able to see processed transactions on the block producer, everything should be fine. That aside, I would like to see it fixed as well.

@CryptoTorben so all of your relays are running with TraceMempool disabled?

Yea, and the BP is crunching away ;o)

You can subscribe to this issue to see how they’re doing on fixing this in Ouroboros (so the fix can then be incorporated into the next release of the node):
