Solving the Cardano node's huge memory usage - done

If you set the -H parameter, there is no use in setting -A. GHC will dynamically adjust the allocation area to whatever is left of the -H value. :slight_smile:

Yes, but when nothing is left, the -A you set (or the one set at compile time) is used.
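
For readers new to these flags, here is a minimal sketch of how -H and -A are passed to the node between +RTS and -RTS. The values and file paths are placeholders for illustration, not recommendations from this thread:

```bash
# Sketch only: RTS flags go between +RTS and -RTS on the cardano-node command line.
# -N = capabilities (cores), -H = suggested heap size, -A = allocation-area size.
# Values and paths below are illustrative placeholders.
cardano-node run \
  --topology ./topology.json \
  --database-path ./db \
  --socket-path ./db/node.socket \
  --config ./mainnet-config.json \
  +RTS -N2 -H3G -A16m -RTS
```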

1 Like

Yeah, I already have that running on a test node for 2 days and the RAM usage is at around 3GB :wink:

Just wanted to say thank you to the OP. I found this thread by chance and have been running my node with the suggested settings for a couple of days now. Dramatic improvement in memory usage. I always had problems running the nodes before; they never felt stable regardless of the hardware specs, but now all is fine.

2 Likes

I can also confirm what’s written here is still helpful & relevant… though I’m somewhat anxiously waiting to see if this will still be true on whatever node version is used post-Vasil on the mainnet :grimacing::crossed_fingers:

1 Like

@COSDpool are you still using these RTS settings as stated in your previous post?

I have tried similar settings and still got a couple of missed slot leader checks when a GC event happened. I ended up switching to the --nonmoving-gc setting. With this setting alone, I never get any missed slot leader checks, but memory use progressively increases, as it is not freed back to the OS, so I have to restart my node every day or so. Since I run a small pool, this isn’t a problem for me, because at least I can time when the restart happens, whereas I can’t forecast when the GC will run.
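
For reference, a sketch of that setup, assuming the node runs as a systemd service named "cardano-node" (the service name and restart time are assumptions, not from this thread):

```bash
# Sketch: enable the concurrent (non-moving) GC via RTS flags.
cardano-node run ... +RTS -N --nonmoving-gc -RTS

# Illustrative daily restart to reclaim the memory the non-moving GC never
# returns to the OS; pick a time well away from your block schedule.
# crontab entry (service name "cardano-node" is an assumption):
15 03 * * * /usr/bin/systemctl restart cardano-node
```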

1 Like

Yes, we’re still using those parameters: not only on our relays, as suggested above, but also on our BP. We never lose blocks due to memory issues and our network connectivity is trouble-free (I read the logs manually, with some filtering, several times a day). We miss about one block in every 40, which seems to be unavoidable under the current conditions of the network.

All our servers (the BP and the 2 relays) are still on 8GB RAM and we’re still one of the top-performing pools. I do have to watch the servers’ memory use more often than I would like, to ensure we always have at least ¼GB “Mem: available” (according to the free command) about 2 hours before we’re scheduled to produce a block. I restart all our servers an average of once per day.
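
A small sketch of that kind of check, using the “available” column of free (the 256 MiB threshold mirrors the ¼GB figure above; how you deliver the warning is up to you):

```bash
#!/usr/bin/env bash
# Sketch: warn when "Mem: available" drops below ~256 MiB (≈ ¼ GB).
# Alerting method (mail, chat bot, etc.) is left to the operator.
avail_mb=$(free -m | awk '/^Mem:/ {print $7}')
if [ "$avail_mb" -lt 256 ]; then
  echo "Low memory: only ${avail_mb} MiB available - restart before the next scheduled block"
fi
```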

The BP misses about 1 of every 1600 slot leadership checks, but the minute the BP begins using swap this ratio increases dramatically (as seen by several-second GCs in the GC output file).

A few weeks ago I did an experiment when it seemed like we were missing blocks about twice as often (around epochs 346 to 347, when everybody else’s performance was also appalling). Mainly this was substituting -M7300 for -O3500M in an attempt to limit the node’s actual memory use to within the physical memory size: no luck, of course. The node will always use all 8GB of physical memory somewhere between 4 and 48 hours after it starts… and rebooting always resets the clock.
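
For anyone wanting to repeat that experiment, a maximum-heap cap is normally written with a size suffix; a hedged sketch (the cap value mirrors the one above, and as noted it did not solve the problem):

```bash
# Sketch: -M caps the GHC heap; if the node hits the cap it fails with a
# heap-overflow error instead of pushing into swap. The value mirrors the
# experiment described above and is not a recommendation.
cardano-node run ... +RTS -N -M7300M -RTS
```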

I expect something about this tidy arrangement will change with 1.35.x and probably the whole GC question will have to be revisited. I appreciate your report of using --nonmoving-gc successfully, but it’s been my understanding that this option is incompatible with the 8GB footprint because of the larger amount of memory it needs even when the node first starts.

But if an 8GB node becomes totally impossible after the Vasil fork on mainnet then I suppose we would finally have to upgrade our tiny servers anyway. In the meantime it would be helpful to know if you’re running --nonmoving-gc on an 8GB block producer… otherwise I’ll assume 16GB or more based on prior discussion. :sunglasses:

1 Like

I am running with 16G RAM and 16G swap. However, only around 7-8.5G RAM is used just after start. Every time I restart the BP node it uses a different amount of RAM, which I presume depends on how recently the last chain snapshot was taken. After running for 6-10 hrs the RAM usage increases to 15+G, and after around 24 hrs swap starts getting used. Still, despite this, I never see missed slot leader checks unless I let the node continue to run for several days without a restart.

The only RTS setting I currently have is “--nonmoving-gc”. In particular, I am therefore using the default “-F” setting, which is why RAM usage doubles from around 7-8G to 15+G after 6-10 hrs.
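
Since the default heap-growth factor is -F2 (which is what makes resident memory roughly double), one hedged tweak would be to lower it alongside the non-moving GC. The value below is illustrative only, not something tested in this thread:

```bash
# Sketch: lower the old-generation growth factor from the default of 2.
# -F1.5 is an illustrative value, not a tested recommendation.
cardano-node run ... +RTS -N --nonmoving-gc -F1.5 -RTS
```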

I have also tested running my block producer on my ARM server, which is not very powerful. I ran the BP as a virtual machine with only 4 Cortex-A72 CPUs running at 2GHz, with 16G RAM and 16G swap. It never missed a slot check when using the --nonmoving-gc option, but with every other setting I tried, slot checks were missed occasionally, which I presume lined up with Haskell GC events.

A test I found useful is running the leadership schedule calculations because this pushes the limits of performance and makes slot leadership checks more likely to be missed. It also causes Haskell to allocate more memory, so I always restart the node after running the leadership schedule check.
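
For completeness, a sketch of the leadership-schedule query used as the stress test here (pool id, key file, and genesis file are placeholders; it needs a reasonably recent cardano-cli and a reachable node socket):

```bash
# Sketch: leadership-schedule query used as a load test.
# Placeholders: <pool-id>, vrf.skey, shelley-genesis.json, socket path.
export CARDANO_NODE_SOCKET_PATH=./db/node.socket
cardano-cli query leadership-schedule \
  --mainnet \
  --genesis shelley-genesis.json \
  --stake-pool-id <pool-id> \
  --vrf-signing-key-file vrf.skey \
  --current
```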

I also found that you can occasionally get cardano-node to produce an “out of memory” event that causes it to be killed by the OS, if you run the leadership schedule calcs when the node is already using 15+G RAM and some swap.

I haven’t tried compiling the node with GHC version 9.2.x, but apparently the non-moving GC works properly with this version of Haskell, whereby memory is released back to the OS. If we could have cardano-node compiled against GHC version 9.2.x, then maybe the “--nonmoving-gc” RTS setting would be a complete fix for the problem?

1 Like

Update 1 of 2 (now that most of us are on node version 1.35.3, though we haven’t been through the Vasil fork yet): TL;DR: as IOG promised in the release notes, the node has better memory management, and therefore it’s still possible for us to run the BP as well as the relays on 8GB of physical RAM.

During the last day of each epoch, certainly across the epoch boundary, and during the first day, node & network activity has kept memory usage pushing beyond 8GB into swap. But the node starts up in about half the time (it seems to get through the ledger validation more quickly), so we can schedule staggered reboots about half an hour before every scheduled block production in these periods.

I followed every link I could find about compiling the node with GHC 9.2 and found a couple of statements from IOG that it simply does not work, and will not work until it becomes the official platform. I was looking forward to trying --nonmoving-gc this time around, but I’m happy to wait, because the slight improvement in memory performance (pending the upcoming HFC) is allowing us to stay within our existing budget.

Happily, we haven’t missed a block since the network experienced that strange disruption in epochs 346 to 347. I believe this is good news for people running low-delegation pools who have to survive several epochs without producing a block, since the ability to operate with 100% performance on 8GB nodes would generally cut their operating expenses in half vs. the 16GB standard.

After Vasil HFC I’ll post update 2 of 2 with any further observations… and if it proves then or in the future that we simply can’t run on 8GB anymore without perfect performance I’ll report it here :nerd_face:

2 Likes

Hello everyone :)

Thank you for your feedback @COSDpool, and thanks to the author of this incredible post!

I’m setting up a stake pool on a VPS (both BP and relay are on 6 cores, 16GB RAM, 150GB SSD).
I’ve been trying some of the RTS options from this thread, but unfortunately, no matter what I try, I still get missed slots.

It happens, of course, on every major GC, and I’m wondering: is it actually normal? I mean, maybe I’m struggling for nothing, because it’s inevitable? Are you guys, who’ve been running nodes for months/years, having some missed slots as well? Or do you manage to get 0 missed with your RTS config?

Here is a screenshot of my Grafana Dashboard for Major GC / Missed Slots

1 Like

I manage to get 0 missed slot leader checks (except during epoch transitions) without any RTS parameter (except the number of CPU cores). The secret is: quality hosting.
There is another thing (besides quality hosting) that you can do to avoid the missed slot leader checks that happen every hour when the ledger snapshot is taken: increase the ledger snapshot interval from the default of 1 hour to 1 day by adding the setting:
"SnapshotInterval": 86400,
in the mainnet-config file and restarting the node.
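
One way to apply it without hand-editing, assuming jq is available (file names are examples); the value is in seconds, so 86400 is one day:

```bash
# Sketch: set SnapshotInterval (seconds; 86400 = 24 h) in the node config with jq.
# Back up the original first; file names are examples.
cp mainnet-config.json mainnet-config.json.bak
jq '.SnapshotInterval = 86400' mainnet-config.json.bak > mainnet-config.json
# restart the node afterwards for the change to take effect
```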

4 Likes

Thank you very much, I’ll give it a try. The “SnapshotInterval” setting is only for the BP node, right? (My relays are all running pretty well with their settings.)

(I’m on MVPS, which actually has quite good performance. Maybe I went too far with the RTS options and it became counterproductive.)

I am also setting this on relays, but 12 hours, not 24 hours. It is less effective on relays.

1 Like

FYI any node with a ledger will take snapshots (and will honour this parameter)… i.e. all nodes, regardless of whether they produce blocks :nerd_face:

1 Like

Yes, of course :slight_smile: But my relays are fine regarding memory utilisation. I used your settings, and they work very well! It’s just my BP that keeps missing slots on each GC.

I would imagine that changing this parameter may affect node sync time after a restart. Is that the case?

I will let you know once I restart my BP :slight_smile:

I agree. I believe it all boils down to hosting specs, but let me try this SnapshotInterval, set it to 1 day, and see if that helps with the missed checks.

Yes, it takes a few extra seconds to restart. Negligible.

1 Like

An update after 24 hours :slight_smile:
So I removed every RTS option except -N on my BP node (6 cores, 16GB), and I set the snapshot interval to 24h.

So far, I have had 5 major GCs, which caused 18 missed slots during the last 24 hours. I think the snapshot interval helped a lot. I also think I should fine-tune the RTS with some small tweaks that optimize RAM utilization for GC on my BP node.

[Screenshots: Grafana dashboard panels for major GC and missed slots, 2022-09-19]

1 Like