Increasing number of OOM crashes on BP since 1.34.1, and now it won't come back up

Howdy SPO folks.

My BP runs on a 4-core virtual server with 16 GB RAM and 4 GB swap, which had seemed plenty until recently. A week or two ago it started periodically crashing with out-of-memory errors, and now it won't come back up: it sits in "starting" for about an hour and then crashes again.

It seems more unstable after the 1.34.1 upgrade.

Any ideas?

Thanks!

P.S. Some log data. Before crashing it was doing this:

{"host":"ubuntu-1","pid":"18129","loc":null,"at":"2022-03-31T12:04:39.00Z","ns":["cardano.node.LeadershipCheck"],"sev":"Info","env":"1.34.1:73f9a","data":{"utxoSize":6453179,"kind":"TraceStartLeadershipCheck","delegMapSize":1178218,"credentials":"Cardano","slot":57161988,"chainDensity":4.717579e-2},"msg":"","thread":"5185","app":}
{"host":"ubuntu-1","pid":"18129","loc":null,"at":"2022-03-31T12:04:39.00Z","ns":["cardano.node.Forge"],"sev":"Info","env":"1.34.1:73f9a","data":{"val":{"kind":"TraceNodeNotLeader","slot":57161988},"credentials":"Cardano"},"msg":"","thread":"5185","app":}

Then, after restarting and running for about 30 minutes, it was doing this when it crashed:

{"host":"ubuntu-1","pid":"148130","loc":null,"at":"2022-03-31T12:29:22.68Z","ns":["cardano.node.ChainDB"],"sev":"Info","env":"1.34.1:73f9a","data":{"kind":"TraceImmutableDBEvent.ValidatedChunk","chunkNo":"2237"},"msg":"","thread":"5","app":}
{"host":"ubuntu-1","pid":"148130","loc":null,"at":"2022-03-31T12:29:22.68Z","ns":["cardano.node.ChainDB"],"sev":"Info","env":"1.34.1:73f9a","data":{"finalChunk":"2644","initialChunk":"2238","kind":"TraceImmutableDBEvent.StartedValidatingChunk"},"msg":"","thread":"5","app":}

None of that looks abnormal to me, and htop was reporting reasonable memory usage (about 6 GB) shortly before the crash.

The crash is reported in dmesg as:

[Thu Mar 31 14:04:39 2022] Out of memory: Killed process 18129 (cardano-node) total-vm:1074654484kB, anon-rss:8453716kB, file-rss:1528kB, shmem-rss:0kB, UID:1000 pgtables:24928kB oom_score_adj:0
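In case anyone wants to check the same thing on their node, this is roughly how I've been confirming the OOM kills and watching the node's memory since then. It's just stock Linux tooling; the process name cardano-node and the one-minute sample interval are simply what I happened to use:

# confirm OOM kills in the kernel log (human-readable timestamps)
dmesg -T | grep -i 'out of memory'
# or, via journald:
journalctl -k | grep -iE 'out of memory|oom-killer'

# sample the node's resident memory once a minute to catch growth htop might miss
while pid=$(pidof cardano-node); do
    echo "$(date -Is) rss_kb=$(ps -o rss= -p "$pid")"
    sleep 60
done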

Mystified!

I would suggest increasing the swap size and seeing what's going on. Maybe there are other services running on the server that are using the RAM? For me, 1.34.1 is running quite well on all servers; some commands for checking this are below.
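Something like this is usually enough to see where the memory is actually going (standard Linux tools, nothing pool-specific):

free -h                      # overall RAM and swap usage
swapon --show                # active swap devices/files and their sizes
ps aux --sort=-rss | head    # top memory consumers by resident set size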


Tnx, it did indeed seem to max out swap on the latest run, so I've bumped it to 16 GB. Crossing my fingers 🙂
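For the record, this is roughly how I resized it. A swap file at /swapfile is an assumption here; adjust to however your swap is actually set up:

sudo swapoff /swapfile                 # disable the existing swap file
sudo fallocate -l 16G /swapfile        # grow it to 16 GB
sudo chmod 600 /swapfile
sudo mkswap /swapfile                  # re-initialise the swap area
sudo swapon /swapfile                  # enable it again
# check /etc/fstab still has a line like: /swapfile none swap sw 0 0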


16 GB of RAM (even without swap) should be enough if you are not running other scripts such as cncli leaderlog on the same box.


There was some hard work done by one of our SPOs on garbage-collection tuning. If you haven't read it yet, it is worth your time; a rough sketch of what that tuning looks like is below.
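In case it saves someone a search: the tuning is done by passing GHC runtime (RTS) flags to cardano-node on its command line. The flag values and file paths below are purely illustrative placeholders, not the recommendations from that write-up:

# GHC RTS flags go between +RTS and -RTS on the node's command line.
#   -N4    use 4 capabilities (match your core count)
#   -A16m  larger allocation area, fewer minor GCs
#   -H3G   suggested overall heap size
#   -I0.3  idle-GC interval in seconds
cardano-node run \
  --config mainnet-config.json \
  --topology mainnet-topology.json \
  --database-path db \
  --socket-path node.socket \
  +RTS -N4 -A16m -H3G -I0.3 -RTS

If your node is built with a recent enough GHC, the non-moving collector (--nonmoving-gc RTS flag) is another option people experiment with, trading some throughput for flatter pause behaviour.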

Tnx folks, more swap seems to have resuscitated it for now 🙂