Hello,
I am having an issue with my stake pool crashing at around 60% of replaying blocks. It uses around 30 GB of RAM and 16 GB of swap, then exits with error code 137. Any ideas what is causing it to use so much memory?
This is my startup script:
cardano-node run +RTS -N -w -A16m -qg -qb -RTS --topology ${TOPOLOGY} --database-path ${DB_PATH} --socket-path ${SOCKET_PATH} --host-addr ${HOSTADDR} --port ${PORT} --config ${CONFIG} --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}
Edit: This is my block producing node, not the relay.
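A note on the exit code: a status above 128 means the process was terminated by signal number (status - 128), so 137 is 128 + 9, i.e. SIGKILL. On a Linux machine that has run out of RAM and swap, the usual sender is the kernel's OOM killer, although that is an assumption here until the kernel log confirms it. A minimal sketch for decoding the status (the function name `signal_from_status` is made up for illustration):

```shell
# Decode a shell exit status: values above 128 mean
# "killed by signal number (status - 128)".
signal_from_status() {
  status="$1"
  if [ "$status" -gt 128 ]; then
    # kill -l N prints the name of signal number N
    kill -l "$((status - 128))"
  else
    echo "none"
  fi
}

signal_from_status 137   # prints KILL

# To check whether the OOM killer was involved (may need root):
#   journalctl -k | grep -i -E 'out of memory|oom'
```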
Check your disk storage. I had a similar problem and it was just that I was out of space on my hard drive.
There is plenty of space left. I have upped my swap space to 64 GB and it is progressing further. I am now at 67%, with around 55 GB of memory in use. It is going much slower now, though, with so much swap in use.
Something isn’t right to be using that much swap, but if it gets through, hopefully it doesn’t want that much again.
What version node are you running?
Well, it crashed again at around 77%, at a little over 90 GB of total memory. I was on 1.35; I downgraded to 1.32.1 to see whether an earlier version makes any difference, and so far it is not looking promising: I am at 17.4% and already using 18 GB.
Hi,
Did you downgrade to 1.32 or to 1.34?
You should monitor which process is consuming the memory until it crashes, by running something like htop, while watching for a clarifying log message, e.g. by running journalctl -f -e in a side-by-side terminal.
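If you would rather have a record than watch htop live, the same monitoring can be sketched as a small logging loop. This assumes Linux (it reads /proc) and that the process is named exactly cardano-node; the function names `rss_mib` and `watch_rss` are made up for illustration.

```shell
# Resident memory (RSS) of a PID in MiB, read from /proc (Linux only).
rss_mib() {
  awk '/^VmRSS:/ { printf "%d\n", $2 / 1024 }' "/proc/$1/status"
}

# Log the RSS of the oldest process named exactly "$1" every "$2" seconds
# (default 10). The loop ends once no such process is running any more.
watch_rss() {
  name="$1"
  interval="${2:-10}"
  while pid=$(pgrep -o -x "$name"); do
    printf '%s pid=%s rss=%s MiB\n' "$(date '+%F %T')" "$pid" "$(rss_mib "$pid")"
    sleep "$interval"
  done
}

# Usage (run in a second terminal while the node replays):
#   watch_rss cardano-node 10 >> rss.log
```

The last lines of the log then show how much memory the node held just before it was killed.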
Hi,
I downgraded to 1.32 to see if there was a difference and there was no change in the memory usage.
It is cardano-node that is eating up the memory.
Just a solution I found to this. After downgrading to 1.32 it still did not update, so I set up a fresh server, installed 1.32 there and copied over all the configuration files and keys. This worked. I recently updated back to 1.35.3 and had the same issue on the new server.
The fix for me was starting the block-producing node as a relay and, after it had replayed everything, switching it back to a block producer. After that it was able to run.
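For anyone wanting to try the same workaround, here is a rough sketch based on the startup script from the first post. It relies on the fact that a node started without the three key/certificate flags runs as a relay. The `start_node` wrapper and the `DRY_RUN` switch are made up for illustration; the variables are the ones the original script already uses.

```shell
# Start cardano-node either as a relay (no key flags) or as a block producer.
# Expects TOPOLOGY, DB_PATH, SOCKET_PATH, HOSTADDR, PORT, CONFIG to be set,
# plus KES, VRF, CERT for producer mode.
start_node() {
  mode="$1"
  set -- cardano-node run +RTS -N -w -A16m -qg -qb -RTS \
    --topology "${TOPOLOGY}" \
    --database-path "${DB_PATH}" \
    --socket-path "${SOCKET_PATH}" \
    --host-addr "${HOSTADDR}" \
    --port "${PORT}" \
    --config "${CONFIG}"
  case "$mode" in
    relay)
      ;;  # no key flags: the node only syncs and replays the chain
    producer)
      set -- "$@" \
        --shelley-kes-key "${KES}" \
        --shelley-vrf-key "${VRF}" \
        --shelley-operational-certificate "${CERT}"
      ;;
    *)
      echo "usage: start_node relay|producer" >&2
      return 2
      ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"      # hypothetical switch: print the command instead of running it
  else
    "$@"
  fi
}

# Usage:
#   start_node relay      # let the replay finish, then stop the node
#   start_node producer   # restart with the keys once the database is ready
```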