Around an hour ago, the RAM usage of both relay + producer increased by about 1GB. There was a small amount of CPU activity when it occurred:
I did some searching and found this issue, which seems to indicate that rewards calculations happen 48 hours after epoch start, though it doesn’t seem like we’re at 48 hours yet.
Seems a bit strange for such a large increase in such a short space of time on both. Any ideas what may have happened? (there’s plenty of RAM, I’m just curious). There was not a corresponding increase in mempool size/txs when this occurred.
Also - I restarted the relay after this happened, and the memory usage did something similar around 7 hours later, this morning at about 4am:
Looking in the logs around the time this happened (around 3:37am), I can’t see anything that looks unusual to me (though I don’t understand a lot of the output). There are a lot of FetchDeclineConcurrencyLimit and FetchDeclineChainNotPlausible entries (I have fetch decisions enabled because that seemed to be required for some of the metrics I wanted, though I don’t recall which ones), but nothing that looks like a warning or error.
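In case it helps with scanning a noisy log, here is a rough Python sketch that tallies how many lines mention each marker, so the fetch-decision noise can be weighed against anything else. The two FetchDecline names are the ones mentioned above; the "Error" and "Warning" substrings and the function name `tally_markers` are only assumptions for illustration, since the exact log format depends on the node configuration.

```python
from collections import Counter

# Substrings to tally. The two FetchDecline names come from the post above;
# "Error" and "Warning" are assumed severity markers and may need adjusting.
MARKERS = (
    "FetchDeclineConcurrencyLimit",
    "FetchDeclineChainNotPlausible",
    "Error",
    "Warning",
)

def tally_markers(lines, markers=MARKERS):
    """Count how many lines contain each marker substring."""
    counts = Counter({m: 0 for m in markers})
    for line in lines:
        for m in markers:
            if m in line:
                counts[m] += 1
    return counts

# Example with made-up log lines; for a real log, pass an open file instead.
sample = [
    "... FetchDeclineConcurrencyLimit ...",
    "... FetchDeclineChainNotPlausible ...",
    "... FetchDeclineConcurrencyLimit ...",
]
for marker, n in tally_markers(sample).items():
    print(f"{marker}: {n}")
```

A zero count for the severity markers would at least back up the impression that nothing was logged as a warning or error in that window.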
I have noticed much higher memory usage with 1.26.2 as well. Something seems off. I’ve had a couple of commands crash cardano on my relay and producer node, running out of memory.
Example command causing a crash:
cardano-cli query ledger-state --mainnet | grep publicKey | grep <pool id>
I’ve set TraceMempool to false on the relay this morning to test. I should know more within a day about whether this helps.
Most of that thread seems to relate to a slow increase, but that seems different to what happened here (which was a sudden increase of around 1GB).
Dumping the ledger state (as also mentioned in that thread) does seem intensive and increases usage, but that was true for me on previous versions too. I haven’t done that on this new machine (and certainly not at 3am) so I don’t think that’s the same either.
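To tell a sudden step apart from a slow climb in recorded memory readings, something like the following Python sketch could work. It assumes memory samples (in MB) have already been collected at regular intervals by whatever monitoring is in place; the function name `find_steps`, the 500 MB threshold and the sample numbers are all made up for illustration.

```python
def find_steps(samples_mb, min_jump_mb=500.0):
    """Return (index, jump) pairs where usage rose by at least
    min_jump_mb between two consecutive samples."""
    steps = []
    for i in range(1, len(samples_mb)):
        jump = samples_mb[i] - samples_mb[i - 1]
        if jump >= min_jump_mb:
            steps.append((i, jump))
    return steps

# Made-up readings in MB: a slow climb of 10 MB per sample, then a ~1GB step.
readings = [4000, 4010, 4020, 4030, 5050, 5060]
print(find_steps(readings))  # prints [(4, 1020)]
```

A slow leak would show many small jumps that never cross the threshold, while the behaviour described here would show up as a single large entry.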
Yes you are probably right, but I feel there might be a correlation somewhere (maybe not).
It’s just that both problems appeared with 1.26.2, and both are memory related and time related, so they may be release related too.
Also, maybe they haven’t noticed the step behavior you are seeing but still have it.
You are right, though, that these might be different issues …