Sudden 1GB RAM usage increases on both relay+bp

Around an hour ago, the RAM usage of both relay + producer increased by about 1GB. There was a small amount of CPU activity when it occurred:

[Screenshot 2021-04-22 at 19.31.42]

Zoomed in:

[Screenshot 2021-04-22 at 19.42.42]

I did some searching and found this issue, which seems to indicate that reward calculations happen 48 hours after the epoch start, though it doesn’t seem like we’re at 48 hours yet.
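For reference, this is the rough arithmetic I used to check how far into the current epoch we are. It’s only a sketch: the Shelley start time, starting epoch number and epoch length are assumed mainnet constants, not values read from the node.

    # Rough check of how many hours we are into the current mainnet epoch.
    # Assumes Shelley-era mainnet parameters (1-second slots, 432000 slots
    # per epoch) and the commonly quoted Shelley start point; these are
    # assumptions, not values read from the node itself.
    from datetime import datetime, timezone

    SHELLEY_START = datetime(2020, 7, 29, 21, 44, 51, tzinfo=timezone.utc)  # assumed
    SLOTS_PER_EPOCH = 432_000   # 5 days of 1-second slots
    SHELLEY_START_EPOCH = 208   # first Shelley epoch on mainnet

    now = datetime.now(timezone.utc)
    slots_since_shelley = int((now - SHELLEY_START).total_seconds())
    epoch = SHELLEY_START_EPOCH + slots_since_shelley // SLOTS_PER_EPOCH
    hours_into_epoch = (slots_since_shelley % SLOTS_PER_EPOCH) / 3600

    print(f"epoch {epoch}, ~{hours_into_epoch:.1f} hours since epoch start")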

Seems a bit strange for such a large increase in such a short space of time on both. Any ideas what may have happened? (there’s plenty of RAM, I’m just curious). There was not a corresponding increase in mempool size/txs when this occurred.


Can you correlate this with some log events? Are you even sure it’s from the cardano processes?

I’m seeing the same thing, and it appears to correlate with live data based on rts_gc_max_bytes_used:

[Screen Shot 2021-04-22 at 6.57.10 PM]
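If anyone wants to look at the same metric on their own node, this is a minimal sketch that just scrapes the node’s Prometheus endpoint and prints the RTS/GC lines. It assumes the default metrics port 12798 and that Prometheus metrics are enabled in the node config; adjust both if your setup differs.

    # Minimal sketch: scrape the node's Prometheus metrics endpoint and print
    # the RTS/GC-related lines, so a jump in rts_gc_max_bytes_used can be
    # compared against the dashboard graph. Assumes the default metrics port
    # 12798 and Prometheus metrics enabled in the node config.
    import urllib.request

    METRICS_URL = "http://127.0.0.1:12798/metrics"  # adjust host/port as needed

    with urllib.request.urlopen(METRICS_URL, timeout=5) as resp:
        body = resp.read().decode("utf-8", errors="replace")

    for line in body.splitlines():
        if line.startswith("rts_gc"):
            print(line)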

How much memory do you have to go this low? 4GB?

Also, be careful mate, you posted your IPs, your ports, etc. …
As good practice and for your own benefit, try hiding that information when you share things.

Hey, thanks for noticing. Those ports and IPs are not external.

Yes, I suspected as much, but as I said, it’s good practice :slight_smile:

Sorry, I didn’t include much because I thought it might’ve happened to everyone or there would be an obvious answer.

This is the memory usage from Kubernetes pods running the IOHK cardano-node image.

I don’t think this was aimed at me, but my host machine has 64GB and is running nothing but two cardano-node containers (one relay, one producer).

               total        used        free      shared  buff/cache   available
Mem:           62Gi       9.3Gi        28Gi       3.0Mi        24Gi        52Gi
Swap:         8.0Gi          0B       8.0Gi
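For completeness, this is a small sketch of how I’d cross-check the pod figure against the container’s own cgroup accounting from inside the pod. The file paths are the common cgroup v1/v2 defaults and may differ on other setups.

    # Sketch: read the container's own cgroup memory accounting to cross-check
    # the figure the Kubernetes dashboard shows for the pod. Tries the usual
    # cgroup v1 file first and falls back to cgroup v2; paths are common
    # defaults and may differ on some systems.
    from pathlib import Path

    CANDIDATES = [
        Path("/sys/fs/cgroup/memory/memory.usage_in_bytes"),  # cgroup v1
        Path("/sys/fs/cgroup/memory.current"),                # cgroup v2
    ]

    for path in CANDIDATES:
        if path.exists():
            usage = int(path.read_text().strip())
            print(f"{path}: {usage / 2**30:.2f} GiB")
            break
    else:
        print("no known cgroup memory file found")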

Also - I restarted the relay after this happened, and its memory usage did something similar around 7 hours later, this morning at about 4am:

[Screenshot 2021-04-23 at 07.07.17]

Looking in the logs around the time this happened (around 3:37am), I can’t see anything that looks unusual to me (though I don’t understand a lot of the output). There are a lot of FetchDeclineConcurrencyLimit and FetchDeclineChainNotPlausible entries (I have fetch decision tracing enabled because that seemed to be required for some of the metrics I wanted, though I don’t recall which ones), but nothing that looks like a warning or error.
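In case it’s useful to anyone doing the same correlation, this is roughly the script I’d use to count trace kinds in a window around the jump. It assumes JSON log lines with an "at" timestamp and a "data" object carrying a "kind" field, which may not match every logging config, and the log path is just a placeholder.

    # Sketch: count trace kinds in a window around the memory jump, to see
    # whether anything unusual shows up alongside the FetchDecline* noise.
    # Assumes JSON log lines with an "at" timestamp and a "data" object with
    # a "kind" field; adjust field names and the path to your logging config.
    import json
    from collections import Counter

    LOG_FILE = "node.json"          # placeholder path
    WINDOW = ("2021-04-23T03:30", "2021-04-23T03:45")

    counts = Counter()
    with open(LOG_FILE) as fh:
        for line in fh:
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            at = event.get("at", "")
            if WINDOW[0] <= at <= WINDOW[1]:
                counts[event.get("data", {}).get("kind", "unknown")] += 1

    for kind, n in counts.most_common(20):
        print(f"{n:6d}  {kind}")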

Yeah, OK, it’s containers on k8s.
It seems to be a common problem all over the place; there are other people with the same issue.

Most of that thread seems to relate to a slow increase, but that seems different to what happened here (which was a sudden increase of around 1GB).

Dumping the ledger state (as also mentioned in that thread) does seem intensive and increases usage, but that was true for me on previous versions too. I haven’t done that on this new machine (and certainly not at 3am), so I don’t think that’s the same either.

I’m using Docker with a slimmed-down Debian image.

Yes, you are probably right, but I feel there might be a correlation somewhere (maybe not).
It’s just that since 1.26.2 both problems are memory-linked, appeared around the same time, and so look release-linked.
Also, maybe they haven’t noticed the step behaviour you have but still get it.

It might also be different issues, you’re right though …