We’ve been running a well-performing stake pool for nearly a year now, but something has always bothered me that I have never been able to research properly: all our pool nodes spike their CPU use (also raising the load average) exactly every 72 minutes.
Since it has always happened on our core as well as our relays, I’ve concluded that it isn’t a response to intrusion. In the early days the documentation on cardano-node was pretty sketchy, and it has filled out a lot since then, but not with respect to internals, memory usage, debugging and tracing settings, etc… and I’ve been waiting in vain for an explanation of our observed 72-minute cycle.
What have I been missing all this time? I’ve noticed the node’s resource usage has been stepping up a little, probably from all the scaffolding coming in for Alonzo. Our pool nodes have been well within performance limits during the quiet part of each 72-minute cycle, but are now pushing those limits during the several-minute spikes.
We’d just like to be well prepared for the coming weeks & hope some of the devs, dev-oriented SPOs, or people with relevant empirical observations might share their insights. Before I begin the usual analysis & performance logging, I’d like to hear others chime in, so we might have an idea of what to look at first & even whether this is correctable (or even undesirable).