Feedback on stakepool setup

Just a few simple commands which I cut and paste. Here they are:

printf "%0.s-" {1..70}; \
echo; \
startsec=$(curl -s -H 'Accept: application/json' http://localhost:12788 | jq '.cardano.node.metrics.nodeStartTime.int.val'); \
startdate="$(date -d @${startsec})"; \
nowsec=$(date '+%s'); \
nowdate="$(date)"; \
runhrs=$(( (nowsec - startsec) / 3600 )); \
runmins=$(( (nowsec - startsec) % 3600 / 60 )); \
rtsconf="$(ps aux | grep -Po "cardano-node\s.*\+RTS\s.*\-RTS")"; \
missedslots=$(curl -s -H 'Accept: application/json' http://localhost:12788 | jq '.cardano.node.metrics.slotsMissedNum.int.val'); \
echo "Node Started: ${startdate} (Running: ${runhrs} hrs ${runmins} mins)"; \
echo "RTS settings: ${rtsconf}"; \
echo "Missed slots: ${missedslots}"; \
echo "Memory use:"; \
free; \
echo; \
ss -4Htnp state established | grep -Po '[\d\.]+:2700\s+[\d+\.]+:\d+'

There are really only a few commands in it:

  • The curl command is grabbing the actual information from your own EKG port on localhost:12788
  • date - time calculations
  • ps as a simple way to grab the RTS settings when the node was started
  • ss to list the established connections
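The uptime arithmetic from the one-liner can be pulled out into a small function, which makes it easy to check on its own. This is only a sketch: the name `format_uptime` is made up for illustration, and it takes the two epoch timestamps as arguments instead of calling curl and date itself.

```bash
# format_uptime START_EPOCH NOW_EPOCH
# Prints the elapsed time as "<hours> hrs <minutes> mins",
# using the same integer arithmetic as the one-liner above.
format_uptime() {
  local startsec=$1 nowsec=$2
  local elapsed=$(( nowsec - startsec ))
  local runhrs=$(( elapsed / 3600 ))
  local runmins=$(( elapsed % 3600 / 60 ))
  echo "${runhrs} hrs ${runmins} mins"
}

# Example: node started at epoch 1690000000, checked 93784 seconds later.
format_uptime 1690000000 1690093784   # prints: 26 hrs 3 mins
```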

I leave it as a cut and paste thing because it is easy to grab from the previous command history and there are only a few commands in it. I really think it is a bad security idea to blindly rely on other people’s scripts though, and especially ones that get somehow automatically updated on your system. The best thing to do is steal my code above, run each line separately and ensure you know what it does, then modify it to suit your needs. Then you have much better security practices with no additional “tools” required.

Very practical script.
Thanks

Here are the metrics with and without the GC and ZRAM for my Block producer node and one of the relay nodes:

Block producer Node:

Relay node 01:

At first glance it seems like my BP and relays are still running out of memory with the ZRAM applied? Assuming that the sum of the live and heap memory determines the total required RAM?

However the metrics say there is about 3GB RAM free on both nodes.
So not sure it’s really necessary to upgrade to min 24 GB RAM?

Just to be sure, this is how I applied the Garbage collector trick and the ZRAM:

GC:

ZRAM:
https://fosspost.org/enable-zram-on-linux-better-system-performance/#:~:text=zRAM%20is%20a%20Linux%20kernel,on%20SSD%20or%20HDD%20devices.

After applying the ZRAM tutorial I see the following when checking for swap devices:
There are six Zram swaps. I guess one for each processor.

Colonystake_block-producer-node-server02_ZRAM_v001

Using the zramctl command to check the RAM allocation I see the following:
Colonystake_block-producer-node-server02_ZRAM_v002

I noticed the used kB stays at 0 and the DATA and COMPR are very low. Is this normal?
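A quick way to keep an eye on this without screenshots is to total the zram lines in the kernel's swap table. The sketch below assumes the usual /proc/swaps column layout (Filename, Type, Size, Used, Priority); `zram_swap_summary` is a made-up helper name. A used value of 0 usually just means the kernel has not needed to swap anything out yet.

```bash
# zram_swap_summary [FILE]
# Reads a swap table in /proc/swaps format (default: /proc/swaps) and prints
# the number of zram devices plus their total size and used space in kB.
zram_swap_summary() {
  local table=${1:-/proc/swaps}
  awk 'NR > 1 && $1 ~ /^\/dev\/zram/ { n++; size += $3; used += $4 }
       END { printf "zram devices: %d, size: %d kB, used: %d kB\n", n, size, used }' "$table"
}

# Only run against the live table where it exists (Linux).
if [ -r /proc/swaps ]; then
  zram_swap_summary
fi
```

Running it every few hours and comparing the used figure shows whether the zram devices are being touched at all.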

I changed the swappiness level to 150 in /etc/sysctl.conf by adding vm.swappiness = 150, to push the kernel to swap more often.

Little update.
The Zram seems to work now but is still very low in data?

Colonystake_block-producer-node-server02_ZRAM_v001

Hi,

Update on the pool.
All nodes (BP and 2 relays) are now upgraded to 30GB RAM, 8 x 2.8~3.2 GHz CPU cores and 800GB SSD with 600 Mbit/s network each, and are running Cardano Node 8.1.1

I’ll monitor whether the missed slot leader checks still occur over the coming days.

Hi,

24 hours checkpoint using the new system with each 30GB RAM and 8 x 2.8 ~3.2 GHz CPU cores.
GC trick applied and Zram installed (see the implementation I described above).
Not sure if the GC implementation is correct to use with the Cardano node?

Still getting around 2.5% missed slot leader checks. But the system is getting regular blocks adopted.
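For anyone who wants to turn the raw missed slots counter from the script at the top of the thread into a percentage like this, here is a rough sketch. It assumes one leader check per second of uptime (mainnet slots are one second long), which may not be exactly how a monitoring tool calculates its figure; `missed_pct` is a made-up name.

```bash
# missed_pct MISSED UPTIME_SECONDS
# Prints missed checks as a percentage with one decimal place,
# ASSUMING one slot leader check per second of uptime.
missed_pct() {
  local missed=$1 uptime=$2
  if [ "$uptime" -le 0 ]; then
    echo "uptime must be positive" >&2
    return 1
  fi
  # Work in tenths of a percent to stay in integer arithmetic.
  local tenths=$(( missed * 1000 / uptime ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# Example: 2160 missed checks over 24 hours (86400 seconds).
missed_pct 2160 86400   # prints: 2.5
```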

pool_colonystake_blockProducer_2023-07-25

Hello,

Are you using a VPS? Which provider?

How many CPUs (or vCPUs) do you have?

Hi,

I’m using 3 Contabo VPS servers with 8 vCPU cores, 30 GB RAM with 800 Mbit/s.
One of the servers is located in the USA. The others in Germany.

I tried to apply the solutions above concerning the GC, but I’m not sure I’m implementing it correctly.

Hello,

Ok. From my own experience, with 30GB you don’t have any RAM problem…

So first of all, change your swappiness parameter (you don’t need to swap often…)

vm.swappiness = <try 10 or 20…>

Also, you could try tuning the cache pressure. This parameter controls how aggressively the kernel reclaims the memory it uses for caching directory and inode objects. Default is 100; lower values make the kernel hold on to those caches longer. Try 50.

vm.vfs_cache_pressure=50
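One way to apply both settings is a small sysctl drop-in file instead of editing /etc/sysctl.conf directly. The sketch below only writes the file to a scratch location so it can be reviewed first; `write_vm_tuning` and the file name 99-cardano-vm.conf are made up for illustration, and the values 10 and 50 are just the ones suggested above.

```bash
# write_vm_tuning FILE SWAPPINESS CACHE_PRESSURE
# Writes the two settings to FILE in sysctl.conf syntax.
write_vm_tuning() {
  local file=$1 swappiness=$2 cache_pressure=$3
  printf 'vm.swappiness = %s\nvm.vfs_cache_pressure = %s\n' \
    "$swappiness" "$cache_pressure" > "$file"
}

# Write to a scratch file first and review it.
conf=$(mktemp)
write_vm_tuning "$conf" 10 50
cat "$conf"

# To apply for real (as root), something like:
#   cp "$conf" /etc/sysctl.d/99-cardano-vm.conf   # hypothetical file name
#   sysctl --system                                # reload all sysctl config files
#   sysctl vm.swappiness vm.vfs_cache_pressure     # confirm the live values
```

If vm.swappiness is also set in /etc/sysctl.conf (for example the earlier 150), remove or change that line too, otherwise the two files may disagree.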

The main cause behind those missed slots is probably the shared vCPUs on the Contabo VPS… Unfortunately, there isn’t much you can do about that (unless you go for a VDS or a fully dedicated bare-metal server).

You could try these options on your BP inside your cardano startup script, and see if it improves your missed slots %:

+RTS -N8 -A32M -AL256M -n16m --disable-delayed-os-memory-return --nonmoving-gc -T -S -RTS
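For context, here is roughly where those options sit in a startup script. This is a dry-run sketch: every path and the port are placeholders rather than values from this thread, and it only prints the command so it can be reviewed before anything is started.

```bash
#!/usr/bin/env bash
# All paths and the port below are placeholders, not values from this thread.
NODE_HOME=/opt/cardano/cnode        # hypothetical install directory
PORT=6000                           # hypothetical listening port

# RTS options exactly as suggested above.
rts_opts=(+RTS -N8 -A32M -AL256M -n16m --disable-delayed-os-memory-return --nonmoving-gc -T -S -RTS)

# A block producer also passes its KES, VRF and operational certificate
# options in this list; they are left out here to keep the sketch short.
node_cmd=(
  cardano-node run
  --topology "${NODE_HOME}/topology.json"
  --database-path "${NODE_HOME}/db"
  --socket-path "${NODE_HOME}/db/socket"
  --host-addr 0.0.0.0
  --port "${PORT}"
  --config "${NODE_HOME}/config.json"
  "${rts_opts[@]}"
)

# Print the command for review. To actually start the node,
# replace the next line with: exec "${node_cmd[@]}"
printf '%s ' "${node_cmd[@]}"; echo
```

Keeping the RTS options in their own array makes it easy to switch back to the old settings if the missed slots get worse.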

You could also raise the snapshot interval on your BP inside your cardano config.json:

"SnapshotInterval": 43200,

(snapshot every 12 hours)

I changed my BP to your proposed settings.
The only thing I couldn’t find is the SnapshotInterval in the config.json file.
However I see the following

  "rotation": {
    "rpKeepFilesNum": 10,
    "rpLogLimitBytes": 5000000,
    "rpMaxAgeHours": 24
  },

Not sure I can adjust the rpMaxAgeHours setting to 12?

I’ll give the other settings a try for some epochs.
Thanks.

I’ll keep you posted.

Hi,

Checking in…

The BP has been running for 3 days. The missed slot leader check is a bit better.
Currently around 1.8% instead of 2.4 ~ 3%.

The only thing I couldn’t apply is the SnapshotInterval tweak in the config.json file:

You just have to add the line:

"SnapshotInterval": 43200,

inside the first block of your config.json file

And restart your cardano process
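If hand-editing the JSON feels risky, jq (already used in the script at the top of the thread) can add the key and guarantees the result is still valid JSON. This is a sketch; `add_snapshot_interval` is a made-up helper name, and it prints the new config to stdout rather than changing the file in place.

```bash
# add_snapshot_interval CONFIG SECONDS
# Prints CONFIG with a top-level "SnapshotInterval" key set to SECONDS.
# Requires jq; the original file is not modified.
add_snapshot_interval() {
  local config=$1 seconds=$2
  jq --argjson s "$seconds" '. + {SnapshotInterval: $s}' "$config"
}

# Usage sketch: write to a new file, check it, then swap it in by hand.
#   add_snapshot_interval config.json 43200 > config.new.json
#   jq '.SnapshotInterval' config.new.json
```

Keep a copy of the old config.json before replacing it, so the change is easy to roll back.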

Ok, I’ll give it a try and report in a few days.

Thanks for the help @kirael

The BP has been running for 4 days. The missed slot leader check is currently around 1.3%
after adding "SnapshotInterval": 43200, to the config file.
