Feedback on stakepool setup

Terminada · 20 July 2023 21:48

Just a few simple commands which I cut and paste. Here it is:

printf "%0.s-" {1..70}; \
echo; \
startsec=$(curl -s -H 'Accept: application/json' http://localhost:12788 | jq '.cardano.node.metrics.nodeStartTime.int.val'); \
startdate="$(date -d @${startsec})"; \
nowsec=$(date '+%s'); \
nowdate="$(date)"; \
runhrs=$(( (nowsec - startsec) / 3600 )); \
runmins=$(( (nowsec - startsec) % 3600 / 60 )); \
rtsconf="$(ps aux | grep -Po "cardano-node\s.*\+RTS\s.*\-RTS")"; \
missedslots=$(curl -s -H 'Accept: application/json' http://localhost:12788 | jq '.cardano.node.metrics.slotsMissedNum.int.val'); \
echo "Node Started: ${startdate} (Running: ${runhrs} hrs ${runmins} mins)"; \
echo "RTS settings: ${rtsconf}"; \
echo "Missed slots: ${missedslots}"; \
echo "Memory use:"; \
free; \
echo; \
ss -4Htnp state established | grep -Po '[\d\.]+:2700\s+[\d\.]+:\d+'

There are really only a few commands in it:

The curl command is grabbing the actual information from your own EKG port on localhost:12788
date - time calculations
ps as a simple way to grab the RTS settings when the node was started
ss to list the established connections

I leave it as a cut and paste thing because it is easy to grab from the previous command history and there are only a few commands in it. I really think it is a bad security idea to blindly rely on other people’s scripts though, and especially ones that get somehow automatically updated on your system. The best thing to do is steal my code above, run each line separately and ensure you know what it does, then modify it to suit your needs. Then you have much better security practices with no additional “tools” required.
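As a small illustration of taking the script apart line by line, the uptime arithmetic can be pulled out into a function and checked with known numbers before pointing it at the node. This is only a sketch: the function name format_runtime is hypothetical and not part of the original script.

```shell
# Hypothetical helper: format the time elapsed between two epoch timestamps
# the same way the one-liner above does (whole hours, then leftover minutes).
# Usage: format_runtime START_SEC NOW_SEC
format_runtime() {
    local elapsed=$(( $2 - $1 ))
    local hrs=$(( elapsed / 3600 ))
    local mins=$(( elapsed % 3600 / 60 ))
    echo "${hrs} hrs ${mins} mins"
}
```

With the variables from the script above it would be called as format_runtime "$startsec" "$(date '+%s')".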

bertman · 21 July 2023 17:58

Very practical script.
Thanks

bertman · 21 July 2023 18:20

Here are the metrics with and without the GC and ZRAM for my Block producer node and one of the relay nodes:

Block producer Node:

Relay node 01:

At first glance it seems like my BP and relays are still running out of memory with ZRAM applied? Assuming that the sum of the live and heap memory determines the total required RAM?

However the metrics say there is about 3GB RAM free on both nodes.
So not sure it’s really necessary to upgrade to min 24 GB RAM?

bertman · 21 July 2023 18:59

Just to be sure, this is how I applied the Garbage collector trick and the ZRAM:

GC:

ZRAM:
https://fosspost.org/enable-zram-on-linux-better-system-performance/#:~:text=zRAM%20is%20a%20Linux%20kernel,on%20SSD%20or%20HDD%20devices.

After applying the ZRAM tutorial I see the following when checking for swap devices:
There are six Zram swaps. I guess one for each processor.

[Screenshot: Colonystake_block-producer-node-server02_ZRAM_v001]

Using the zramctl command to check the RAM allocation I see the following:
[Screenshot: Colonystake_block-producer-node-server02_ZRAM_v002]

I noticed the used kB stays at 0 and the DATA and COMPR are very low. Is this normal?
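One way to keep an eye on this over time, sketched here under the assumption that the zram devices show up as /dev/zramN in /proc/swaps (columns: Filename, Type, Size, Used, Priority, with sizes in KiB), is to sum the Used column for those devices. The function name zram_used_kb is made up for this example.

```shell
# Hypothetical helper: total KiB currently swapped out to zram devices.
# Reads a /proc/swaps-style table; the first line is the column header.
# Usage: zram_used_kb [FILE]   (FILE defaults to /proc/swaps)
zram_used_kb() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zram[0-9]+$/ { used += $4 }
         END { print used + 0 }' "${1:-/proc/swaps}"
}
```

A result of 0 only says that nothing is swapped out to zram at that moment; with several GB of RAM still free, the kernel may simply have had no reason to swap yet.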

I changed the swappiness level to 150 in /etc/sysctl.conf by adding vm.swappiness = 150, to push the kernel to swap more often.

bertman · 23 July 2023 15:35

Little update.
The Zram seems to work now but is still very low in data?

[Screenshot: Colonystake_block-producer-node-server02_ZRAM_v001]

bertman · 24 July 2023 19:48

Hi,

Update on the pool.
All nodes (BP and 2 relays) are now upgraded to 30GB RAM, 8 x 2.8~3.2 GHz CPU cores and 800GB SSD with 600 Mbit/s network each, and are running Cardano Node 8.1.1.

I’ll monitor whether the missed slot leader checks still occur over the coming days.

bertman · 25 July 2023 19:34

Hi,

24-hour checkpoint on the new systems, each with 30GB RAM and 8 x 2.8~3.2 GHz CPU cores.
GC trick applied and ZRAM installed. (See the implementation method I used.)
Not sure if the GC implementation is correct to use with the Cardano node?

Still getting around 2.5% missed slot leader checks. But the system is getting regular blocks adopted.

[Screenshot: pool_colonystake_blockProducer_2023-07-25]

kirael · 25 July 2023 21:21

Hello,

Are you using a VPS? Which provider?

How many CPUs (or vCPUs) do you have?

bertman · 27 July 2023 17:38

Hi,

I’m using 3 Contabo VPS servers with 8 vCPU cores, 30 GB RAM with 800 Mbit/s.
One of the servers is located in the USA. The others in Germany.

I tried to apply the solutions above concerning the GC. But I’m not sure if I’m implementing it correctly.

kirael · 31 July 2023 09:40

Hello,

Ok. From my own experience, with 30GB you don’t have any RAM problem…

So first of all, change your swappiness parameter (you don’t need to swap often…)

vm.swappiness = <try 10 or 20…>

Also, you could try tuning the cache pressure. This parameter controls how readily the kernel reclaims the memory it uses to cache directory and inode objects; lower values make it keep those caches longer. Default is 100. Try 50.

vm.vfs_cache_pressure=50
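Because vm.swappiness = 150 was already added to /etc/sysctl.conf earlier in the thread, lowering it means changing that existing line rather than appending a second one. A minimal sketch of doing that follows; the helper name set_sysctl_conf is hypothetical, GNU sed is assumed for the -i option, and it is worth trying on a copy of the file first.

```shell
# Hypothetical helper: set KEY to VALUE in a sysctl.conf-style file.
# An existing "KEY = ..." line is rewritten in place; otherwise a new line
# is appended. Assumes GNU sed (for -i) and a value without "|" or "&".
# Usage: set_sysctl_conf KEY VALUE FILE
set_sysctl_conf() {
    local key="$1" value="$2" file="$3" key_re
    # Escape the dots so "vm.swappiness" is matched literally.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${key_re}[[:space:]]*=" "$file"; then
        sed -i -E "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root against /etc/sysctl.conf, for example set_sysctl_conf vm.swappiness 10 /etc/sysctl.conf, then load the file with sysctl -p and confirm the result with sysctl vm.swappiness.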

The main cause behind those missed slots is probably the shared vCPUs on the Contabo VPS… Unfortunately, there isn’t much you can do about that (unless you go for a VDS or a fully dedicated bare-metal server).

You could try these options on your BP inside your cardano startup script, and see if it improves your missed slots % :

+RTS -N8 -A32M -AL256M -n16m --disable-delayed-os-memory-return --nonmoving-gc -T -S -RTS

You could also raise the snapshot interval on your BP inside your cardano config.json :

"SnapshotInterval": 43200,

(snapshot every 12 hours)

bertman · 5 August 2023 13:36

I changed my BP to your proposed settings.
The only thing I couldn’t find is the SnapshotInterval in the config.json file.
However I see the following

  "rotation": {
    "rpKeepFilesNum": 10,
    "rpLogLimitBytes": 5000000,
    "rpMaxAgeHours": 24
  },

Not sure I can adjust the rpMaxAgeHours setting to 12?

I’ll give the other settings a try for some epochs.
Thanks.

I’ll keep you posted.

bertman · 8 August 2023 18:30

Hi,

Checking in…

The BP has been running for 3 days. The missed slot leader check is a bit better.
Currently around 1.8% instead of 2.4 ~ 3%
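For a rough cross-check of such a percentage, the missed-slot counter and the uptime from the script at the top of the thread can be combined. This sketch assumes one leadership check per one-second mainnet slot, so the elapsed seconds stand in for the number of checks; the thread does not say how the dashboard figure is actually computed, and the function name missed_pct is made up here.

```shell
# Hypothetical helper: missed slot leader checks as a percentage of uptime.
# Assumes one check per second of uptime (one-second slots).
# Usage: missed_pct MISSED_SLOTS ELAPSED_SECONDS
missed_pct() {
    awk -v m="$1" -v t="$2" 'BEGIN {
        if (t > 0) printf "%.2f\n", 100 * m / t
        else print "n/a"
    }'
}
```

With the variables from that script it would be called as missed_pct "$missedslots" "$(( nowsec - startsec ))".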

The only thing I couldn’t apply is the SnapshotInterval tweak in the config.json file:

kirael · 8 August 2023 20:44

You just have to add the line :

"SnapshotInterval": 43200,

inside the first block of your config.json file

And restart your cardano process
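If you would rather not edit the JSON by hand, the same change can be sketched with jq, which the script at the top of the thread already relies on. The function name set_snapshot_interval and the backup file suffix are assumptions made for this example; the key is written at the top level of the JSON object, which is how "the first block" is read here.

```shell
# Hypothetical helper: set the top-level SnapshotInterval key in a node
# config file, keeping a .bak copy of the original next to it.
# Usage: set_snapshot_interval CONFIG_JSON SECONDS
set_snapshot_interval() {
    local cfg="$1" secs="$2" tmp
    tmp=$(mktemp) || return 1
    # Write the modified JSON to a temp file first, so a jq error
    # (for example invalid JSON input) leaves the original untouched.
    if jq --argjson s "$secs" '.SnapshotInterval = $s' "$cfg" > "$tmp"; then
        cp "$cfg" "${cfg}.bak" && cat "$tmp" > "$cfg"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example set_snapshot_interval config.json 43200, followed by a restart of the node so the new value is picked up.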

bertman · 9 August 2023 19:40

Ok, I’ll give it a try and report in a few days.

Thanks for the help @kirael

bertman · 14 August 2023 18:31

The BP has been running for 4 days. The missed slot leader check is currently around 1.3%
after adding the "SnapshotInterval": 43200, line to the config file.
