Upgraded to CNTools, Slot Leader but no blocks

I have been operating my pool (NSPYR) using the coincashew guide for almost a year. When version 1.33.0 came out I decided to upgrade the pool to use CNTools (Guild Operators), since coincashew seems to be missing an upgrade path for 1.33.0. I followed the steps, installed the prerequisites, and everything seemed to be working well. I rotated my KES key on Jan 11th and since then my pool hasn’t minted any blocks. While I don’t have a screenshot, gLiveView.sh showed Leader: 2 under “Block Production”. When I have seen this in the past, there was a corresponding “Adopted” value, meaning we minted a block. This time, however, “Adopted” just read 0. What does this mean?

In an attempt to fix this issue I rotated my KES again using the “cntools.sh” script. From my understanding, that script generates a new KES key and operational certificate. From my reading on issues related to mine, a common cause is not updating the op.cert. I followed the steps here: Common Tasks - Guild Operators, which says that it updates the op.cert and that after a reboot the node should be OK.

Any ideas on how I can go about troubleshooting this issue? I noticed in Grafana that “node_timex_estimated_error_seconds” was growing to about 750 ms to 800 ms on my core node, so I installed “chrony” on all relay nodes and the core node, using this video as a guide: Stake Pool (Server) Time Synchronisation with Chrony - YouTube.

Please help as I am at a loss as to what I have done wrong on the 1.33.0 upgrade.

hmm, try this command:

cardano-cli text-view decode-cbor --in-file <path>/op.cert | grep int | head -1

what is the output? U used (for the last blocks) Opcert 4 so any number above 4 should be ok… the problem is if u used an old cold.counter file when u rotated the KES

for example in ur case if u go to adapools.org → search for ur pool → blocks, u will see Opcert 4 … now if for example u want to rotate the KES, check the cold.counter file before doing it (nano or cat cold.counter); u should see "Next certificate issue number" (or something like this): 5 (any number greater than the last one used (4) will be fine)

cardano@cardano-block01:/opt/cardano/cnode$ cardano-cli text-view decode-cbor --in-file priv/pool/NSPYR/op.cert | grep int | head -1
      06  # int(6)
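
The comparison described above can also be scripted. This is only a minimal sketch, assuming the decode-cbor output keeps the "NN  # int(N)" shape shown here; the helper names opcert_counter and opcert_is_newer are made up for illustration, and 4 is the last Opcert number reported for this pool earlier in the thread.

```shell
# Hypothetical helper: print the first int(N) value found on stdin.
opcert_counter() {
  sed -n 's/.*int(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when $1 is strictly greater than $2.
opcert_is_newer() {
  [ "$1" -gt "$2" ]
}

# Sample line copied from the output above. On a real node, pipe in:
#   cardano-cli text-view decode-cbor --in-file <path>/op.cert
counter="$(printf '      06  # int(6)\n' | opcert_counter)"
last_used=4   # last Opcert number seen on chain for this pool

if opcert_is_newer "$counter" "$last_used"; then
  echo "op.cert counter $counter is newer than $last_used"
else
  echo "op.cert counter $counter is not newer than $last_used"
fi
```

With the values from this thread (6 against 4) the check passes, which matches the "any number above 4" rule given above.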

When I cat my cold.counter file, I see this in the “description” field:

"description": "Next certificate issue number: 7",

This seems to be ok, yes?
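
A small sketch of the same sanity check, for reference. It assumes the description line has the exact wording shown above and that issuing an op.cert bumps the counter file by one, so the file should read one more than the certificate. The helper names are hypothetical.

```shell
# Hypothetical helper: print N from a "Next certificate issue number: N" line on stdin.
next_issue_number() {
  sed -n 's/.*Next certificate issue number: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed when the counter file ($1) is exactly
# one ahead of the counter inside the op.cert ($2).
counter_matches_opcert() {
  [ "$1" -eq $(( $2 + 1 )) ]
}

# Sample values from this thread; on a real node, feed in `cat cold.counter`.
next="$(printf '"description": "Next certificate issue number: 7",\n' | next_issue_number)"
opcert=6

if counter_matches_opcert "$next" "$opcert"; then
  echo "cold.counter ($next) is one ahead of op.cert ($opcert): consistent"
else
  echo "cold.counter ($next) and op.cert ($opcert) do not line up"
fi
```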

yes, it’s fine… now run

./cncli.sh sync
./cncli.sh init

do u see any blocks missed/ghosted/invalid/stolen since epoch 314?

Just to confirm, I should run the

./cncli.sh sync

command and then open a new SSH session to the block producer and run

./cncli.sh init

right?

Apologies, I’m new to CNTools. When I ran the ./cncli.sh init command in a new SSH session, I got the following output (relating to Epoch 314):

> Validating epoch 314
CONFIRMED: Leader for slot '50350265' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '50354209' and match found in CNCLI DB for this slot with pool's VRF public key

One ssh session… run ./cncli.sh sync and wait till u see 100% synced, then stop the script and run ./cncli.sh init

it should return all blocks assigned to ur pool and the status (adopted, missed, ghosted, stolen, invalid, etc)

U should run each epoch (u can run now if u didn’t)

./cncli.sh sync
./cncli.sh leaderlog 

This way u will know how many blocks were assigned to ur pool for the respective epochs (the slot number and the time when the block will be created)
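
To relate a slot number to an epoch and a wall-clock time by hand, a rough sketch follows. The constants are assumptions for Cardano mainnet (they are not stated in this thread), the arithmetic only holds for Shelley-era slots, and the function names are illustrative.

```shell
# Assumed mainnet constants (not taken from this thread):
SHELLEY_START_SLOT=4492800      # first slot of epoch 208
SHELLEY_START_UNIX=1596059091   # 2020-07-29 21:44:51 UTC
SHELLEY_START_EPOCH=208
EPOCH_LENGTH=432000             # slots per epoch, one slot per second

# Print the epoch that contains slot $1 (integer division).
slot_to_epoch() {
  echo $(( SHELLEY_START_EPOCH + ($1 - SHELLEY_START_SLOT) / EPOCH_LENGTH ))
}

# Print the Unix timestamp at which slot $1 occurs.
slot_to_unix() {
  echo $(( SHELLEY_START_UNIX + ($1 - SHELLEY_START_SLOT) ))
}

slot=50350265   # one of the slots reported for epoch 314
echo "slot $slot: epoch $(slot_to_epoch "$slot"), unix time $(slot_to_unix "$slot")"
# With GNU date, a readable UTC time: date -u -d "@$(slot_to_unix "$slot")"
```

For slot 50350265 this prints epoch 314, matching the "Validating epoch 314" output earlier.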

Your above output didn’t find any blocks after epoch 314

I get this error when I run ./cncli.sh leaderlog

Checking for script updates...
~ CNCLI Leaderlog started ~
Node in sync, sleeping for 60s before running leaderlogs for current epoch
Running leaderlogs for epoch 319 and adding leader slots not already in DB
error: The following required arguments were not provided:
    --active-stake <active-stake>
    --pool-stake <pool-stake>

USAGE:
    cncli leaderlog --active-stake <active-stake> --byron-genesis <byron-genesis> --d <d> --db <db> --ledger-set <ledger-set> --pool-id <pool-id> --pool-stake <pool-stake> --pool-vrf-skey <pool-vrf-skey> --shelley-genesis <shelley-genesis> --tz <timezone>

For more information try --help
ERROR: failure in leaderlog while running:
/home/cardano/.cargo/bin/cncli leaderlog --db /opt/cardano/cnode/guild-db/cncli/cncli.db --byron-genesis /opt/cardano/cnode/files/byron-genesis.json --shelley-genesis /opt/cardano/cnode/files/shelley-genesis.json --ledger-set current  --pool-id 45faad3e1ab98f4a06a54e3d1a3bc70bed80d6e437ce8f9361a5a9be --pool-vrf-skey /opt/cardano/cnode/priv/pool/NSPYR/vrf.skey --tz UTC
Error message: 

I am reviewing the documentation (CNCLI - Guild Operators) and I don’t see a mention of providing --active-stake or --pool-stake.

ok, if u are using cntools then perhaps u will need to install the latest update?

cd ~/tmp
./prereqs.sh -c

then try again to run

./cncli.sh sync
./cncli.sh leaderlog

I think this script was manually configured by u in the past right?

This should be the script

https://raw.githubusercontent.com/cardano-community/guild-operators/alpha/scripts/cnode-helper-scripts/cncli.sh

don’t forget to update the paths and the pool ID (and uncomment the lines)


#POOL_ID=""                               # Automatically detected if POOL_NAME is set in env. Required for leaderlog calculation & pooltool sendtip, lower-case hex pool id
#POOL_VRF_SKEY=""                         # Automatically detected if POOL_NAME is set in env. Required for leaderlog calculation, path to pool's vrf.skey file
#POOL_VRF_VKEY=""                         # Automatically detected if POOL_NAME is set in env. Required for block validation, path to pool's vrf.vkey file

Ah, I must set the POOL_VRF_SKEY and POOL_VRF_VKEY. It’s unfortunate that the comment in the cncli.sh script is misleading. I have POOL_NAME set in my env file…
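
For reference, the uncommented lines would look roughly like this. The POOL_ID and the vrf.skey path are copied from the error output earlier in the thread; the vrf.vkey path is an assumption (same folder as the skey), and is_valid_pool_id is a hypothetical sanity check, not something cncli.sh provides.

```shell
# Values from the failing command shown earlier in this thread
POOL_ID="45faad3e1ab98f4a06a54e3d1a3bc70bed80d6e437ce8f9361a5a9be"
POOL_VRF_SKEY="/opt/cardano/cnode/priv/pool/NSPYR/vrf.skey"
# Assumed location: same folder as the vrf.skey
POOL_VRF_VKEY="/opt/cardano/cnode/priv/pool/NSPYR/vrf.vkey"

# Hypothetical check: a pool id is expected to be 56 lower-case hex characters.
is_valid_pool_id() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{56}$'
}

if is_valid_pool_id "$POOL_ID"; then
  echo "POOL_ID looks well-formed"
else
  echo "POOL_ID is not 56 lower-case hex characters"
fi

# Warn about key files that are missing on this machine.
for f in "$POOL_VRF_SKEY" "$POOL_VRF_VKEY"; do
  [ -f "$f" ] || echo "note: $f not found here"
done
```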

So it looks like the ./cncli.sh leaderlog command is running. However, it appears to have consumed all available memory on my core node, causing the cnode.service to get killed. Is this normal?

FYI, this system has 16GB of RAM

Yeah, use swap file

type free -m and share the output

cardano@cardano-block01:/opt/cardano/cnode$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16012         491        1163           1       14358       15244
Swap:             0           0           0

It looks like I might have swap turned off.

This config will add a 6G swap file

sudo fallocate -l 6G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

now to make it permanent, type
sudo nano /etc/fstab

and add to the end, as a new line
/swapfile swap swap defaults 0 0
save the file (Ctrl + x then Y then ENTER)
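
If you prefer not to edit the file by hand, the same line can be appended only when it is not already there. A minimal sketch: ensure_swap_entry is a made-up helper name, and the demo runs against a temporary file rather than the real /etc/fstab.

```shell
# Hypothetical helper: append the swap line to the file in $1 unless
# an identical line is already present.
ensure_swap_entry() {
  fstab="$1"
  entry='/swapfile swap swap defaults 0 0'
  if grep -qxF "$entry" "$fstab"; then
    echo "entry already present"
  else
    printf '%s\n' "$entry" >> "$fstab"
    echo "entry added"
  fi
}

# Demo on a scratch file: the second call must not add a duplicate.
demo="$(mktemp)"
ensure_swap_entry "$demo"
ensure_swap_entry "$demo"
rm -f "$demo"
```

Running it against the real /etc/fstab needs root, so take a backup copy of the file first.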

check if the configuration was successful

free -m
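
The Swap row is the one to look at. As a small sketch, it can be pulled out like this, assuming the usual free -m layout with a line starting "Swap:"; swap_total_mb is an illustrative helper name.

```shell
# Hypothetical helper: print the total swap in MB from `free -m` output on stdin.
swap_total_mb() {
  awk '/^Swap:/ { print $2 }'
}

# Sample row in the format printed by free -m; on a real host use:
#   free -m | swap_total_mb
total="$(printf 'Swap:             0           0           0\n' | swap_total_mb)"

if [ "$total" -gt 0 ]; then
  echo "swap is active: ${total} MB"
else
  echo "no swap configured"
fi
```

A 6G swap file should show up as roughly 6143 MB in that column.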


Ok cool.

cardano@cardano-block01:/opt/cardano/cnode$ free -m
              total        used        free      shared  buff/cache   available
Mem:          16012        6536         181           1        9295        9183
Swap:          6143           0        6143

Now run the ./cncli.sh leaderlog command?

This is the output from ./cncli.sh leaderlog:

Checking for script updates...
~ CNCLI Leaderlog started ~
Node in sync, sleeping for 60s before running leaderlogs for current epoch
Leaderlogs already calculated for epoch 319, skipping!

I just popped up gLiveView.sh and this is the output:
[Screenshot from 2022-02-06 16-51-01]

It is saying Leader: 1 under Block Production. Can you help me understand what that means?

In exactly 1d and 4 hours u will have a block assigned

run ./cncli.sh sync and ./cncli.sh init

Oh awesome. Here is the output from those commands (truncated):

> Validating epoch 312
CONFIRMED: Leader for slot '49429307' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49450222' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49553398' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49595461' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49679186' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49688044' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49729521' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '49777279' and match found in CNCLI DB for this slot with pool's VRF public key
> Validating epoch 313
CONFIRMED: Leader for slot '50091470' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '50093254' and match found in CNCLI DB for this slot with pool's VRF public key
> Validating epoch 314
CONFIRMED: Leader for slot '50350265' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '50354209' and match found in CNCLI DB for this slot with pool's VRF public key
> Validating epoch 319

awesome, those are all the blocks assigned to your pool since the beginning
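
To get a quick per-epoch tally from that validation output, something like the following could be used. It is a sketch that assumes the "> Validating epoch N" and "CONFIRMED:" line prefixes stay as shown above; count_confirmed_per_epoch is a made-up name.

```shell
# Hypothetical helper: read validation output on stdin and print
# "<epoch> <number of CONFIRMED lines>" for each epoch, in order of appearance.
count_confirmed_per_epoch() {
  awk '
    /^> Validating epoch/ {
      epoch = $4
      if (!(epoch in n)) { order[++k] = epoch; n[epoch] = 0 }
    }
    /^CONFIRMED:/ && epoch != "" { n[epoch]++ }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
  '
}

# Sample taken from the output above (epochs 313, 314 and 319).
count_confirmed_per_epoch <<'EOF'
> Validating epoch 313
CONFIRMED: Leader for slot '50091470' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '50093254' and match found in CNCLI DB for this slot with pool's VRF public key
> Validating epoch 314
CONFIRMED: Leader for slot '50350265' and match found in CNCLI DB for this slot with pool's VRF public key
CONFIRMED: Leader for slot '50354209' and match found in CNCLI DB for this slot with pool's VRF public key
> Validating epoch 319
EOF
```

On a real node the input would be the saved output of ./cncli.sh init.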