Useful debugging tools for pool operators

Jack7E · 23 October 2022 12:09

This thread documents useful things I do when operating a pool. The aim is to provide a wealth of information for pool operators to debug issues such as missed slots. I’ll start the thread with some CNODE-related workflows.

Please share any other useful processes in the comments.

CNODE (guild operators)

Networking:

Use cncli ping from hostA to hostB to check that the ports are accessible:
cncli ping --host <IP> --port <PORT>

Block troubleshooting:

cd /opt/cardano/cnode/guild-db/blocklog/
sqlite3 blocklog.db "SELECT * FROM blocklog WHERE epoch=370"

Example output

503|74572767|"2022-10-19T00:24:18+00:00"|370|7901572|95967|ea5d0dcbd488e0b225164bd948888d53774ea04fa5618cbce6822f95632d2553|60022|confirmed
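The row is pipe-delimited, so individual fields can be pulled out in the shell. Here is a minimal sketch. It assumes the columns are id, slot, timestamp, epoch, block, slot in epoch, hash, size and status, in that order. Those names are only my reading of the example row, and parse_blocklog_row is a made-up helper; you can check the real column names with sqlite3 blocklog.db ".schema blocklog".

```shell
#!/usr/bin/env bash
# Split one pipe-delimited blocklog row into named fields.
# The column names are assumed from the positions in the example row.
parse_blocklog_row() {
  local id slot at epoch block slot_in_epoch hash size status
  IFS='|' read -r id slot at epoch block slot_in_epoch hash size status <<< "$1"
  at=${at//\"/}   # strip the double quotes around the timestamp
  printf 'slot=%s at=%s epoch=%s status=%s\n' "$slot" "$at" "$epoch" "$status"
}

row='503|74572767|"2022-10-19T00:24:18+00:00"|370|7901572|95967|ea5d0dcbd488e0b225164bd948888d53774ea04fa5618cbce6822f95632d2553|60022|confirmed'
parse_blocklog_row "$row"
```

The slot and timestamp are the two values used in the journalctl searches.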

Notice that the output contains a date. Convert it to the format below to check the cnode logs for that time.
Please note that timezones can affect this value (see the alternative below if you are unsure).

sudo journalctl --unit=cnode.service --since "2022-10-19 00:24:18" -n 20
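If you would rather not convert the date by hand, something like the following may help. It is a rough sketch that assumes GNU date is available; to_journal_time is just a name I made up. It prints the timestamp in the machine's local timezone, which is how journalctl reads a --since value that has no timezone suffix.

```shell
#!/usr/bin/env bash
# Convert an ISO 8601 timestamp (as stored in blocklog) to
# "YYYY-MM-DD HH:MM:SS" in the local timezone. Requires GNU date.
to_journal_time() {
  date -d "$1" '+%Y-%m-%d %H:%M:%S'
}

since=$(to_journal_time "2022-10-19T00:24:18+00:00")
# Print the command to run rather than running it.
echo "sudo journalctl --unit=cnode.service --since \"$since\" -n 20"
```

On a server set to UTC the value is unchanged apart from the format; on any other timezone it is shifted accordingly.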

Alternatively, you can search for the slot number.
The following command can take some time to execute:

sudo journalctl --unit=cnode.service | grep 74572767
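One possible way to speed this up is to limit the journal to a short window around the block's timestamp before grepping for the slot. The sketch below assumes GNU date; journal_window and the 60-second margin are my own choices, not something from the guild scripts.

```shell
#!/usr/bin/env bash
# Print the start and end of a window of +/- N seconds around a timestamp,
# one per line, in the local timezone. Requires GNU date.
journal_window() {
  local ts=$1 margin=${2:-60} epoch
  epoch=$(date -d "$ts" +%s) || return 1
  date -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S'
  date -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S'
}

{ read -r since; read -r until_ts; } < <(journal_window "2022-10-19T00:24:18+00:00" 60)
# Print the command to run rather than running it.
echo "sudo journalctl --unit=cnode.service --since \"$since\" --until \"$until_ts\" | grep 74572767"
```

If the timezone handling is in doubt, widen the margin or fall back to the plain grep over the whole journal.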

Jack7E · 25 October 2022 23:22

Noticed some useful content in this thread

Terminada · 26 October 2022 07:10

Out of interest, what is creating the “backlog.db” sqlite database? Is that database being filled with leaderlog data generated by the cncli tool? Could you please point me to the script that generates the data?

Jack7E · 26 October 2022 09:57

I think you meant blocklog (not backlog). It’s created by the cncli.sh script found at /opt/cardano/cnode/scripts/cncli.sh (if you use CNODE).

Terminada · 26 October 2022 12:13

OK, I see it now. It is in cncli.sh script: https://raw.githubusercontent.com/cardano-community/guild-operators/alpha/scripts/cnode-helper-scripts/cncli.sh

Function: cncliLeaderlog() (146 lines long). Within this function is the following:

    if [[ $(jq -r .status <<< "${cncli_leaderlog}") != ok ]]; then
      error_msg=$(jq -r .errorMessage <<< "${cncli_leaderlog}")
      if [[ "${error_msg}" = "Query returned no rows" ]]; then
        echo "No leader slots found for epoch ${curr_epoch} :("
      else
        echo "ERROR: failure in leaderlog while running:"
        echo "${CNCLI} leaderlog --consensus ${consensus} --db ${CNCLI_DB} --byron-genesis ${BYRON_GENESIS_JSON} --shelley-genesis ${GENESIS_JSON} --ledger-set current ${stake_param_current} --pool-id ${POOL_ID} --pool-vrf-skey ${POOL_VRF_SKEY} --tz UTC"
        echo "Error message: ${error_msg}"
        exit 1
      fi
    else
      epoch_nonce=$(jq -r '.epochNonce' <<< "${cncli_leaderlog}")
      pool_id=$(jq -r '.poolId' <<< "${cncli_leaderlog}")
      sigma=$(jq -r '.sigma' <<< "${cncli_leaderlog}")
      d=$(jq -r '.d' <<< "${cncli_leaderlog}")
      epoch_slots_ideal=$(jq -r '.epochSlotsIdeal //0' <<< "${cncli_leaderlog}")
      max_performance=$(jq -r '.maxPerformance //0' <<< "${cncli_leaderlog}")
      active_stake=$(jq -r '.activeStake //0' <<< "${cncli_leaderlog}")
      total_active_stake=$(jq -r '.totalActiveStake //0' <<< "${cncli_leaderlog}")
      sqlite3 ${BLOCKLOG_DB} <<-EOF
				UPDATE OR IGNORE epochdata SET epoch_nonce = '${epoch_nonce}', sigma = '${sigma}', d = ${d}, epoch_slots_ideal = ${epoch_slots_ideal}, max_performance = ${max_performance}, active_stake = '${active_stake}', total_active_stake = '${total_active_stake}'
				WHERE epoch = ${curr_epoch} AND pool_id = '${pool_id}';
				INSERT OR IGNORE INTO epochdata (epoch, epoch_nonce, pool_id, sigma, d, epoch_slots_ideal, max_performance, active_stake, total_active_stake)
				VALUES (${curr_epoch}, '${epoch_nonce}', '${pool_id}', '${sigma}', ${d}, ${epoch_slots_ideal}, ${max_performance}, '${active_stake}', '${total_active_stake}');
				EOF

This updates the sqlite database “$BLOCKLOG_DB”, which is originally created by the same script (function createBlocklogDB).

So this script stores the stake pool leaderlogs in a sqlite database instead of just saving the data as a flat file. This script, or others, can then use the “blocklog” database, alongside the node logs or perhaps the cncli database, to look into the blocks adopted on chain when the pool operator identifies a problem.

Do pool operators set a cron job to automatically call that script every 5 days or is it intended to be run manually?

Also is there a script that automatically checks the chain blocks in the cncli database and emails the operator if a block is missed?

Is there another script that automates checking the logs if a missed block is noticed?
