Cncli leaderlog does not match cardano-cli query leadership-schedule

I was running node v1.33 and using cncli leaderlog to check the block generation schedule. It had been working flawlessly so far.

Recently I started upgrading the node to v1.34.1. I set up another pool with identical credentials, but without cncli, because I want to use the native “cardano-cli query leadership-schedule” instead. Today I ran the leader check on both pools, and to my surprise they don’t agree with each other. Specifically, the v1.33 + cncli pool reports a scheduled block while the new v1.34.1 pool reports none.

Does anybody know why? Did I miss anything in the new pool’s query command? Below are the commands I used on each pool to run the query.

Old pool:

$SCRIPT_DIR/cncli leaderlog \
        --pool-id $(cat stake-pool-id.txt) \
        --pool-vrf-skey vrf.skey \
        --byron-genesis $SCRIPT_DIR/$NETWORK-byron-genesis.json \
        --shelley-genesis $SCRIPT_DIR/$NETWORK-shelley-genesis.json \
        --pool-stake $POOL_STAKE \
        --active-stake $ACTIVE_STAKE \
        --db $SCRIPT_DIR/cardano-cncli.db \
        --ledger-set $epoch

New pool:

$CCLI query leadership-schedule $NETWORK_MAGIC \
        --genesis "$NODE_HOME/$NETWORK-shelley-genesis.json" \
        --stake-pool-id $STAKE_POOL_ID \
        --vrf-signing-key-file "$NODE_HOME/vrf.skey" \
        --next

Testnet or mainnet? And by “new pool” do you mean just a new server with the same vrf, node.cert etc.?

It’s on mainnet.

Yes, your understanding of new pool is correct. Any insight on why they are giving different results?

Thanks.

Another silly question: do both methods report the same epoch? I see you have selected an epoch in the cncli method, so I’m just checking that it is the same epoch as “next” in the cli method.

Yes. Both are reporting for epoch 346, the coming one.
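For anyone wanting to double-check which epoch “--next” refers to, here is a minimal sketch of the mainnet epoch arithmetic. It assumes the mainnet system start of 2017-09-23 21:44:51 UTC (Unix time 1506203091) and a fixed epoch length of 432000 seconds; the function names are made up for illustration, and testnet uses different constants.

```shell
# Mainnet constants (assumed; other networks differ).
SYSTEM_START=1506203091   # 2017-09-23 21:44:51 UTC
EPOCH_SECONDS=432000      # 5 days per epoch

# Print the epoch number that contains the given Unix timestamp.
epoch_at() {
    echo $(( ($1 - SYSTEM_START) / EPOCH_SECONDS ))
}

# Print the Unix timestamp at which the given epoch starts.
epoch_start() {
    echo $(( SYSTEM_START + $1 * EPOCH_SECONDS ))
}
```

Running `epoch_at "$(date +%s)"` gives the current epoch, so “--next” should correspond to that value plus one.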

I’m keen to see both logs, but it might be best to wait until the epoch has finished, or at least you’ve minted the block. It will be interesting to see which method tells the truth.

Another silly question: are both methods reporting in the same time zone, e.g. UTC?
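One way to sidestep time zone confusion is to compare absolute slot numbers and convert them to UTC yourself. The sketch below assumes mainnet, where the Shelley era starts at slot 4492800 (Unix time 1596059091) and slots are one second long afterwards. The function names are hypothetical, and `date -d @...` is the GNU form of the command.

```shell
# Mainnet Shelley-era constants (assumed; testnet differs).
SHELLEY_START_SLOT=4492800
SHELLEY_START_TIME=1596059091   # 2020-07-29 21:44:51 UTC

# Convert an absolute Shelley-era slot number to a Unix timestamp.
slot_to_unix() {
    echo $(( SHELLEY_START_TIME + ($1 - SHELLEY_START_SLOT) ))
}

# Convert an absolute slot number to an ISO 8601 UTC time (GNU date).
slot_to_utc() {
    date -u -d "@$(slot_to_unix "$1")" +%Y-%m-%dT%H:%M:%SZ
}
```

If both tools list the same slot number, any difference in the displayed time is only a formatting or time zone matter.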

I got curious and did some more testing:

  • Using the same method (i.e., duplicating the server), I checked on testnet. There, cncli leaderlog and query leadership-schedule agree with each other, with many blocks reported (and being generated, since I queried the current epoch).
  • I deleted the cncli db on my production mainnet server (producer node) and re-created it from scratch. This time “cncli leaderlog” reports no block scheduled, agreeing with the “query leadership-schedule” method.
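The comparison in the tests above can be made mechanical. This is only a sketch: it assumes each tool's slot numbers have already been extracted into a plain text file with one slot per line, and the helper name is invented for illustration.

```shell
# Print the slots listed in the first file but missing from the second.
# Both files are assumed to hold one slot number per line.
slots_only_in() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
    # grep exits non-zero when nothing is printed, which is fine here.
    grep -Fxv -f "$2" "$1" || true
}

# Example usage (file names are placeholders):
#   slots_only_in cncli-slots.txt cli-slots.txt   # predicted by cncli only
#   slots_only_in cli-slots.txt cncli-slots.txt   # predicted by cardano-cli only
# For cncli, something like `jq -r '.assignedSlots[].slot'` on the leaderlog
# JSON may produce such a list; treat that field name as an assumption.
```

Empty output in both directions means the two schedules agree.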

So in short, it seems cncli is to blame. I will still keep the node running as it is to confirm before I switch the pool over to the standby server.

So the cncli “predicted” block never happened?

That is right. It was confirmed when the predicted slot passed.

I knew that would be the case when I re-created the cncli db and the new db said “no”. This is the second time that cncli has mispredicted the block schedule. Let us hope the new native query won’t make such mistakes.

I’m leaving my new server running in standby mode and will formally switch over when I feel comfortable. While we are here, my pool is MYADA. Please take a look and consider delegating to it.

I’ve just skim-read this post. I believe I had the same issue. It was due to some corrupted data in the cncli db. It’s best just to remove it and let it resync.
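A cautious variant of “remove it and let it resync” is to move the database aside rather than delete it, so it can still be inspected later. The helper below is a sketch with an invented name; it also moves SQLite's usual “-wal” and “-shm” sidecar files if they happen to exist. The host and port in the commented sync command are placeholders for your own node.

```shell
# Move a possibly corrupted cncli database (and any SQLite sidecar files)
# aside under a timestamped suffix. Prints the suffix that was used.
set_aside_db() {
    db=$1
    stamp="bak.$(date -u +%Y%m%dT%H%M%SZ)"
    for suffix in "" "-wal" "-shm"; do
        if [ -e "$db$suffix" ]; then
            mv -- "$db$suffix" "$db$suffix.$stamp"
        fi
    done
    echo "$stamp"
}

# Example usage (stop any running cncli sync first; values are placeholders):
#   set_aside_db "$SCRIPT_DIR/cardano-cncli.db"
#   "$SCRIPT_DIR/cncli" sync --host 127.0.0.1 --port 3001 \
#       --db "$SCRIPT_DIR/cardano-cncli.db"
```

Once the sync has caught up to the chain tip, rerunning the leaderlog command against the fresh db should show whether the earlier prediction came from stale data.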

Solution for potential corrupted data