Not getting the Vasil node.counter memo - A STORY FOR OTHER SPOs

Hey all,

Ive been running a stake pool for over a year now, and in epoch 383 I finally got assigned a block. So, having updated my KES keys and the node cert I was happy to finally forge a block. Unfortunately not knowing about the breaking changes to the node counter rule introduced in Vasil. Im writing this post-mortem to anyone that is still getting caught out by this and as a bit of a reminder, when there is a hard fork, to sit down an RTFM on what is being changed.

$ ./leadership.sh
     SlotNo                          UTC Time
-------------------------------------------------------------
     80246054                   2022-12-23 16:19:05 UTC

So, im waiting for the slot to click over, I see in gLive, ā€œLeaderā€ go from 0 to 1, I wait a little longer, a little longerā€¦ uh ohhā€¦ something isnā€™t right, ā€œAdoptedā€ still sitting at 0.

1

Something is wrong. I have a look on the block explorer, yep, another pool has produced the block in the slot range I was assigned. Hmm, lets take a look at the logs:

**(HERES WHERE ITS ASSIGNED)**
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.Forge:Info:6864] 
[2022-12-23 16:19:05.06 UTC] fromList [("credentials",String "Cardano"),
("val",Object (fromList [("block",String "b7ceb31faf2e2a9395a0decba64bb379bbe3d618ccad961a70c2a4f3aa5822a7"),
("blockNo",Number 8177134.0),("blockPrev",String "57b2460a441766f9855e75d835243a0174fa7e89c37909f0319d1b9d894e0037"),
("kind",String "TraceForgedBlock"),("slot",Number 8.0246054e7)]))]

**(HERES THE ERROR)**
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.ChainDB:Error:6854] 
[2022-12-23 16:19:05.12 UTC] Invalid block b7ceb31faf2e2a9395a0decba64bb379bbe3d618ccad961a70c2a4f3aa5822a7 at slot 80246054: ExtValidationErrorHeader 
(HeaderProtocolError (HardForkValidationErrFromEra S (S (S (S (S (Z (WrapValidationErr 
{unwrapValidationErr = CounterOverIncrementedOCERT 0 7}))))))))


Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.ChainDB:Info:6854] 
[2022-12-23 16:19:05.12 UTC] Valid candidate 57b2460a441766f9855e75d835243a0174fa7e89c37909f0319d1b9d894e0037 at slot 80246039

Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.Forge:Error:6864] 
[2022-12-23 16:19:05.12 UTC] fromList [("credentials",String "Cardano"),
("val",Object (fromList [("kind",String "TraceForgedInvalidBlock"),
("reason",Object (fromList [("error",Object (fromList [("error",Object (fromList [("currentCounter",Number 7.0),
("kind",String "CounterOverIncrementedOCERT"),
("lastCounter",Number 0.0)])),
("kind",String "HeaderProtocolError")])),
("kind",String "ValidationError")])),
("slot",Number 8.0246054e7)]))]

Here we are CounterOverIncrementedOCERT 0 7. Apparently I didnā€™t get the memo that the node counter post Vasil hard fork could not be higher than 1 in difference to the on chain counter if the node had forged a block(s) in the current KES period.

Checking kes-period-info revealed:

$ cardano-cli query kes-period-info --mainnet --op-cert-file ~/cardano-my-node/CURRENT_KEYS/node.cert
āœ“ Operational certificate's KES period is within the correct KES period interval
āœ— No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
  On disk operational certificate counter: 7
{
    "qKesCurrentKesPeriod": 619,
    "qKesEndKesInterval": 678,
    "qKesKesKeyExpiry": null,
    "qKesMaxKESEvolutions": 62,
    "qKesNodeStateOperationalCertificateNumber": null,
    "qKesOnDiskOperationalCertificateNumber": 7,
    "qKesRemainingSlotsInKesPeriod": 7621146,
    "qKesSlotsPerKesPeriod": 129600,
    "qKesStartKesInterval": 616
}

My counter was at:
ā€œqKesOnDiskOperationalCertificateNumberā€: 7
, and my on chain was at:
ā€œqKesNodeStateOperationalCertificateNumberā€: null`. (i.e., the block producer has never forged a block)

So in line with the stake-pool-operations/7_KES_period.md document (the document that i didnt read or know about till now), my values

Where:

ā€œqKesNodeStateOperationalCertificateNumberā€: is the last counter registered OnChain to forge a block.

and

ā€œqKesOnDiskOperationalCertificateNumberā€: is the counter value of the block producers OnDisk operational certificate.

Whats slightly confusing to me when working out what the ā€œqKesOnDiskOperationalCertificateNumberā€ Should be, given the pool had never forged a block before, is: Should it be null, or Should it be zero?

Let me check against another pool I run that also hasnt produced a block and I havnā€™t rotated any KES keys yet:

āœ“ Operational certificate's KES period is within the correct KES period interval
āœ— No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
  On disk operational certificate counter: 0
{
    "qKesCurrentKesPeriod": 619,
    "qKesEndKesInterval": 681,
    "qKesKesKeyExpiry": null,
    "qKesMaxKESEvolutions": 62,
    "qKesNodeStateOperationalCertificateNumber": null,
    "qKesOnDiskOperationalCertificateNumber": 0,
    "qKesRemainingSlotsInKesPeriod": 7985621,
    "qKesSlotsPerKesPeriod": 129600,
    "qKesStartKesInterval": 619
}

So ā€˜nullā€™ on chain and ā€˜0ā€™ counter, is an acceptable combination for a pool that hasnā€™t forged a block before (at all) ā†’ Ok(?)

Now, in order to correct this mess, the 7_KES_period.md document says, to

  1. Query kes-period-info and get the value of qKesNodeStateOperationalCertificateNumber
  2. Create a new Counter
  3. Generate new KES keys if needed (because of MaxKESEvolutions reached)
  4. Issue the new Operational Certificate:

So what I have now, after generating a new node counter with

cardano-cli node new-counter --cold-verification-key-file node.vkey --counter-value 0 --operational-certificate-issue-counter-file opcert.counter

is a ā€˜opcert.counterā€™ with content:

{
    "type": "NodeOperationalCertificateIssueCounter",
    "description": "Next certificate issue number: 1",
    "cborHex": "82015820afb...<removed>...b1a3512e4673c23"
}

and querying the kes info on the block producer:

$ ./query_kes_period_info.sh
āœ“ Operational certificate's KES period is within the correct KES period interval
āœ— No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
  On disk operational certificate counter: 0
{
    "qKesCurrentKesPeriod": 619,
    "qKesEndKesInterval": 681,
    "qKesKesKeyExpiry": null,
    "qKesMaxKESEvolutions": 62,
    "qKesNodeStateOperationalCertificateNumber": null,
    "qKesOnDiskOperationalCertificateNumber": 0,
    "qKesRemainingSlotsInKesPeriod": 7985621,
    "qKesSlotsPerKesPeriod": 129600,
    "qKesStartKesInterval": 619
}

Looks like this should be ok for a successful block forge in possibly another year :man_facepalming:

6 Likes