Hey all,
Ive been running a stake pool for over a year now, and in epoch 383 I finally got assigned a block. So, having updated my KES keys and the node cert I was happy to finally forge a block. Unfortunately not knowing about the breaking changes to the node counter rule introduced in Vasil. Im writing this post-mortem to anyone that is still getting caught out by this and as a bit of a reminder, when there is a hard fork, to sit down an RTFM on what is being changed.
$ ./leadership.sh
SlotNo UTC Time
-------------------------------------------------------------
80246054 2022-12-23 16:19:05 UTC
So, im waiting for the slot to click over, I see in gLive, āLeaderā go from 0 to 1, I wait a little longer, a little longerā¦ uh ohhā¦ something isnāt right, āAdoptedā still sitting at 0.
Something is wrong. I have a look on the block explorer, yep, another pool has produced the block in the slot range I was assigned. Hmm, lets take a look at the logs:
**(HERES WHERE ITS ASSIGNED)**
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.Forge:Info:6864]
[2022-12-23 16:19:05.06 UTC] fromList [("credentials",String "Cardano"),
("val",Object (fromList [("block",String "b7ceb31faf2e2a9395a0decba64bb379bbe3d618ccad961a70c2a4f3aa5822a7"),
("blockNo",Number 8177134.0),("blockPrev",String "57b2460a441766f9855e75d835243a0174fa7e89c37909f0319d1b9d894e0037"),
("kind",String "TraceForgedBlock"),("slot",Number 8.0246054e7)]))]
**(HERES THE ERROR)**
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.ChainDB:Error:6854]
[2022-12-23 16:19:05.12 UTC] Invalid block b7ceb31faf2e2a9395a0decba64bb379bbe3d618ccad961a70c2a4f3aa5822a7 at slot 80246054: ExtValidationErrorHeader
(HeaderProtocolError (HardForkValidationErrFromEra S (S (S (S (S (Z (WrapValidationErr
{unwrapValidationErr = CounterOverIncrementedOCERT 0 7}))))))))
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.ChainDB:Info:6854]
[2022-12-23 16:19:05.12 UTC] Valid candidate 57b2460a441766f9855e75d835243a0174fa7e89c37909f0319d1b9d894e0037 at slot 80246039
Dec 23 16:19:05 blockprod cardano-node[3381269]: [blockprod:cardano.node.Forge:Error:6864]
[2022-12-23 16:19:05.12 UTC] fromList [("credentials",String "Cardano"),
("val",Object (fromList [("kind",String "TraceForgedInvalidBlock"),
("reason",Object (fromList [("error",Object (fromList [("error",Object (fromList [("currentCounter",Number 7.0),
("kind",String "CounterOverIncrementedOCERT"),
("lastCounter",Number 0.0)])),
("kind",String "HeaderProtocolError")])),
("kind",String "ValidationError")])),
("slot",Number 8.0246054e7)]))]
Here we are CounterOverIncrementedOCERT 0 7. Apparently I didnāt get the memo that the node counter post Vasil hard fork could not be higher than 1 in difference to the on chain counter if the node had forged a block(s) in the current KES period.
Checking kes-period-info revealed:
$ cardano-cli query kes-period-info --mainnet --op-cert-file ~/cardano-my-node/CURRENT_KEYS/node.cert
ā Operational certificate's KES period is within the correct KES period interval
ā No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
On disk operational certificate counter: 7
{
"qKesCurrentKesPeriod": 619,
"qKesEndKesInterval": 678,
"qKesKesKeyExpiry": null,
"qKesMaxKESEvolutions": 62,
"qKesNodeStateOperationalCertificateNumber": null,
"qKesOnDiskOperationalCertificateNumber": 7,
"qKesRemainingSlotsInKesPeriod": 7621146,
"qKesSlotsPerKesPeriod": 129600,
"qKesStartKesInterval": 616
}
My counter was at:
āqKesOnDiskOperationalCertificateNumberā: 7
, and my on chain was at:
āqKesNodeStateOperationalCertificateNumberā: null`. (i.e., the block producer has never forged a block)
So in line with the stake-pool-operations/7_KES_period.md document (the document that i didnt read or know about till now), my values
Where:
āqKesNodeStateOperationalCertificateNumberā: is the last counter registered OnChain to forge a block.
and
āqKesOnDiskOperationalCertificateNumberā: is the counter value of the block producers OnDisk operational certificate.
Whats slightly confusing to me when working out what the āqKesOnDiskOperationalCertificateNumberā Should be, given the pool had never forged a block before, is: Should it be null, or Should it be zero?
Let me check against another pool I run that also hasnt produced a block and I havnāt rotated any KES keys yet:
ā Operational certificate's KES period is within the correct KES period interval
ā No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
On disk operational certificate counter: 0
{
"qKesCurrentKesPeriod": 619,
"qKesEndKesInterval": 681,
"qKesKesKeyExpiry": null,
"qKesMaxKESEvolutions": 62,
"qKesNodeStateOperationalCertificateNumber": null,
"qKesOnDiskOperationalCertificateNumber": 0,
"qKesRemainingSlotsInKesPeriod": 7985621,
"qKesSlotsPerKesPeriod": 129600,
"qKesStartKesInterval": 619
}
So ānullā on chain and ā0ā counter, is an acceptable combination for a pool that hasnāt forged a block before (at all) ā Ok(?)
Now, in order to correct this mess, the 7_KES_period.md document
says, to
- Query
kes-period-info
and get the value of qKesNodeStateOperationalCertificateNumber - Create a new Counter
- Generate new KES keys if needed (because of MaxKESEvolutions reached)
- Issue the new Operational Certificate:
So what I have now, after generating a new node counter with
cardano-cli node new-counter --cold-verification-key-file node.vkey --counter-value 0 --operational-certificate-issue-counter-file opcert.counter
is a āopcert.counterā with content:
{
"type": "NodeOperationalCertificateIssueCounter",
"description": "Next certificate issue number: 1",
"cborHex": "82015820afb...<removed>...b1a3512e4673c23"
}
and querying the kes info on the block producer:
$ ./query_kes_period_info.sh
ā Operational certificate's KES period is within the correct KES period interval
ā No blocks minted so far with the operational certificate at: /home/user/cardano-my-node/CURRENT_KEYS/node.cert
On disk operational certificate counter: 0
{
"qKesCurrentKesPeriod": 619,
"qKesEndKesInterval": 681,
"qKesKesKeyExpiry": null,
"qKesMaxKESEvolutions": 62,
"qKesNodeStateOperationalCertificateNumber": null,
"qKesOnDiskOperationalCertificateNumber": 0,
"qKesRemainingSlotsInKesPeriod": 7985621,
"qKesSlotsPerKesPeriod": 129600,
"qKesStartKesInterval": 619
}
Looks like this should be ok for a successful block forge in possibly another year