My server is running node 8.7.2 and I have gLiveView v1.28.3 and I have a question mark displayed for the chain on the BP. KES is still current. gLiveView in verbose mode is not displaying pool percentage, delegation or operator cost either. I jumped from 8.1.2 to 8.7.2 and rebuilt all servers … Everything seems fine …
OP Cert Disk disk|node|chain 2 | 2 |?
I’m not using P2P for the BP. The BP is connected only to my relays, and the relays are in P2P mode to the outer world. BP and relays are all in sync and current.
Same for me, I just upgraded to 8.7.2 and v1.28.3 from 8.1.1
Cert is still valid, but this indicator is showing 9 | 9 | ?
Am trying to understand if there is something I need to be aware of or take action on.
Exactly the same thing here — two question marks for disk|chain after a recent gLiveView upgrade and an upgrade to 8.7.3. I guess the underlying problem had always been there and just surfaced now.
The kes / node / vrf files are all current and there are no error messages w.r.t. the files in the logs. The current|remaining|exp numbers look fine and unsurprising.
This kind of spirals back to my usual question about how likely it is that a pool with a 30k+ stake and a 30k pledge produces precisely zero blocks in 2.5 years of operation.
It doesn’t seem to affect block production. I upgraded 2 weeks ago and minted one block on node 8.7.3 with 128k ADA live stake and 16k pledge. Running full speed for about 6 months now and 6 blocks minted so far.
I struggled in the beginning and found out my block producer was not running as a block producer (core - mainnet) but as a relay node (relay - mainnet) => see the top line in gLiveView. Got some great help on this forum from the community. https://forum.cardano.org/t/feedback-on-stakepool-setup/119630/6
Also be sure your node.cert is valid when rotating KES.
You need to set your node.cert path in the env file (it’s down near the bottom). If you’ve set up via the Coincashew guide, then it is in a different file path than gLiveView expects by default.
30k stake is quite low, so there is a low chance to get a block. 2.5 yrs without a block is bad luck, but it all comes down to luck (unless there is something wrong with your pool setup).
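To put a rough number on the "luck" part, here is a minimal sketch of the slot-leader odds. It assumes the mainnet parameters of 432,000 slots per epoch and an active slot coefficient of 0.05, and it uses a total active stake of 22.5 billion ADA, which is a placeholder assumption and not a figure from this thread. It also assumes the pool's stake stayed constant the whole time. The function name `block_odds` is made up for illustration.

```python
import math

ACTIVE_SLOT_COEFF = 0.05    # mainnet: about one block per 20 one-second slots
SLOTS_PER_EPOCH = 432_000   # 5 days of one-second slots


def block_odds(pool_stake, total_active_stake, epochs):
    """Return (expected slot leaderships, probability of zero leaderships).

    Per slot, a pool with relative stake sigma is NOT leader with
    probability (1 - f) ** sigma, so over n slots the chance of never
    being leader is (1 - f) ** (sigma * n).
    """
    sigma = pool_stake / total_active_stake
    slots = SLOTS_PER_EPOCH * epochs
    log_no_lead = sigma * math.log1p(-ACTIVE_SLOT_COEFF)  # ln((1 - f) ** sigma)
    expected = -slots * math.expm1(log_no_lead)           # slots * P(lead per slot)
    p_zero = math.exp(slots * log_no_lead)
    return expected, p_zero


if __name__ == "__main__":
    # ASSUMPTION: 22.5e9 ADA total active stake is only a placeholder.
    epochs = 2.5 * 365 / 5
    expected, p_zero = block_odds(30_000, 22.5e9, epochs)
    print(f"expected leaderships: {expected:.2f}, P(zero): {p_zero:.4f}")
```

Under those assumptions the result is roughly five expected slot leaderships in 2.5 years and a chance of zero that is below 1%, so a completely empty history points more towards a setup problem than towards bad luck alone. Being slot leader still does not guarantee an adopted block (slot battles, propagation), so this is an upper bound on blocks, not a prediction.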
Have you been checking your leaderlogs each epoch?
Actually, this was a bug in gLiveView.sh. So even though I did get the node.cert path (which is what I call that file) resolved correctly, the cardano-cli and jq failed. Here’s what was needed:
It requires --socket-path. (Perhaps an environment variable was missing or a default location where cardano-cli would look for it doesn’t match reality on my system.)
The grep filtering missed an additional header line before the JSON data in the output, which was causing jq to fail. It’s sad that cardano-cli doesn’t keep its /dev/stdout clean in pure JSON; additional plain text information should go to /dev/stderr for optional filtering / inspection…
Now it gives me the right on-disk certificate number. \o/ The on-chain certificate stays at ?, because (obviously) no blocks → no on-chain certificates. /o\
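For anyone hitting the same header-line problem in their own scripts, here is a minimal sketch of a more tolerant way to read such output: skip everything before the first `{` and parse only the JSON object. This is not the actual gLiveView fix. The helper name `parse_cli_json` and the header text in the sample are made up for illustration; only the field name `qKesOnDiskOperationalCertificateNumber` comes from the real output.

```python
import json


def parse_cli_json(raw_output: str) -> dict:
    """Parse the first JSON object in raw_output, ignoring text before it.

    Limitation: this assumes the leading noise contains no '{' character.
    """
    start = raw_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in output")
    # raw_decode stops at the end of the first complete JSON value,
    # so trailing text after the object is ignored as well.
    obj, _end = json.JSONDecoder().raw_decode(raw_output[start:])
    return obj


if __name__ == "__main__":
    # HYPOTHETICAL sample: the first line stands in for whatever plain-text
    # header precedes the JSON on your system.
    sample = (
        "some plain-text status line\n"
        '{ "qKesOnDiskOperationalCertificateNumber": 2 }\n'
    )
    info = parse_cli_json(sample)
    print(info["qKesOnDiskOperationalCertificateNumber"])
```

The same idea works in a shell pipeline by dropping lines until the first one that starts with `{` before handing the rest to jq.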
I think there’s more to it than bad luck. There is a piece of configuration I am missing. For example, according to leadership-schedule, my pool was supposed to be the slot leader for slot 122591815. While it may or may not have been the slot leader, one thing is certain: No blocks were produced.
Sadly enough, when I discovered this, journalctl’s log rotation had already removed all records of what could have happened.
The explorer says that the nearest known (== turned into blocks) slots in epoch 481 were 122591811 and 122591825 and the former even had (?!) 0 transactions in it.
| epoch | slot | block | date | time | transactions | output |
|---|---|---|---|---|---|---|
| 481 | 122591811 | 10237735 | 2024/04/26 | 19:01:42 | 0 | 0 |
| 481 | 122591815 | N/A | N/A | N/A | N/A | N/A |
| 481 | 122591825 | 10237736 | 2024/04/26 | 19:01:56 | 5 | 2007762.516504 |
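The slot numbers and timestamps above can be cross-checked against each other, which helps when hunting through logs for a missed slot. A minimal sketch, assuming mainnet, where slots have been one second long since the Shelley era began at slot 4492800 on 2020-07-29 21:44:51 UTC. Those two constants are not from this thread, and the function name `slot_to_utc` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: mainnet Shelley-era constants (one-second slots from here on).
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_TIME = datetime(2020, 7, 29, 21, 44, 51, tzinfo=timezone.utc)


def slot_to_utc(slot: int) -> datetime:
    """Convert an absolute mainnet slot number to its UTC wall-clock time."""
    if slot < SHELLEY_START_SLOT:
        raise ValueError("slot is before the Shelley era; slot length differs")
    return SHELLEY_START_TIME + timedelta(seconds=slot - SHELLEY_START_SLOT)


if __name__ == "__main__":
    for slot in (122591811, 122591815, 122591825):
        print(slot, slot_to_utc(slot).isoformat())
```

With these constants, slot 122591811 maps to 2024-04-26 19:01:42 UTC, matching the explorer row, so the missing slot 122591815 falls four seconds after the previous block and ten seconds before the next one.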
It could have been the case that there were (still) 0 transactions to persist. After all, it looks like around that time the chance of becoming a block producer (based on being a slot leader) was very slim, less than 5.5%.
However, it does look like many pools “circumvent” the problem by creating empty blocks (with zero transactions). Sure, they earn nothing by doing so (?) — unless the fixed block fee still applies —, but what they do gain is visibility. A non-zero block history means everything when it comes to a pool’s popularity. Personally, if I were to stake, I would never pick a pool with a zero.
So, how does one configure this magic? How does one force a validator to create an empty block, just for the sake of creating a block? This seems to be happening quite often, but my validator doesn’t do it.
The rewards for pools and their delegators do not depend on transactions being available to put in a block. Blocks with 0 transactions are fine and do get rewards just like all other blocks. So, it’s also totally fine to delegate to pools minting blocks without transactions.
The fixed fee and the margin have no relevance for this question; they only determine how rewards are split between the pool and its delegators.
It’s too bad that the log was already rotated. There can be any number of reasons why your block wasn’t picked up. It could have been a height battle (there were actually quite a lot of blocks in short succession there; the average time between blocks should be around 20 seconds). Or your relay nodes have no incoming connections that could pull your blocks. Or your clock(s) are too far off. Or …
> Personally, if I were to stake, I would never pick a pool with a zero.

> So, it’s also totally fine to delegate to pools minting blocks without transactions.
What I meant above was zero blocks in total, not zero transactions in some blocks. A pool that has produced >=1 blocks has perhaps thousands of times higher chances at stake acquisition than a pool with precisely 0 blocks. That one first block makes all the difference in the world.
That said, this time I was lucky to catch what appeared in the logs at the time of a failed block “adoption” (or whatever the final procedure is called):
Jun 13 01:11:10 cardano-node[1684710]: [charon:cardano.node.ChainDB:Error:211] [2024-06-12 23:11:10.07 UTC] Invalid block ce6d5f0923eeecce18e650b300ff1678e44b14e0efd014dffc6205a61432e491 at slot 126667579: ExtValidationErrorHeader (HeaderProtocolError (HardForkValidationErrFromEra S (S (S (S (S (Z (WrapValidationErr {unwrapValidationErr = CounterOverIncrementedOCERT 0 12}))))))))
Jun 13 01:11:10 cardano-node[1684710]: [charon:cardano.node.Forge:Error:229] [2024-06-12 23:11:10.07 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceForgedInvalidBlock"),("reason",Object (fromList [("error",Object (fromList [("error",Object (fromList [("currentCounter",Number 12.0),("kind",String "CounterOverIncrementedOCERT"),("lastCounter",Number 0.0)])),("kind",String "HeaderProtocolError")])),("kind",String "ValidationError")])),("slot",Number 1.26667579e8)]))]
Without an intimate knowledge of the codebase this looks hard to interpret…
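A rough reading of that error, sketched in code: the two numbers after `CounterOverIncrementedOCERT` are the last operational certificate counter known on chain (0) and the counter in the certificate the node is using (12), as the `lastCounter` / `currentCounter` fields in the second log line confirm. As far as I understand the rule in force since the Vasil hard fork, the new counter must be equal to the last on-chain one or exactly one higher. The function names below are made up for illustration, and the check is my paraphrase of that rule, not code taken from cardano-node.

```python
import re

# Matches the two counters in the ChainDB error text shown above.
OCERT_ERR = re.compile(r"CounterOverIncrementedOCERT (\d+) (\d+)")


def classify_counter(last_on_chain: int, current: int) -> str:
    """Classify an op cert counter against the last one seen on chain."""
    if current < last_on_chain:
        return "too small"
    if current > last_on_chain + 1:
        return "over-incremented"
    return "ok"  # same certificate, or exactly one rotation ahead


def explain_ocert_error(log_line: str):
    """Return (last_on_chain, current, verdict), or None if no match."""
    match = OCERT_ERR.search(log_line)
    if match is None:
        return None
    last, current = int(match.group(1)), int(match.group(2))
    return last, current, classify_counter(last, current)


if __name__ == "__main__":
    line = "{unwrapValidationErr = CounterOverIncrementedOCERT 0 12}"
    print(explain_ocert_error(line))
```

For a pool that has never minted a block, the on-chain counter is treated as 0, so under this reading only a certificate issued with counter 0 or 1 would be accepted.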
All in all I’m becoming skeptical about the very possibility of successful block production with a “stock” unmodified cardano-node version. There may well be something that only a chosen subset of nodes “know” or can do, the rest serving as a distributed network support for free.
does not make much sense. It is totally okay to mint a block with 0 transactions if the mempool is empty at the time. That is not “circumventing” anything at all.
For only 30k stake, you get quite a lot of blocks assigned.
That is a pretty baseless accusation. Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.
Thanks a lot for the pointers. Only now did I notice that there was also this write-up.
I’ve just used cardano-cli node new-counter ... to reset the counter to zero and cardano-cli query kes-period-info ... is showing me "qKesOnDiskOperationalCertificateNumber": 0, — hopefully that’s the correct state.
> All in all I’m becoming skeptical about the very possibility of successful block production with a “stock” unmodified cardano-node version. There may well be something that only a chosen subset of nodes “know” or can do, the rest serving as a distributed network support for free.

> That is a pretty baseless accusation. Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.
Baseless in assuming a conspiracy between the lines; sorry about that. That part (i.e. the word “chosen”) was baseless indeed. Yet otherwise I was not that far from the truth, it seems.
The --operational-certificate-issue-counter-file mechanism seems dangerously brittle. Unless I misunderstood something, the main issues are:
If you don’t produce a block during a KES key validity usage period — which is a common occurrence for small pools —, you need to remember to reset the counter file before rotating your KES keys. Forgetting (or not knowing about) that will render your pool “inoperable” (serving as a free Cardano relay only), because the offset won’t decrease.
The post-mortem mentions that having an arbitrary certificate number offset used to be fine (?) before the Vasil fork. This would imply that a change breaking backward compatibility could be mitigated only by explicit human action and attention (to release notes / memos).
A better option: The genkes command could have checked the chain for the expected certificate number to use the right one by default — no matter if you produced blocks with the preceding certificate, no matter if you paid attention to announcements. Human factor avoided.
> Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.
And there are also hundreds of pools, especially of small sizes, not minting blocks regularly.
Anyway, thanks a lot for helping me overcome this barrier of entry; hopefully my pool will be more lucky next time it gets a slot.
The mode of operation I would have hoped for would involve multiple (at least two) interleaved / intertwined chains that don’t require head-to-head tip races against propagation latency for a scheduled block producer to succeed — as long as it is not too far behind.
After my recent experience with broken certificate numbers and no warning anywhere in the CLI tools, my conspiracy imagination is running wild.
In any case, this sheds (even) more light on why there are literally hundreds of pools without a single block, serving as “free” network nodes.