gLiveView has OP Cert Disk disk|node|chain 2 | 2 |?

BumpyMcFly · 4 January 2024 01:33

Hi,

My server is running node 8.7.2 and I have gLiveView v1.28.3 and I have a question mark displayed for the chain on the BP. KES is still current. gLiveView in verbose mode is not displaying pool percentage, delegation or operator cost either. I jumped from 8.1.2 to 8.7.2 and rebuilt all servers … Everything seems fine …

OP Cert Disk disk|node|chain 2 | 2 |?

I’m not using P2P for the BP; it is connected to my relays, and the relays run in P2P mode to the outside world. The BP and relays are all in sync and current.

I use bare metal hardware and it is not slow …

Thank you,

McFly

Benagain · 5 January 2024 16:21

I have a very similar question and am hoping someone from the Guild Operators team can help out here:

This is my healthy rotation on preprod, where disk|node|chain seem to be in alignment: 7 | 7 | 7
But on my mainnet node it shows 18 | 17 | 17

Does this indicate an issue? Maybe I entered an incorrect counter value? I can’t find any good documentation that explains the disk|node|chain values.

AceOfSpades · 7 January 2024 06:54

Same for me, I just upgraded to 8.7.2 and v1.28.3 from 8.1.1
Cert is still valid, but this indicator is showing 9 | 9 | ?
Am trying to understand if there is something I need to be aware of or take action on.

FlairStaking · 9 January 2024 20:22

I see 5 | 5 | ?

what is the question mark?

I am running 8.1.2 with gLiveView 1.28.3

Thanks

BumpyMcFly · 19 February 2024 01:58

Hey Folks,

I never resolved this. Anyone have a good answer on how to configure gLiveView in verbose mode?

Thx, Bumpy McFly

bertman · 8 March 2024 20:52

Same for me. Node 8.7.3 on Core Mainnet for block producer.
Haven’t found a solution so far.

andrejpodzimek · 10 March 2024 10:52

Exactly the same thing here — two question marks for disk|chain after a recent gLiveView upgrade and an upgrade to 8.7.3. I guess the underlying problem had always been there and just surfaced now.

The kes / node / vrf files are all current and there are no error messages w.r.t. the files in the logs. The current|remaining|exp numbers look fine and unsurprising.

This kind of spirals back to my usual question about how likely it is that a pool with a 30k+ stake and a 30k pledge produces precisely zero blocks in 2.5 years of operation.

bertman · 10 March 2024 18:58

It doesn’t seem to affect block production. I upgraded 2 weeks ago and minted one block on node 8.7.3 with 128k ADA live stake and a 16k pledge. I have been running at full speed for about 6 months now, with 6 blocks minted so far.

I struggled in the beginning and found out that my block producer was running not as a block producer (core - mainnet) but as a relay node (relay - mainnet); this is shown at the top of gLiveView. I got some great help on this forum from the community.
https://forum.cardano.org/t/feedback-on-stakepool-setup/119630/6

Also be sure your node.cert is valid when rotating KES keys.

jeremyisme · 10 March 2024 19:35

You need to set your node.cert path in the env file (it’s down near the bottom). If you’ve set up via the Coincashew guide, then it is in a different file path than gLiveView expects by default.

jeremyisme · 10 March 2024 19:38

30k stake is quite low, so there is a low chance to get a block. 2.5 yrs without a block is bad luck, but it all comes down to luck (unless there is something wrong with your pool setup).

Have you been checking your leaderlogs each epoch?
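
For a rough sense of the odds being discussed here, the following is a minimal sketch of the Praos slot-leader probability, 1 - (1 - f)^σ, where σ is the pool's share of the total active stake. It assumes the mainnet values f = 0.05 and 432,000 slots per epoch. The total active stake is a number the reader has to look up for the epochs in question; the 22e9 ADA in the usage lines is purely an illustrative placeholder, not a figure from this thread.

```python
import math

ACTIVE_SLOT_COEFF = 0.05   # assumed mainnet active slot coefficient f
SLOTS_PER_EPOCH = 432_000  # assumed mainnet epoch length in slots


def leader_probability(pool_stake, total_active_stake, f=ACTIVE_SLOT_COEFF):
    """Per-slot probability that the pool is slot leader: 1 - (1 - f)^sigma."""
    if total_active_stake <= 0:
        raise ValueError("total_active_stake must be positive")
    sigma = pool_stake / total_active_stake
    # expm1/log1p keep precision when sigma is tiny.
    return -math.expm1(sigma * math.log1p(-f))


def expected_blocks_per_epoch(pool_stake, total_active_stake):
    """Expected number of leader slots for the pool in one epoch."""
    return SLOTS_PER_EPOCH * leader_probability(pool_stake, total_active_stake)


def probability_of_no_slots(pool_stake, total_active_stake, epochs):
    """Probability of getting no leader slot at all over `epochs` epochs,
    assuming the stake share stays constant (a simplification)."""
    sigma = pool_stake / total_active_stake
    return math.exp(SLOTS_PER_EPOCH * epochs * sigma * math.log1p(-ACTIVE_SLOT_COEFF))


if __name__ == "__main__":
    # Illustrative placeholder for total active stake; replace with real data.
    total = 22e9
    print(expected_blocks_per_epoch(30_000, total))
    print(probability_of_no_slots(30_000, total, epochs=10))
```

This only estimates how often a pool is scheduled; it says nothing about whether a scheduled block is actually forged and adopted, which is what the leaderlog check is for.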

andrejpodzimek · 11 March 2024 12:57

Actually, this was a bug in gLiveView.sh. So even though I did get the node.cert path (which is what I call that file) resolved correctly, the cardano-cli and jq failed. Here’s what was needed:

--- a/scripts/cnode-helper-scripts/gLiveView.sh
+++ b/scripts/cnode-helper-scripts/gLiveView.sh
@@ -539,7 +539,7 @@ getOpCert () {
     op_cert_tsv=$(jq -r '[
     .qKesNodeStateOperationalCertificateNumber //"?",
     .qKesOnDiskOperationalCertificateNumber //"?"
-    ] | @tsv' <<<"$(${CCLI} ${NETWORK_ERA} query kes-period-info ${NETWORK_IDENTIFIER} --op-cert-file "${opcert_file}" | grep "^[{ }]")")
+    ] | @tsv' <<<"$("${CCLI}" "${NETWORK_ERA}" query kes-period-info "${NETWORK_IDENTIFIER}" --socket-path "${CARDANO_NODE_SOCKET_PATH}" --op-cert-file "${opcert_file}" | awk '/^[{]/{o=1}o;/^[}]/{o=0}')")
     read -ra op_cert_arr <<< ${op_cert_tsv}
     isNumber ${op_cert_arr[0]} && op_cert_chain=${op_cert_arr[0]}
     isNumber ${op_cert_arr[1]} && op_cert_disk=${op_cert_arr[1]}

Summary:

It requires --socket-path. (Perhaps an environment variable was missing or a default location where cardano-cli would look for it doesn’t match reality on my system.)
The grep filtering missed an additional header line before the JSON data in the output, which was causing jq to fail. It’s sad that cardano-cli doesn’t keep its /dev/stdout clean in pure JSON; additional plain text information should go to /dev/stderr for optional filtering / inspection…
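
The effect of the awk filter in the patch (keep everything from the first line starting with { through the line starting with }) can be sketched in Python as follows. The sample text is made up for illustration, including the header line, whose real wording is not shown in this thread; only the two JSON field names come from the patch above. The helper names are hypothetical.

```python
import json


def extract_json_object(output: str) -> dict:
    """Return the first JSON object found in mixed text/JSON output.

    Mirrors the awk filter for the first object: start at the first line
    beginning with "{" and stop after the next line beginning with "}".
    """
    block = []
    for line in output.splitlines():
        if not block and not line.startswith("{"):
            continue  # skip header lines before the JSON object
        block.append(line)
        if line.startswith("}"):
            break
    if not block:
        raise ValueError("no JSON object found in output")
    return json.loads("\n".join(block))


def op_cert_numbers(info: dict) -> tuple:
    """Return (node-state, on-disk) counters, using "?" for missing/null values."""
    def or_unknown(value):
        return "?" if value is None else value

    return (
        or_unknown(info.get("qKesNodeStateOperationalCertificateNumber")),
        or_unknown(info.get("qKesOnDiskOperationalCertificateNumber")),
    )


# Hypothetical output: the header line is invented for this example.
SAMPLE_OUTPUT = """some informational header line
{
    "qKesNodeStateOperationalCertificateNumber": null,
    "qKesOnDiskOperationalCertificateNumber": 0
}
"""

if __name__ == "__main__":
    print(op_cert_numbers(extract_json_object(SAMPLE_OUTPUT)))
```

Unlike a plain `grep "^[{ }]"`, this does not let through a header line that happens to start with a space, which appears to be the failure mode described above.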

Now it gives me the right on-disk certificate number. \o/ The on-chain certificate stays at ?, because (obviously) no blocks → no on-chain certificates. /o\

andrejpodzimek · 21 May 2024 10:05

2.5 yrs without a block is bad luck…

I think there’s more to it than bad luck. There is a piece of configuration I am missing. For example, according to leadership-schedule, my pool was supposed to be the slot leader for slot 122591815. While it may or may not have been the slot leader, one thing is certain: No blocks were produced.

Sadly enough, when I discovered this, journalctl’s log rotation had already removed all records of what could have happened.

The explorer says that the nearest known (== turned into blocks) slots in epoch 481 were 122591811 and 122591825 and the former even had (?!) 0 transactions in it.

epoch	slot	block	date	time	transactions	output
481	122591811	10237735	2024/04/26	19:01:42	0	0
481	122591815	N/A	N/A	N/A	N/A	N/A
481	122591825	10237736	2024/04/26	19:01:56	5	2007762.516504

It could have been the case that there were (still) 0 transactions to persist. After all, it looks like around that time the chance of becoming a block producer (based on being a slot leader) was very slim, less than 5.5%.

However, it does look like many pools “circumvent” the problem by creating empty blocks (with zero transactions). Sure, they earn nothing by doing so (?) — unless the fixed block fee still applies —, but what they do gain is visibility. A non-zero block history means everything when it comes to a pool’s popularity. Personally, if I were to stake, I would never pick a pool with a zero.

So, how does one configure this magic? How does one force a validator to create an empty block, just for the sake of creating a block? This seems to be happening quite often, but my validator doesn’t do it.

HeptaSean · 21 May 2024 11:38

The rewards for pools and their delegators do not depend on transactions being available to put in a block. Blocks with 0 transactions are fine and do get rewards just like all other blocks. So, it’s also totally fine to delegate to pools minting blocks without transactions.

The fixed and margin fees have no relevance to this question; they only concern how rewards are split between the pool and its delegators.

It’s too bad that the log was already rotated. There can be any number of reasons why your block wasn’t picked up. It could have been a height battle (there were actually quite a lot of blocks in short succession there; the average time between blocks should be around 20 seconds). Or your relay nodes have no incoming connections that could pull your blocks. Or your clock(s) are too far off. Or …

andrejpodzimek · 16 June 2024 00:56

Personally, if I were to stake, I would never pick a pool with a zero.

So, it’s also totally fine to delegate to pools minting blocks without transactions.

What I meant above was zero blocks in total, not zero transactions in some blocks. A pool that has produced >=1 blocks has perhaps thousands of times higher chances at stake acquisition than a pool with precisely 0 blocks. That one first block makes all the difference in the world.

That said, this time I was lucky to catch what appeared in the logs at the time of a failed block “adoption” (or whatever the final procedure is called):

Jun 13 01:11:10 cardano-node[1684710]: [charon:cardano.node.ChainDB:Error:211] [2024-06-12 23:11:10.07 UTC] Invalid block ce6d5f0923eeecce18e650b300ff1678e44b14e0efd014dffc6205a61432e491 at slot 126667579: ExtValidationErrorHeader (HeaderProtocolError (HardForkValidationErrFromEra S (S (S (S (S (Z (WrapValidationErr {unwrapValidationErr = CounterOverIncrementedOCERT 0 12}))))))))
Jun 13 01:11:10 cardano-node[1684710]: [charon:cardano.node.Forge:Error:229] [2024-06-12 23:11:10.07 UTC] fromList [("credentials",String "Cardano"),("val",Object (fromList [("kind",String "TraceForgedInvalidBlock"),("reason",Object (fromList [("error",Object (fromList [("error",Object (fromList [("currentCounter",Number 12.0),("kind",String "CounterOverIncrementedOCERT"),("lastCounter",Number 0.0)])),("kind",String "HeaderProtocolError")])),("kind",String "ValidationError")])),("slot",Number 1.26667579e8)]))]

Without an intimate knowledge of the codebase this looks hard to interpret…

All in all I’m becoming skeptical about the very possibility of successful block production with a “stock” unmodified cardano-node version. There may well be something that only a chosen subset of nodes “know” or can do, the rest serving as a distributed network support for free.

HeptaSean · 16 June 2024 01:40

Ah, okay. Still, your allegation does not make much sense. It is totally okay to mint a block with 0 transactions if the mempool is empty at the time. That is not “circumventing” anything at all.

For only 30k stake, you get quite a lot of blocks assigned.

(WrapValidationErr {unwrapValidationErr = CounterOverIncrementedOCERT 0 12}) is not that hard to guess. Alternatively, a quick search on this forum for “CounterOverIncrementedOCERT” gives for example https://forum.cardano.org/t/not-getting-the-vasil-node-counter-memo-a-story-for-other-spos/111937 and https://forum.cardano.org/t/failed-adoption-over-incremented-during-re-certification-i-think/124943: Your certificate number is invalid! It must not be more than 1 greater than the on-chain certificate number (or 0 if there never was a block minted). Yours is 12 according to that error message.

This is also described in https://www.coincashew.com/coins/overview-ada/guide-how-to-build-a-haskell-stakepool-node/part-iv-administration/issuing-new-opcert#determining-the-counter-value which was already linked to above in:

That is a pretty baseless accusation. Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.
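
The counter rule described in this post can be sketched as a small check. This is a sketch of the rule as stated in the thread, not the node's actual validation code: the new certificate number must not exceed the last on-chain number by more than 1, and a pool that has never minted a block is treated as having a last counter of 0 (matching lastCounter 0 in the log above). The function names are hypothetical, and counters lower than the on-chain value are not covered here.

```python
from typing import Optional


def is_over_incremented(last_on_chain: Optional[int], new_counter: int) -> bool:
    """True if new_counter is more than 1 above the last on-chain counter.

    last_on_chain is None for a pool that has never minted a block; that
    case is treated as a last counter of 0 (assumption based on the log).
    """
    last = 0 if last_on_chain is None else last_on_chain
    return new_counter > last + 1


def next_counter(last_on_chain: Optional[int]) -> int:
    """Counter to use for the next certificate, per the rule in this thread:
    0 if no block was ever minted, otherwise the on-chain counter plus 1."""
    return 0 if last_on_chain is None else last_on_chain + 1


if __name__ == "__main__":
    # The failing case from the log: last counter 0, current counter 12.
    print(is_over_incremented(None, 12))  # True
    # The 18 | 17 | 17 case asked about earlier: disk is one ahead of chain.
    print(is_over_incremented(17, 18))    # False
    print(next_counter(None))             # 0
```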

andrejpodzimek · 16 June 2024 07:05

Thanks a lot for the pointers. Only now did I notice that there was also this write-up.

I’ve just used cardano-cli node new-counter ... to reset the counter to zero and cardano-cli query kes-period-info ... is showing me "qKesOnDiskOperationalCertificateNumber": 0, — hopefully that’s the correct state.

All in all I’m becoming skeptical about the very possibility of successful block production with a “stock” unmodified cardano-node version. There may well be something that only a chosen subset of nodes “know” or can do, the rest serving as a distributed network support for free.

That is a pretty baseless accusation. Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.

Baseless in assuming a conspiracy between the lines; sorry about that. That part (i.e. the word “chosen”) was baseless indeed. Yet otherwise I was not that far from the truth, it seems.

The --operational-certificate-issue-counter-file mechanism seems dangerously brittle. Unless I misunderstood something, the main issues are:

If you don’t produce a block during a KES key usage period (a common occurrence for small pools), you need to remember to reset the counter file before rotating your KES keys. Forgetting (or not knowing about) that will render your pool “inoperable” (serving as a free Cardano relay only), because the offset won’t decrease.
The post-mortem mentions that having an arbitrary certificate number offset used to be fine (?) before the Vasil fork. This would imply that a change breaking backward compatibility could be mitigated only by explicit human action and attention (to release notes / memos).

This makes a few established principles, outlined in Eliminating Toil, a famous chapter from Site Reliability Engineering, harder than necessary to follow.

A better option: The genkes command could have checked the chain for the expected certificate number to use the right one by default — no matter if you produced blocks with the preceding certificate, no matter if you paid attention to announcements. Human factor avoided.

Although the documentation is arguably a bit scattered, there are hundreds of pools of all sizes minting blocks regularly.

And there are also hundreds of pools, especially of small sizes, not minting blocks regularly.

Anyway, thanks a lot for helping me overcome this barrier to entry; hopefully my pool will be luckier next time it gets a slot.

andrejpodzimek · 18 July 2024 21:57

So, here’s how the next opportunity has “worked”:

Jul 18 22:09:15 cardano-node[2234848]: [charon:cardano.node.ChainDB:Notice:154] [2024-07-18 20:09:15.03 UTC] Chain extended, new tip: c4a452f025c5d9cf026af11fcda10b25fcdbdb1545240c42687513d8fca6ebaa at slot 129767064
Jul 18 22:09:15 cardano-node[2234848]: [charon:cardano.node.ChainDB:Notice:154] [2024-07-18 20:09:15.42 UTC] Switched to a fork, new tip: 83857afe04e60fc0b3f6599

It might have been the “height battle” case, but the bare existence of such cases looks like a flaw in the entire “decentralization” idea.

The mode of operation I would have hoped for would involve multiple (at least two) interleaved / intertwined chains that don’t require head-to-head tip races against propagation latency for a scheduled block producer to succeed — as long as it is not too far behind.

After my recent experience with broken certificate numbers and no warning anywhere in the CLI tools, my conspiracy imagination is running wild.

In any case, this sheds (even) more light on why there are literally hundreds of pools without a single block, serving as “free” network nodes.
