We were slot leader but did not mint a block

I have rerun the slot leader script, but we were still nominated for the same slot:

2021-03-28 18:45:12 ==> Leader for 187221, Cumulative epoch blocks: 1

The weird thing is that gLiveView does not show that we missed a block, and I do not see any orphans on pooltool either.

Everything looks good with the nodes:

Are you using CNTools?
What is the ticker?

Cheers,

No, I was not using CNTools.
Ticker is BULL1.
I have just stopped the relay to change the config so that the log is saved to a file.
Pool.vet

I dug into this a bit further.

If your logs are not written to a log file, use this to extract them from the systemd journal:

sudo journalctl -u cardano-node.service > block_fail.log

Search for TraceNodeIsLeader; you will find the slot you missed.
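A minimal sketch of that search as a shell function. It assumes the node writes JSON log lines in which the event appears as "kind":"TraceNodeIsLeader" together with a "slot":<number> field holding the absolute slot; leader_slots is a hypothetical helper name, and the patterns may need adjusting if your log format differs.

```shell
# Print the absolute slot numbers of TraceNodeIsLeader events in a log file.
# Assumption: compact JSON log lines containing "kind":"TraceNodeIsLeader"
# and a "slot":<number> field (hypothetical format, check your own logs).
leader_slots() {
  local logfile="$1"
  grep -F 'TraceNodeIsLeader' "$logfile" \
    | sed -n 's/.*"slot":\([0-9][0-9]*\).*/\1/p'
}

# Usage: leader_slots block_fail.log
```

An empty result for the epoch in question means the node never logged itself as leader, which is a different problem from forging a block that was later rejected.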

In my case, it looks like the last KES rotation was the problem: InvalidKesSignatureOCERT.

Invalid block b9753272f9a5517042a3617a8881daefdf556fb86d92e750b08b9d5783f4a9fa at slot 24975847: ExtValidationErrorHeader (HeaderProtocolError (HardForkValidationErrFromEra S (S (S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (OcertFailure (InvalidKesSignatureOCERT 192 189 3 "Reject"))]}))))))
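The numbers in that error can be cross-checked with some arithmetic. Going by the cardano-ledger definition (an assumption worth verifying against your node version), InvalidKesSignatureOCERT carries the current KES period, the certificate's start KES period and the expected number of KES evolutions. With the mainnet value slotsPerKESPeriod = 129600, slot 24975847 falls in KES period 192, and 192 - 189 = 3, which matches the error. The function names below are hypothetical.

```shell
# Mainnet Shelley genesis value (assumed): 129600 one-second slots = 36 hours.
SLOTS_PER_KES_PERIOD=129600

# KES period that an absolute slot falls into (integer division).
kes_period() {
  echo $(( $1 / SLOTS_PER_KES_PERIOD ))
}

# KES evolutions since the certificate's start period.
# $1 = absolute slot, $2 = the --kes-period node.cert was issued with
kes_evolutions() {
  echo $(( $(kes_period "$1") - $2 ))
}

kes_period 24975847          # prints 192
kes_evolutions 24975847 189  # prints 3
```

Since the periods line up, the certificate's period window looks fine in this case; the "Reject" string then suggests the header's KES signature did not verify against the KES key in the certificate, for example a kes.skey that does not belong to the kes.vkey the node.cert was issued for. That is a hypothesis, not something the log line proves on its own.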

It looks like this is not the first time this has happened.

I followed the CoinCashew guide, got no errors, and the KES period looked fine in Grafana and gLiveView…
https://www.coincashew.com/coins/overview-ada/guide-how-to-build-a-haskell-stakepool-node#18-1-rotate-pools-kes-keys-updating-the-operational-cert-with-a-new-kes-period

Is there a script to test the KES signature before missing a block?
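The period part of an operational certificate can at least be sanity-checked offline. A sketch, assuming the mainnet genesis values slotsPerKESPeriod = 129600 and maxKESEvolutions = 62: a certificate issued with start period s is accepted for KES periods s through s + 61. kes_window_ok is a hypothetical helper name.

```shell
# Mainnet Shelley genesis values (assumed).
SLOTS_PER_KES_PERIOD=129600
MAX_KES_EVOLUTIONS=62

# Succeed if an operational certificate issued with start KES period $2
# is inside its validity window at absolute slot $1.
kes_window_ok() {
  local current=$(( $1 / SLOTS_PER_KES_PERIOD ))
  [ "$current" -ge "$2" ] && [ "$current" -lt $(( $2 + MAX_KES_EVOLUTIONS )) ]
}

# Usage: kes_window_ok <current absolute slot> <kes-period of node.cert> && echo "window OK"
```

This would not catch the "Reject" case in the error above, where the periods were in range; a KES key that does not match the certificate only surfaces when a signed header is validated, which may be why the KES period looked fine in Grafana and gLiveView.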

Thank you :slight_smile:

@LuckySlam Unfortunately, I had to stop the nodes to set up logging to a file, so journalctl won't help now. But thank you :slight_smile:
Next time I can check the logs if something goes wrong.

Still, is it possible to find out what the problem with the nodes is, and why they are not minting? I am not sure if this helps, but these are the logs for both nodes. Do you see anything weird that could have caused our nodes to ignore the blocks?

We have already missed 2 blocks, so it's frustrating :frowning:
Pool.vet stats:

Ticker: BULL1

RELAY node LOG

BP node LOG

gLiveView of both nodes

Any ideas how to debug this problem? Is the only way to find out to wait until we are nominated again, miss the block again, and only then see the issue in the log?

I followed the CoinCashew guide as well, so we could have the same KES problem as you. How did you solve it?

Check whether the modification dates of node.counter and node.cert are the same.
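A quick way to compare the two timestamps from the shell. This sketch uses GNU coreutils `stat -c %Y` (seconds since the epoch; on macOS/BSD the equivalent is `stat -f %m`), and same_mtime is a hypothetical name. Copying files without preserving times (for example `scp` without `-p`) resets the modification time, so a mismatch is a hint rather than proof.

```shell
# Succeed if two files have the same modification time (to the second).
# Requires GNU stat.
same_mtime() {
  [ "$(stat -c %Y "$1")" = "$(stat -c %Y "$2")" ]
}

# Usage:
#   same_mtime node.counter node.cert && echo "same mtime" || echo "mtimes differ"
```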


The modification dates are OK. But I will generate new KES files anyway, just to be 100% sure. Do you think it is OK to test it this way?

I do not want to wait until we miss another block to see the error in the log file. :frowning:
It would be good to have something that can show us whether the problem is with KES.

Hi guys, we were elected slot leader again at 2021-03-31 15:42:45 (the server runs on Google Cloud in LA): Leader for Slot 3474

But I cannot find TraceNodeIsLeader. It looks like our node was not elected leader, even though leaderLogs shows that it should be.

31.03.2021 15:42:45 LA time was 31.03.2021 22:42:45 node time.
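To line the leaderlog output up with the node log, a slot-in-epoch can be converted to an absolute slot and a UTC time. A sketch assuming Shelley-era mainnet parameters (Shelley started at absolute slot 4492800, the first slot of epoch 208, at 2020-07-29 21:44:51 UTC; 1-second slots; 432000-slot epochs) and a node that logs in UTC. The function names are hypothetical.

```shell
# Shelley-era mainnet constants (assumed; other networks differ).
SHELLEY_START_SLOT=4492800     # first slot of epoch 208
SHELLEY_START_EPOCH=208
SHELLEY_START_UNIX=1596059091  # 2020-07-29 21:44:51 UTC
EPOCH_LENGTH=432000            # slots per epoch, 1 slot = 1 second

# Absolute slot for an epoch number ($1) and slot-in-epoch ($2).
abs_slot() {
  echo $(( SHELLEY_START_SLOT + ($1 - SHELLEY_START_EPOCH) * EPOCH_LENGTH + $2 ))
}

# UTC wall-clock time of an absolute slot (needs GNU date).
slot_to_utc() {
  date -u -d "@$(( SHELLEY_START_UNIX + $1 - SHELLEY_START_SLOT ))" '+%Y-%m-%d %H:%M:%S'
}

slot=$(abs_slot 257 3474)
echo "$slot"         # prints 25664274
slot_to_utc "$slot"  # prints 2021-03-31 22:42:45
```

If this was epoch 257 (which the date suggests), leader slot 3474 is absolute slot 25664274 at 2021-03-31 22:42:45 UTC, matching the node time above; the absolute number is what the node log records.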

grep TraceNodeIsLeader mylog*
I am not able to find anything. I went through all the logs.
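When TraceNodeIsLeader is missing, it can help to list everything the node logged for that one slot. A sketch assuming compact JSON log lines with "kind":"<TraceName>" and "slot":<absolute slot> fields; events_at_slot is a hypothetical name, and which forge traces appear (TraceStartLeadershipCheck, TraceNodeNotLeader, TraceNodeIsLeader) depends on the node's tracing settings.

```shell
# Count the trace kinds logged for one absolute slot across one or more log files.
# $1 = absolute slot (not the slot-in-epoch from the leaderlog), rest = log files.
events_at_slot() {
  local slot="$1"
  shift
  grep -h "\"slot\":${slot}[,}]" "$@" \
    | sed -n 's/.*"kind":"\([A-Za-z]*\)".*/\1/p' \
    | sort | uniq -c
}

# Usage: events_at_slot <absolute-slot> mylog*
```

One possible reading: a leadership check followed by TraceNodeNotLeader means the node ran the check and concluded it was not leader, which points at the keys or the leader log rather than at load; no entries at all for the slot suggests the check never ran for it.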

This is the relay log; I am not sure if it helps.

Here is the full log file covering the slot where we should have minted the block:
https://bull-pool.com/wp-content/uploads/2021/04/mylog-20210331223334.log_.zip

How would you debug this? :smiley:

I have something, but I'm not sure about it.
I created an endless loop that submits an invalid TX and ran it; the nodes in my topology started showing heavy CPU and RAM usage.
Maybe someone is using this.
It doesn't matter whether they have your BP's IP or not; they can send TXs to the relay.

Are you using gLiveView, where you can check how many leader slots the node had?
Interestingly, I had the same situation just recently: according to leaderLogs my node should have produced a block, but it did not happen and there were no logs about the event either, so I don't know whether leaderLogs was correct at that time…

Yes, I am.
Leader / Adopted / Invalid all show 0

Lauris | EU01 | www.StakePool247.eu, [1 Apr 2021 at 12:29:52]:

…either cncli is wrong regarding the leader log

or something is wrong with the keys on the node, as one of them is lying :slightly_smiling_face:

The Cardano node doesn't see that it is the leader… no idea how to fix it. Should I rotate the KES keys? But if the problem were with KES, it would show up as a KES error.

Should I replace the following:

  • Operational node certificate
  • VRF key
  • KES key pair

Will it help?

For me, after the skipped block there was a successful one, without changing anything… so if your current keys do not give you errors, I think they are fine.

I think you didn't see my post, but that's okay.
Check your Grafana dashboards for the period just before you missed your block: did you have a lot of TXs in the mempool, or a higher CPU load?
Another thing is copy and paste: sometimes when you copy and paste something, a character's code changes, and that changes the hash.
Sorry, my English is not good.
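The copy/paste concern can be checked mechanically: text pasted from a browser or chat app sometimes carries curly quotes, non-breaking spaces or Windows CR line endings that many editors do not show. A sketch using grep in the C locale to list lines containing any byte outside printable ASCII and tab; find_non_ascii is a hypothetical name, and legitimately non-ASCII text would be flagged too.

```shell
# Print (with line numbers) every line containing a byte that is not
# printable ASCII or tab: curly quotes, non-breaking spaces, CR, etc.
# Exit status 0 means at least one suspicious line was found.
find_non_ascii() {
  LC_ALL=C grep -n "$(printf '[^\t -~]')" "$@"
}

# Usage (configs, scripts, cert files): find_non_ascii mainnet-config.json node.cert
```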

I do not think our node had a CPU overload. But the copy/paste point is interesting: I think I did copy and paste text into one file, but I am not sure.

@laplasz I think I will wait one more epoch to see what happens, and hope it gets solved by itself, as it did in your case.

On average you are right, but the block has to be produced within about 20 seconds. If your CPU is overloaded at that moment, you miss the block.
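Where the 20-second figure comes from, as a short worked calculation assuming the mainnet genesis values: active slot coefficient f = 0.05 (the probability that a given slot has at least one leader) and a slot length of 1 second. The number of slots until the next non-empty slot is geometric with mean 1/f:

```latex
\mathbb{E}[\text{time between blocks}]
  = \frac{\text{slot length}}{f}
  = \frac{1\,\mathrm{s}}{0.05}
  = 20\,\mathrm{s}
```

This is a network-wide average; an individual leader still has to forge in its own slot.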
I found several things.
Did you check your “cardano_node_metrics_slotsMissedNum_int”? You can check it in Grafana.
Use these queries to check missed slots and CPU load:
100 - (avg by(alias) (irate(node_cpu_seconds_total{mode="idle",alias="BP"}[1m])) * 100)
cardano_node_metrics_slotsMissedNum_int
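The same counter can also be read straight from the node without Grafana. A sketch assuming the node exposes Prometheus metrics through "hasPrometheus": ["127.0.0.1", 12798] in mainnet-config.json (check your own host and port) and that the metric is unlabelled; metric_value is a hypothetical helper.

```shell
# Print the value of one unlabelled metric from Prometheus text format on stdin.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Usage:
#   curl -s http://127.0.0.1:12798/metrics | metric_value cardano_node_metrics_slotsMissedNum_int
```

Reading it shortly before and after a scheduled leader slot shows whether the counter moved around that slot.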


Also, when you set "TraceMempool": true in mainnet-config.json, you get more missed slots.
When you use cardano-cli, use --mary-era.
Does your server have time sync?
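If chrony handles time sync, its current offset can be checked from the shell. A sketch assuming the usual `chronyc tracking` output, whose "System time" line reads like "0.000012345 seconds slow of NTP time"; both function names and the 0.1 s default threshold are hypothetical example values.

```shell
# Extract the system clock offset in seconds from `chronyc tracking` output on stdin.
chrony_offset() {
  awk -F': ' '/^System time/ { split($2, f, " "); print f[1] }'
}

# Succeed if offset $1 is below threshold $2 (default 0.1 s, an arbitrary example).
clock_ok() {
  awk -v off="$1" -v max="${2:-0.1}" 'BEGIN { exit !(off + 0 < max + 0) }'
}

# Usage:
#   off=$(chronyc tracking | chrony_offset) && clock_ok "$off" && echo "clock in sync"
```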

Did anyone get to the bottom of this? We have just had a similar situation today: it looks like we missed a block, but there were no errors in the logs, so I am trying to understand whether it is an issue with the nodes or with the leader logs.

Hi, and welcome.
Missed slots depend on a lot of things:

  • RAM: 6 GB or more
  • CPU: 4 cores at 2.8 GHz or more
  • About 20 peers for each relay; an RTT under 50 ms is very good.
  • Use chrony to keep the server time in sync.
  • If you can, use an SSD or NVMe disk.
  • Don't run any other apps on the BP.
  • The node must be able to check whether it is leader, and act on it, in time (blocks come about every 20 seconds).

Same here: we “missed” 2 blocks this epoch; see “Missed two blocks - no clue why”.

The KES keys/certs are an interesting angle, though I cannot find any sign in our logs that they are incorrect. We're renewing them today and will see whether we actually mint the next upcoming block.