Troubleshooting not validated blocks

laplasz · 1 March 2020 14:33

The following block was not validated:

{
    "created_at_time": "2020-02-29T21:41:06.352339908+00:00",
    "scheduled_at_time": "2020-03-01T10:13:45+00:00",
    "scheduled_at_date": "78.27004",
    "wake_at_time": "2020-03-01T10:13:45.000848006+00:00",
    "finished_at_time": "2020-03-01T10:13:45.002801924+00:00",
    "status": {
      "Block": {
        "block": "372bd8cccb2ec0203533ea72cd1dc67c6f2c31ca0359cc6d215788d581bbd104",
        "chain_length": 257764
      }
    },
    "enclave_leader_id": 1
  },

and the block which was validated before my block is:
https://shelleyexplorer.cardano.org/en/block/f945d6ed7352655650d7d4081dc1608ba934c5c405c602680b47d60aa577c85a/

the block which was validated after my block is:
https://shelleyexplorer.cardano.org/en/block/464624133b6e7def44c23617f64db39224fe1ef3bf05cfa01507dbf2df1b5151/

so the slots in a table would be:

EPOCH	SLOT	TIME	STAKE POOL
78	27022	11:14:21, March 1, 2020	842c…c54
78	27004	10:13:45, March 1, 2020	5959…eb5
78	27003	11:13:43, March 1, 2020	107d…0c5
78	26967	11:12:31, March 1, 2020	3033…393

It seems that the previous block was created just 2 sec before my slot, and the next one almost 40 sec after it. So my node probably created its block at the wrong block height, since it had only 2 sec to sync the block created in the previous slot.
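The gaps quoted above follow from the slot numbers in the table. A minimal sketch, assuming the 2-second slot_duration from the genesis file cited later in this thread; the function name is only illustrative:

```python
SLOT_DURATION_S = 2  # assumption: slot_duration from the ITN genesis file


def gap_seconds(earlier_slot: int, later_slot: int,
                slot_duration: int = SLOT_DURATION_S) -> int:
    """Wall-clock time between two slots of the same epoch."""
    return (later_slot - earlier_slot) * slot_duration


# Slots taken from the epoch 78 table above
print(gap_seconds(27003, 27004))  # 2
print(gap_seconds(27004, 27022))  # 36
```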
3 questions:

1. Why was the slot time so close to the previous one? I think the average should be 20 sec.
2. Which config parameter should be tuned to be able to sync the block height in such a short period of time?
3. Is this situation an example of a height battle?

Shang_Di · 1 March 2020 16:46

https://hydra.iohk.io/build/1505847/download/1/itn_rewards_v1-genesis.yaml
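This is the genesis configuration file.
The three parameters we care about:
"slot_duration": 2,
"slots_per_epoch": 43200,
"consensus_genesis_praos_active_slot_coeff": 0.1
An epoch has 43200 slots of 2 seconds each:
24 * 60 * 60 / 2 = 43200
consensus_genesis_praos_active_slot_coeff says that only a 0.1 fraction of the slots can produce a block. So the average time available to sync is 20 seconds, and at full efficiency 4320 blocks are produced per day.
Whether a pool produces a block in a given slot is decided by probability.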
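The arithmetic above can be checked in a few lines. This is only a sketch using the three genesis values quoted in this post; the variable names are mine, and the coefficient is kept as an exact fraction to avoid floating-point noise:

```python
from fractions import Fraction

slot_duration = 2                     # seconds per slot
slots_per_epoch = 43200
active_slot_coeff = Fraction(1, 10)   # 0.1, kept exact

epoch_seconds = slot_duration * slots_per_epoch
avg_block_interval = slot_duration / active_slot_coeff
expected_blocks_per_epoch = slots_per_epoch * active_slot_coeff

print(epoch_seconds)              # 86400, i.e. one day
print(avg_block_interval)         # 20
print(expected_blocks_per_epoch)  # 4320
```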

f is active slots coefficient
αi is the relative stake
All of this is governed by probability, so blocks can end up clustered within a short time window.
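The two symbols above look like they belong to the Ouroboros Praos leader-election formula, phi_f(alpha_i) = 1 - (1 - f)^alpha_i. That reading is an assumption, since the formula itself is not shown in the post. A rough sketch of it, plus the chance of seeing another block within a few slots if each slot is treated as independently active with probability f (a simplification):

```python
def phi(f: float, alpha: float) -> float:
    """Per-slot leader probability for a pool with relative stake alpha
    (assumed Praos formula: 1 - (1 - f) ** alpha)."""
    return 1.0 - (1.0 - f) ** alpha


def prob_block_within(slots: int, f: float) -> float:
    """Probability that at least one of the next `slots` slots is active,
    treating every slot as active with probability f independently."""
    return 1.0 - (1.0 - f) ** slots


f = 0.1  # consensus_genesis_praos_active_slot_coeff
print(phi(f, 0.01))             # a pool holding 1% of the stake
print(prob_block_within(1, f))  # chance the very next slot has a block
```

With f = 0.1 the chance that the very next slot also has a block is about 10%, so a 2-second gap between blocks is not unusual.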
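Answers:
1. The average interval is about 20 seconds, but in practice it can be uneven locally. This is inherent in the protocol itself.
2. The three parameters I listed above determine these properties:
"slot_duration": 2,
"slots_per_epoch": 43200,
"consensus_genesis_praos_active_slot_coeff": 0.1
But you cannot modify them; they were fixed when the testnet was launched. The Shelley specification written in Haskell implements proposals for changing protocol parameters: a node submits a parameter-change proposal, and users holding stake vote on whether to run the protocol with the new parameters. But the testnet probably does not implement this feature.
3. This is not an example of a height battle. If you had also produced a block in slot 27003, then you and the other pool would be in a height battle.
If your network is good enough, you should receive block 27003 within 2 seconds and then produce block 27004 yourself; that is the most efficient case.
If your network is not good enough and you fail to sync block 27003, then 27003 and 27004 will point to the same parent block, and the network forks. The next block decides which way the network goes. Both 27003 and 27004 are valid, but one of them will be discarded.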
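The fork case described above comes down to a parent-hash comparison. A minimal sketch; the Block type and the hash strings are placeholders for illustration, not real chain data or a Jormungandr API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    slot: int
    block_hash: str
    parent_hash: str


def is_fork(a: Block, b: Block) -> bool:
    """Two distinct blocks with the same parent compete for one chain height."""
    return a.block_hash != b.block_hash and a.parent_hash == b.parent_hash


# Placeholder hashes: the node did not receive block 27003 in time,
# so its own block 27004 was built on the same parent as 27003.
theirs = Block(slot=27003, block_hash="hash_27003", parent_hash="hash_26967")
mine_late = Block(slot=27004, block_hash="hash_27004", parent_hash="hash_26967")
mine_synced = Block(slot=27004, block_hash="hash_27004", parent_hash="hash_27003")

print(is_fork(theirs, mine_late))    # True: one of the two will be discarded
print(is_fork(theirs, mine_synced))  # False: 27004 extends 27003
```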
....
If someone with better English could translate this answer into English, please do.
My English is poor, so I answered directly in Chinese.

laplasz · 1 March 2020 17:01

could you please write your answer in English?

Elurevad · 1 March 2020 18:34

https://translate.google.com
Paste the post in here.
Cheers,
D

laplasz · 1 March 2020 18:37

I think, if possible, we should not rely on Google to try to translate a technical question. Smile!

Elurevad · 1 March 2020 18:50

Fair enough.
The translation looked reasonable enough, though the maths is all Greek to me!
Have a good one.
D

laplasz · 2 March 2020 16:18

just an update - now I found another block which was validated and has the same conditions:

"created_at_time": "2020-03-02T12:01:22.192603859+00:00",
"scheduled_at_time": "2020-03-02T15:50:11+00:00",
"scheduled_at_date": "79.37097",
"wake_at_time": "2020-03-02T15:50:11.002031353+00:00",
"finished_at_time": "2020-03-02T15:50:11.002331797+00:00",
"status": {
  "Block": {
    "block": "c5a0292a5fad76abfc2135ccf9f71766f6bc325371891ad99c4ea65b4bf1a1d8",
    "chain_length": 262178
  }
},
"enclave_leader_id": 1

https://shelleyexplorer.cardano.org/en/block/c5a0292a5fad76abfc2135ccf9f71766f6bc325371891ad99c4ea65b4bf1a1d8/

EPOCH	SLOT	TIME	STAKE POOL
79	37146	16:51:49, March 2, 2020	1f53…5d2
79	37127	16:51:11, March 2, 2020	9d51…b8b
79	37121	16:50:59, March 2, 2020	f1c9…3a2
79	37097	16:50:11, March 2, 2020	5959…eb5
79	37096	16:50:09, March 2, 2020	01bd…d2a
79	37091	16:49:59, March 2, 2020	9b00…187

So at least it means that my node is capable of fetching the latest validated block which was created in the previous slot.
So the question is why this time it was a successful creation?
Does anybody else have blocks that were not validated? In those cases, what were the reasons?
Thanks,

ChrisSTR8 · 2 March 2020 16:44

How is the second case related to the first? In the second you reference a block which was validated, in the first one which was not.

Slot time is 2 seconds, not 20. If you lose any battles other than competitive slots, your pool is simply not at the tip at the moment that matters.

Use tools like Prometheus with time-series monitoring to tune your pool. Which settings work for my pool might not work for yours. Every pool environment, latency etc. is slightly different.

Do check the parent block hash in the Jormungandr log and compare with the hash of the winning block, they will be different if it is not a competitive slot.

Your blockheight time tracking should look like this:

Shang_Di · 2 March 2020 23:51

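If the network conditions did not change, a likely explanation is that the node which produced block 37096 is close to you, so you were able to sync that block very quickly.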

laplasz · 3 March 2020 08:53

Hi!

The second case shows that the node can get a block validated in a slot which is very close to the previous slot, so it is relevant for troubleshooting the first case.
Can you share how to get the parent hash of a block which was not validated? Thanks.

laplasz · 3 March 2020 08:55

If that is the case, I will try to increase the max connections of my node.
Right now it is about 200.

ChrisSTR8 · 3 March 2020 13:03

To get the parent hash of the block which was not validated just grep your jormungandr log for the “leadership” event, all the details are found there.

laplasz · 3 March 2020 14:16

That means that the node log level should be set to info… any other way to get it? but thanks for this info - I did not know about that…

ChrisSTR8 · 3 March 2020 21:00

No other way that I know of. INFO is the minimum debug level you should use in a testnet IMHO.

laplasz · 4 March 2020 12:03

Perhaps the problem is not with the slot schedule, since there was a case where the schedule was ideal (about 40 sec after the previous block and about 20 sec before the next one) and the block was still not validated…

"created_at_time": "2020-03-03T20:16:47.470018238+00:00",
"scheduled_at_time": "2020-03-04T06:14:37+00:00",
"scheduled_at_date": "81.19830",
"wake_at_time": "2020-03-04T06:14:37.000967089+00:00",
"finished_at_time": "2020-03-04T06:14:37.002122292+00:00",
"status": {
  "Block": {
    "block": "b80ec5b8d6a752f96e6b65dca868630c74c3aeea6dd6f1280baaefbf095a67f8",
    "chain_length": 268006
  }
},
"enclave_leader_id": 1

here is the link for the next block which was validated:
https://shelleyexplorer.cardano.org/en/block/e009a187d30d9907b9f04651bae9b6ca53f4c0ab82e179f7beaa0a00551375c3/
And the sequence of the blocks would be as follows:

EPOCH	SLOT	TIME	STAKE POOL
81	19846	06:15:09, March 4, 2020	9277…8d7
81	19843	06:15:03, March 4, 2020	365d…434
81	19830	06:14:37, March 4, 2020	365d…434
81	19811	06:13:59, March 4, 2020	7e03…04f
81	19800	06:13:37, March 4, 2020	4437…8bd

So I think I have to increase the peer max connections to be able to sync with the latest hash. I will also set the log level of the node to info, to be able to determine the parent hash of a block which was not validated.

laplasz · 4 March 2020 14:56

laplasz · 4 March 2020 22:24

You can get the parent hash with the following API request:

jcli rest v0 block <blockhash> get -h <url> | cut -c 105-168
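The cut step in the command above takes characters 105 to 168 (1-based, inclusive) of the hex output, that is 64 hex characters or 32 bytes. A small sketch of the same slice in Python, assuming the input is the hex string printed by the jcli command; the offsets are copied from the command and not checked against the block header layout, and the function name is hypothetical:

```python
def parent_hash_from_block_hex(block_hex: str) -> str:
    """Return hex characters 105..168 (1-based, inclusive) of a hex-encoded
    block, mirroring `cut -c 105-168`."""
    block_hex = block_hex.strip()
    start, end = 104, 168  # 0-based slice equivalent of cut -c 105-168
    if len(block_hex) < end:
        raise ValueError("block hex is too short to contain a parent hash")
    return block_hex[start:end]


# Synthetic input for illustration only: 52 filler bytes, then 32 bytes of 0xab
sample = "00" * 52 + "ab" * 32 + "ff" * 10
print(parent_hash_from_block_hex(sample))
```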
