Debugging node question cardano_node_metrics_slotsMissedNum_int

Hi,

I was looking at my node and wanted to see if I missed a block. Can someone help explain:

cardano_node_metrics_slotsMissedNum_int 195
rts_gc_max_bytes_slop 58062976
cardano_node_metrics_Stat_threads_int 15
cardano_node_metrics_density_real 4.901070974768561e-2
cardano_node_metrics_epoch_int 262
cardano_node_metrics_Forge_node_not_leader_int 435714

similar topic:

And a workaround for the missed slots (not perfect)


Hi,

Is there a method I can use to see how many slots I’ve missed and where?
journalctl --since='2021-04-20' | grep -c Missed

These metrics are provided via Prometheus; you can query them on the node machine:

$ curl localhost:12798/metrics
cardano_node_metrics_nodeIsLeaderNum_int 3
rts_gc_par_tot_bytes_copied 543077170096
rts_gc_num_gcs 138068
cardano_node_metrics_slotsMissedNum_int 152
rts_gc_max_bytes_slop 63980872
cardano_node_metrics_served_block_count_int 126996
...

where 12798 is the port the Prometheus endpoint listens on.
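For anyone who wants to read the counter from a script instead of scanning the curl output by eye, here is a minimal sketch. It assumes the endpoint returns plain `name value` lines like the ones shown above; metrics with labels in braces are not handled. The helper names `parse_metrics` and `missed_slots` are made up for illustration, and `SAMPLE` just reuses values from the output above.

```python
def parse_metrics(text):
    """Parse plain 'name value' lines into a dict of floats."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and Prometheus comment lines (# HELP, # TYPE).
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # not a simple 'name value' pair (e.g. "..." or a prompt)
        name, raw = parts
        try:
            metrics[name] = float(raw)
        except ValueError:
            continue  # value is not numeric, ignore the line
    return metrics


def missed_slots(text):
    """Return the slotsMissedNum counter, or 0 if the metric is absent."""
    value = parse_metrics(text).get("cardano_node_metrics_slotsMissedNum_int", 0)
    return int(value)


# Sample taken from the curl output quoted in this thread.
SAMPLE = """\
cardano_node_metrics_nodeIsLeaderNum_int 3
cardano_node_metrics_slotsMissedNum_int 152
cardano_node_metrics_served_block_count_int 126996
"""

print(missed_slots(SAMPLE))
```

To run it against a live node, the text could be fetched with `urllib.request.urlopen("http://localhost:12798/metrics").read().decode()`, assuming the port matches your node's configuration.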

Is there a way to find out what caused the missed slots below? I’m trying to investigate what happened.

I’m running the BP on an AWS t2.large: 2 vCPU, x86_64, 8 GB memory, network performance low to moderate.

cardano_node_metrics_slotsMissedNum_int 195


Do you have logs with cardano.node.BlockFetchClient trace enabled?

It is currently set to false. Will turn it on and restart the bp node.

"TraceBlockFetchClient": false,
"TraceBlockFetchDecisions": true,
"TraceBlockFetchProtocol": false,
"TraceBlockFetchProtocolSerialised": false,
"TraceBlockFetchServer": false,

So you need some post-processing on the logs: each new tip comes with a blockNo and a slot number. If you collect these events, the blockNo should increase by one each time. If there is a gap, then a slot was missed.
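The post-processing idea above can be sketched roughly as follows. This assumes you have already extracted (blockNo, slot) pairs from the new-tip log events in the order they were logged; how to extract them depends on your log format and is not shown. The function name `find_block_gaps` and the sample numbers are hypothetical.

```python
def find_block_gaps(tips):
    """Return pairs of consecutive tip events whose blockNo jumps by more than one.

    tips: iterable of (block_no, slot_no) tuples in logged order.
    """
    gaps = []
    prev = None
    for block_no, slot_no in tips:
        # A jump of more than one means at least one block was never seen as tip.
        if prev is not None and block_no > prev[0] + 1:
            gaps.append((prev, (block_no, slot_no)))
        prev = (block_no, slot_no)
    return gaps


# Hypothetical tip events: block 102 never shows up.
tips = [(100, 5000), (101, 5020), (103, 5061), (104, 5080)]
for before, after in find_block_gaps(tips):
    print(f"gap between block {before[0]} (slot {before[1]}) "
          f"and block {after[0]} (slot {after[1]})")
```

Repeated or decreasing block numbers (for example after a rollback) are simply not reported as gaps here.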

no.

“Slots missed” happens when your BP cannot determine at a specific time whether it was a leader or not. Many factors can cause this:
node version, CPU, RAM, peer connections, storage drive, other services running on the server, running the relay and BP on one system, and…

Hmm, is there a specific log for this event?
And what are the consequences if a slot is missed? Might the node fail to create a block even though it was scheduled to?

Can you please elaborate more? I have the BP on an AWS t2.large, which I can convert to a c5.large, and 1 relay on an AWS t2.large, which I can also convert to a c5.large. I also have 1 bare-metal relay running an i7 with 16 GB RAM and a 512 GB SSD. My guess is that it was due to the high tx count and the instance couldn’t keep up.


Hi Laplasz,

Does this information help? I found it just after enabling blockFetchDecision and restarting:
[2021-04-28 22:21:31.63 UTC] fromList [("tx",Object (fromList [("txid",String "txid: TxId {_unTxId = SafeHash "302be2aabc37ef21ec99c7469225d6bdbe1937cb1933fef74e6f2916fec51cbb"}")])),("mempoolSize",Object (fromList [("bytes",Number 1759.0),("numTxs",Number 5.0)])),("kind",String "TraceMempoolRejectedTx"),("err",Object (fromList [("badInputs",Array [String "8dd64896f8810b7336f7a3ed8d2273f6283253a0afb76358433a30b5c60c4ab2#9"]),("consumed",Object (fromList [("lovelace",Number 1.4475606e7),("policies",Object (fromList ))])),("error",String "The transaction contains inputs that do not exist in the UTxO set."),("incorrectWithdrawals",Array [Array [Object (fromList [("credential",Object (fromList [("key hash",String "1bb3e0a32ea31e8a1d80e63d9d669f74c3d624cee0af47f12ef2db18")])),("network",String "Mainnet")]),Number 1.4475606e7]]),("kind",String "WithdrawalsNotInRewardsDELEGS"),("produced",Object (fromList [("lovelace",Number 6.014531525e9),("policies",Object (fromList ))]))]))]

no… that is transaction related

Upgraded 1 of the relays to a c5.large instance. After 6 hours of monitoring, no more missed slots so far. Will upgrade the BP soon.

I don’t see one.
You can see it in the metric data from: curl -s http://localhost:12798/metrics
Yes, that’s right. You have 20 seconds to submit the block or you lose it.

What is your BP hardware config, and which version are you running (cardano-node --version)?
Your relay hardware is fine.
No, your tx count is fine. I have 250-300 tx in the mempool but didn’t lose any blocks.

Unfortunately, slots start going missing again after about 9-11 hours, at least for me. At the same time, memory consumption rises from 2.5 GB to almost 5 GB. Looks like a memory leak to me. IMO that needs to be addressed by the development team.

This is happening on a 16GB 6 core configuration - i.e. no memory or processing power problem.
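To pin down when the counter starts climbing after a restart, one option is to scrape cardano_node_metrics_slotsMissedNum_int at regular intervals and look at the differences between consecutive samples. A minimal sketch, assuming you already have (hours since restart, counter value) pairs; the function name `missed_slot_increases` and the sample values are hypothetical, not measurements from this node.

```python
def missed_slot_increases(samples):
    """Return (start, end, delta) for each interval where the counter went up.

    samples: list of (timestamp, slots_missed) tuples sorted by timestamp.
    """
    increases = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        # A drop (c1 < c0) would indicate a counter reset, e.g. a node restart.
        if c1 > c0:
            increases.append((t0, t1, c1 - c0))
    return increases


# Hypothetical samples: (hours since restart, counter value).
samples = [(0, 0), (3, 0), (6, 0), (9, 2), (12, 7)]
for start, end, delta in missed_slot_increases(samples):
    print(f"{delta} slots missed between hour {start} and hour {end}")
```

Lining these intervals up against memory usage over the same period would show whether the two really move together.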


I’ve now officially been up for 24 hours with no more missed slots. I also upped the network bandwidth. Could internet speed be a factor in how quickly the node responds when checking a slot for leadership?

Could you open an issue on the GitHub site?

As far as I can see, nobody else has reported it yet…

Hi @laplasz, is it normal to often have 0 tx in the mempool? I’m seeing it pretty often: after a small spike, it goes to 0 again and back. I would think it should stay above 0 tx…