Preview Network BP Node not producing blocks?

I just finished registering our pool and have given it many days to catch up on the epochs. You can see our pool here (Preview TTSP1 Trading Tools Software Pool - Cardanoscan)

Screenshot_2023-09-09_at_2.29.05_PM

Attached is an image of our block-producing node's gLiveView. It says it was the leader 117 times and adopted 117 times, yet I don't see any blocks minted on the explorer.

I ruled out connection and firewall issues by using the following command from the block producer node:

cardano-cli ping -c 2 -m 2 -h relays.tradingtools.software -p 6000

Which gave the results:

Just to further prove it's not a firewall issue, I also ran this command on one of the relays:

nc -v -w 2 173.212.241.124 6000

with these results:

Screenshot_2023-09-09_at_2.37.47_PM

Any ideas on why I have not produced any blocks yet?

You should have your relays as incoming connections.

Check your topology files and post gLiveView from your relays.

@Zyroxa

Relay #1 Topology File

The BP IP Address is blurred out for security reasons but is correct.

relay_topology

Relay #2 Topology File

relay_2_topology

Block Producing Topology File

bp_topology
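
Since the screenshots can be hard to read, the block producer topology is structured roughly like the sketch below, with placeholder addresses. The exact field names can differ slightly between node versions, and the relays' files have the same shape but point at the BP (and at public/ledger peers) instead.

{
  "localRoots": [
    {
      "accessPoints": [
        { "address": "<relay 1 IP>", "port": 6000 },
        { "address": "<relay 2 IP>", "port": 6000 }
      ],
      "advertise": false,
      "valency": 2
    }
  ],
  "publicRoots": [],
  "useLedgerAfterSlot": -1
}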

Relay #1 gLiveView

image

Relay #2 gLiveView

relay_2_GL

On the block producer node I can press the 'P' key for peer analysis and see this:

Screenshot 2023-09-09 at 8.40.08 PM

These are definitely my relays. I restarted the BP node and we'll see if that makes any difference; maybe it's because I have never restarted it since it synced?

Does anything look wrong with my topology files or setup for it to not be producing blocks? Since I restarted the node, it claims to have been slot leader many times but still no blocks have been produced.

Screenshot 2023-09-10 at 5.00.09 PM
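
For anyone else debugging this, the claimed leader slots can be cross-checked against the leadership schedule the node should be using with something like the command below. The pool ID and vrf.skey path are placeholders, and the exact flags can vary between cardano-cli versions.

cardano-cli query leadership-schedule \
  --testnet-magic 2 \
  --genesis shelley-genesis.json \
  --stake-pool-id <pool-id> \
  --vrf-signing-key-file vrf.skey \
  --current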

My config file, in case that matters, is below as well.

{
  "AlonzoGenesisFile": "alonzo-genesis.json",
  "AlonzoGenesisHash": "7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874",
  "ApplicationName": "cardano-sl",
  "ApplicationVersion": 0,
  "ByronGenesisFile": "byron-genesis.json",
  "ByronGenesisHash": "83de1d7302569ad56cf9139a41e2e11346d4cb4a31c00142557b6ab3fa550761",
  "ConwayGenesisFile": "conway-genesis.json",
  "ConwayGenesisHash": "f28f1c1280ea0d32f8cd3143e268650d6c1a8e221522ce4a7d20d62fc09783e1",
  "EnableP2P": true,
  "ExperimentalHardForksEnabled": false,
  "ExperimentalProtocolsEnabled": false,
  "LastKnownBlockVersion-Alt": 0,
  "LastKnownBlockVersion-Major": 3,
  "LastKnownBlockVersion-Minor": 1,
  "Protocol": "Cardano",
  "RequiresNetworkMagic": "RequiresMagic",
  "ShelleyGenesisFile": "shelley-genesis.json",
  "ShelleyGenesisHash": "363498d1024f84bb39d3fa9593ce391483cb40d479b87233f868d6e57c3a400d",
  "TargetNumberOfActivePeers": 20,
  "TargetNumberOfEstablishedPeers": 50,
  "TargetNumberOfKnownPeers": 100,
  "TargetNumberOfRootPeers": 100,
  "TestAllegraHardForkAtEpoch": 0,
  "TestAlonzoHardForkAtEpoch": 0,
  "TestMaryHardForkAtEpoch": 0,
  "TestShelleyHardForkAtEpoch": 0,
  "TraceAcceptPolicy": true,
  "TraceBlockFetchClient": false,
  "TraceBlockFetchDecisions": false,
  "TraceBlockFetchProtocol": false,
  "TraceBlockFetchProtocolSerialised": false,
  "TraceBlockFetchServer": false,
  "TraceChainDb": true,
  "TraceChainSyncBlockServer": false,
  "TraceChainSyncClient": false,
  "TraceChainSyncHeaderServer": false,
  "TraceChainSyncProtocol": false,
  "TraceConnectionManager": true,
  "TraceDNSResolver": true,
  "TraceDNSSubscription": true,
  "TraceDiffusionInitialization": true,
"TraceErrorPolicy": true,
  "TraceForge": true,
  "TraceHandshake": false,
  "TraceInboundGovernor": true,
  "TraceIpSubscription": true,
  "TraceLedgerPeers": true,
  "TraceLocalChainSyncProtocol": false,
  "TraceLocalErrorPolicy": true,
  "TraceLocalHandshake": false,
  "TraceLocalRootPeers": true,
  "TraceLocalTxSubmissionProtocol": false,
  "TraceLocalTxSubmissionServer": false,
  "TraceMempool": true,
  "TraceMux": false,
  "TracePeerSelection": true,
  "TracePeerSelectionActions": true,
  "TracePublicRootPeers": true,
  "TraceServer": true,
  "TraceTxInbound": false,
  "TraceTxOutbound": false,
  "TraceTxSubmissionProtocol": false,
  "TracingVerbosity": "NormalVerbosity",
  "TurnOnLogMetrics": true,
  "TurnOnLogging": true,
  "defaultBackends": [
    "KatipBK"
  ],
  "defaultScribes": [
    [
      "StdoutSK",
      "stdout"
    ]
  ],
  "hasEKG": 12788,
  "hasPrometheus": [
    "127.0.0.1",
    12798
  ],
  "minSeverity": "Info",
  "options": {
    "mapBackends": {
      "cardano.node.metrics": [
        "EKGViewBK"
      ],
      "cardano.node.resources": [
        "EKGViewBK"
      ]
    },
    "mapSubtrace": {
      "cardano.node.metrics": {
        "subtrace": "Neutral"
      }
    }
  },
  "rotation": {
    "rpKeepFilesNum": 10,
    "rpLogLimitBytes": 5000000,
    "rpMaxAgeHours": 24
  },
  "setupBackends": [
    "KatipBK"
  ],
  "setupScribes": [
    {
      "scFormat": "ScText",
      "scKind": "StdoutSK",
      "scName": "stdout",
      "scRotation": null
    }
  ]
}

I noticed incoming connections shows 0 on the block producer node. On the relays there are tons of incoming connections. I'm not sure if this matters for the block producer, but I have noticed other posts on this forum where the incoming count matches the outgoing. Could this be affecting things somehow?
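
One way to double-check that count outside of gLiveView is to list the established TCP connections on the BP's listening port directly, assuming it listens on 6000:

# connections accepted on the node's listening port, i.e. incoming peers
ss -tn state established '( sport = :6000 )'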

I got it to start producing blocks! Turns out the relays need to be restarted after the block producer syncs. Once I did this, the block producer showed the incoming connections and transactions and was able to mint a block successfully! I feel so dumb haha, hopefully this helps someone else starting out.
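
For anyone else hitting this: assuming the node runs as a systemd service named cardano-node (adjust to your own unit name), the restart on each relay is just:

sudo systemctl restart cardano-node
# and then watch it come back up and re-establish its connections
journalctl -u cardano-node -f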

I still have the issue of it not showing the rtt values.

That seems odd to me. If your relays were remaining synced then this means they were pulling blocks from other relays and therefore should have been able to pull blocks from your block producer equally well. Maybe there is something in the P2P protocol which caused your relay to label your block producer as “tainted” or “untrusted” because it wasn’t properly synced and thereby refused to re-establish a connection with it, even after it became synced???

It is frustrating to get a problem like this which you never properly get to the bottom of, because restarting the service just made things work. I feel your frustration and I would like to know the explanation too.

Yeah, there was no indication this was any kind of problem or that I needed to restart, but apparently that fixed it… At least it's working and minting blocks now, and if I see incoming connections: 0 I know something is wrong.

To be clear, I started by syncing my relays first, then a few days later I used the relays to sync the block producer node. I noticed at the time that the block producer node declared itself a relay in gLiveView until it finished syncing completely, and then it said Core in gLiveView. So this is possibly why the relays needed a restart? Perhaps because the relays were connected to what they considered another 'relay' at that time?

gLiveView is only gathering information it can see on the running machine using the Linux tools available. You can also use these tools yourself and look in the logs on the running machine. For example, when the block producer is still syncing it won't be doing its slot leader checks, but once fully synced these will start happening.
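
If the legacy tracing is in use with TraceForge enabled (as in the config above), those checks should show up in the journal. Something like the command below will surface them, assuming a systemd unit named cardano-node; the exact rendering of the messages depends on the tracing setup.

journalctl -u cardano-node -f | grep -E 'TraceStartLeadershipCheck|TraceNodeIsLeader|TraceAdoptedBlock'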

As far as I am aware, each relay has no idea whether the other node it is directly connected to is another relay or a block producer. Each is simply a node that can supply blocks.

The 0 incoming connections on your block producer indicates your relays were not initiating connections to it or at least these connection attempts were not getting through to it. If the only thing you changed was to restart these relays and there were no changes to your block producer, firewall, dns, or network routing, then maybe your relays didn’t “like” your block producer for some reason because it wasn’t synced for so long and the P2P mechanism labelled it “untrusted” so they refused to connect??? I don’t know, but it seems odd.

Did the block producer try to establish connections to the relays from its end? You could look through the logs on the block producer at the time to see if there were attempts by it to connect out to your relays.
On my block producer I see logs like this:

Sep 12 00:29:53 bp1 cardano-node[2664962]: [bp1:cardano.node.LocalRootPeers:Info:586] [2023-09-11 14:29:53.21 UTC] TraceLocalRootResult (DomainAccessPoint {dapDomain = "relays.terminada.io", dapPortNumber = 2700}) [(x.x.x.x,3600),(x.x.x.x,3600),(x.x.x.x,3600),(x.x.x.x,3600)]
Sep 12 00:29:53 bp1 cardano-node[2664962]: [bp1:cardano.node.LocalRootPeers:Info:586] [2023-09-11 14:29:53.21 UTC] TraceLocalRootGroups [(6,fromList [(x.x.x.x:2700,DoNotAdvertisePeer),(x.x.x.x:2700,DoNotAdvertisePeer),(x.x.x.x:2700,DoNotAdvertisePeer),(x.x.x.x:2700,DoNotAdvertisePeer)])]
Sep 12 00:29:53 bp1 cardano-node[2664962]: [bp1:cardano.node.LocalRootPeers:Info:586] [2023-09-11 14:29:53.21 UTC] TraceLocalRootDNSMap (fromList [(DomainAccessPoint {dapDomain = "relays.terminada.io", dapPortNumber = 2700},[x.x.x.x:2700,x.x.x.x:2700,x.x.x.x:2700,x.x.x.x:2700])])
Sep 12 00:29:53 bp1 cardano-node[2664962]: [bp1:cardano.node.LocalRootPeers:Info:586] [2023-09-11 14:29:53.21 UTC] TraceLocalRootWaiting (DomainAccessPoint {dapDomain = "relays.terminada.io", dapPortNumber = 2700}) 3600s

Note that the x.x.x.x values were all valid IP addresses which I changed to protect the innocent before pasting. :grinning:

I see similar logs on my relays for setting up connections the other way.
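
In other words, a search along these lines should show whether connections were even being attempted around the time the blocks were missed (the unit name and dates are placeholders to adjust):

journalctl -u cardano-node --since "2023-09-09" --until "2023-09-11" | grep -E 'LocalRootPeers|PeerSelectionActions|ConnectionManager'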

I just looked at one of my relays to see what sort of errors it logs about peer connections and I see messages like this:

Sep 12 07:21:58 relay1 cardano-node[984262]: [relay1:cardano.node.PeerSelectionActions:Error:99235] [2023-09-11 21:21:58.37 UTC] PeerStatusChangeFailure (HotToCold (ConnectionId {localAddress = 172.27.0.7:2700, remoteAddress = 13.228.77.95:1338})) (ApplicationFailure [MiniProtocolException {mpeMiniProtocolNumber = MiniProtocolNum 2, mpeMiniProtocolException = MuxError (MuxShutdown (Just (MuxIOException Network.Socket.recvBuf: resource vanished (Connection reset by peer)))) "(recv errored)"}])
Sep 12 07:21:58 relay1 cardano-node[984262]: [relay1:cardano.node.PeerSelectionActions:Error:99235] [2023-09-11 21:21:58.37 UTC] PeerMonitoringError (ConnectionId {localAddress = 172.27.0.7:2700, remoteAddress = 13.228.77.95:1338}) (MiniProtocolExceptions [MiniProtocolException {mpeMiniProtocolNumber = MiniProtocolNum 2, mpeMiniProtocolException = MuxError (MuxShutdown (Just (MuxIOException Network.Socket.recvBuf: resource vanished (Connection reset by peer)))) "(recv errored)"}])

So it looks like that peer went offline or something. The “resource vanished” message seems to indicate that the TCP connection got closed by the other end.

There is so much useful logging done by the cardano-node software. It might be worthwhile looking for similar messages about errors in your logs during the time your blocks weren’t getting out, because you might not have fully fixed your problem yet.

Another thing I see in the relay logs is this message:

Sep 12 07:45:37 relay1 cardano-node[984262]: [relay1:cardano.node.ConnectionManager:Info:626] [2023-09-11 21:45:37.24 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 1, duplexConns = 29, unidirectionalConns = 42, inboundConns = 22, outboundConns = 50})

I don’t actually understand the difference between “fullDuplex” and “Duplex” in this P2P mechanism. If anyone does, please let me know.

I assume this “fullDuplexConns = 1” is the connection between my relay and my block producer as this is the only accessPoint listed under localRoots in the relay’s topology file. In other words, I think that because each end point (relay, bp) is trying to set up a connection with each other then this is upgraded to “fullDuplex” by the P2P mechanism. (Maybe?) On the other hand, I think the “duplexConns = 29” might represent connections set up by outside relays into my relay, or connections from my relay out to other relays, which then agree on a duplex upgrade.

You could look for similar “ConnectionManager:Info” logs to see if you are now seeing “fullDuplex” connections between your relay and block producer.
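
For example, something like this (again assuming a systemd unit named cardano-node) shows the most recent counter lines:

journalctl -u cardano-node | grep TrConnectionManagerCounters | tail -n 5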

If you figure out anything, let me know. IOG have really built a robust, mission-critical beast here, but the complexity is mind-blowing.
