Cardano-node 1.33.0 in P2P mode on mainnet?

I have one of my relays running in P2P mode on mainnet and it seems to be running better than my other relays in normal (“direct connection”) mode.

I added the following to my mainnet-config.json:

  "TestEnableDevelopmentNetworkProtocols": true,
  "EnableP2P": true, 
  "MaxConcurrencyBulkSync": 2,
  "MaxConcurrencyDeadline": 4,
  "TargetNumberOfRootPeers": 100,
  "TargetNumberOfKnownPeers": 100,
  "TargetNumberOfEstablishedPeers": 50,
  "TargetNumberOfActivePeers": 20,

And configured my mainnet-topology.json:

{
  "LocalRoots": {
    "groups": [
      {
        "localRoots": {
          "accessPoints": [
            {
              "address": "relay2",
              "port": 3000
            },
            {
              "address": "relay3",
              "port": 3000
            },
            {
              "address": "blockproducer",
              "port": 3000
            }
          ],
          "advertise": false
        },
        "valency": 3
      }
    ]
  },
  "PublicRoots": [
    {
      "publicRoots": {
        "accessPoints": [
          {
            "address": "relays-new.cardano-mainnet.iohk.io",
            "port": 3001
          }
        ],
        "advertise": false
      },
      "valency": 2
    }
  ],
  "useLedgerAfterSlot": 0
}
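It is easy to leave a stray comma in these files, so a quick parse check before restarting the node doesn't hurt (jq exits non-zero on invalid JSON):

# Sanity check: prints an error and exits non-zero if the JSON is malformed.
jq empty mainnet-topology.json && echo 'topology OK'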

The relay has logs like:

TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 0, unidirectionalConns = 82, inboundConns = 32, outboundConns = 50})
TrInboundGovernorCounters (InboundGovernorCounters {coldPeersRemote = 0, idlePeersRemote = 1, warmPeersRemote = 0, hotPeersRemote = 31})
TrInboundGovernorCounters (InboundGovernorCounters {coldPeersRemote = 0, idlePeersRemote = 1, warmPeersRemote = 0, hotPeersRemote = 31})
TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 0, unidirectionalConns = 82, inboundConns = 32, outboundConns = 50})
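If you want to watch these counters live, something like the following works on a systemd setup (it assumes the unit is named cardano-node, the same assumption as the monitoring script further down):

# Follow the node's journal, showing only the P2P counter trace messages.
journalctl -fn0 -u cardano-node | grep -E 'TrConnectionManagerCounters|TrInboundGovernorCounters'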

I monitor block receipt delay with a script on each relay, and it appears that the relay running in P2P mode gets blocks a little quicker than my other relays, each of which has 20-24 directly configured peers fetched from “api.clio.one”.

Here is a comparison of block receipt delays.

The left column is the relay running P2P. The right column is normal mode (a relay with 24 directly configured outgoing peers, plus currently 12 incoming peers).

slot 50158044 delayed 1470ms     slot 50158044 delayed 1510ms
slot 50158061 delayed  750ms     slot 50158061 delayed  780ms
slot 50158063 delayed  560ms     slot 50158063 delayed  550ms
slot 50158079 delayed 1090ms     slot 50158079 delayed 1200ms
slot 50158146 delayed 1750ms     slot 50158146 delayed  860ms
slot 50158151 delayed 1820ms     slot 50158151 delayed 1860ms
slot 50158162 delayed 1060ms     slot 50158162 delayed 1140ms
slot 50158175 delayed 1060ms     slot 50158175 delayed 1120ms
slot 50158192 delayed 1410ms     slot 50158192 delayed 1260ms
slot 50158194 delayed  500ms     slot 50158194 delayed  510ms
slot 50158195 delayed  520ms     slot 50158195 delayed  520ms
slot 50158224 delayed 1540ms     slot 50158224 delayed 1670ms
slot 50158248 delayed  950ms     slot 50158248 delayed 1060ms
slot 50158260 delayed  890ms     slot 50158260 delayed  960ms
slot 50158274 delayed 1080ms     slot 50158274 delayed 1090ms
slot 50158382 delayed 1290ms     slot 50158382 delayed 1530ms
slot 50158393 delayed 1250ms     slot 50158393 delayed 1410ms
slot 50158452 delayed  950ms     slot 50158452 delayed 1110ms
slot 50158459 delayed 1300ms     slot 50158459 delayed 1160ms
slot 50158469 delayed 1170ms     slot 50158469 delayed 1180ms
slot 50158499 delayed  790ms     slot 50158499 delayed 1030ms

In this list of block receipt delays there are 16 slots where the P2P relay was quicker, only 4 where it was slower, and 1 tie.
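(For anyone wanting to reproduce that count from a capture of the two columns, a quick awk sketch; the delays.txt filename is hypothetical:)

# Columns: $4 = P2P delay ("1470ms"), $8 = normal-mode delay ("1510ms");
# awk's numeric coercion drops the trailing "ms".
awk '{ l = $4 + 0; r = $8 + 0;
       if (l < r) p2p++; else if (l > r) normal++; else tie++ }
     END { printf "P2P quicker: %d, normal quicker: %d, ties: %d\n",
           p2p, normal, tie }' delays.txt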

Is anyone else running 1.33.0 in P2P mode on mainnet?

What experience do others have running P2P on testnet?

Is there something else that should be configured?

7 Likes

Looks complex

1 Like

P2P on mainnet is unsupported as of today (check the release notes), and it would be better not to encourage people to switch it on. :)

1 Like

Why? What is the problem?

Isn’t it just a different way of setting up the connections?

Instead of telling my node manually who to connect to by downloading a “suggested” list from a central site like “api.clio.one”, it gets a list from the chain and keeps adjusting that list depending on connection quality.

Are you saying something can break? If so what?

This version also adds experimental support for peer-to-peer network. It is unverified and unsupported, and hence not recommended to be enabled in production.

As written in the release notes. A P2P test is currently ongoing, and a different branch is used for that.

1 Like

Yes. I read that before I started running it.

  • Maybe the node won’t maintain a good list of relays to connect with?
  • Maybe the list I download from api.clio.one will have better connected relays and result in better block receipt times.
  • Maybe in P2P mode the relay will take a longer time to establish enough connections with other nodes so it might get higher block delays for an initial period. However, it does seem like I can overcome this by adding peers in the “PublicRoots” section of the topology.json file.

I get it. There are unknown unknowns. Software can break and fail.

I can deal with the possibility that one of my relays gets disconnected or suffers from a period of poor connectivity.

Having said that, 2 of my IP addresses recently got firewalled from pooltool.io and I was unable to send tip information for a day. I don’t know why, but they got un-blocked a day later. Other IP addresses were not blocked, so it seemed like a firewall issue at pooltool.io’s end. This is the problem with relying on a centralised service.

What if api.clio.one decides to give me a poor list of relays to connect to or even stops providing me a list at all? More importantly, what if api.clio.one doesn’t list my relay addresses in the lists given to other relays? These are known risks.

P2P mode can’t come soon enough!

Currently my P2P mode relay seems to be doing a better job than my other relays.

Thanks for this, just spun up a new relay with P2P enabled. Do you know what “advertise” does? Thanks!

1 Like

I am not totally sure. However, I believe “advertise” enables a mechanism whereby nodes can let other nodes know about the nodes they are getting blocks from. In other words, your node can then advertise the nodes it is using to other nodes.

Overall, P2P mode is working great for me, but see below for a problem.

I have my block producer and one of my relays using P2P mode now. My relay in P2P mode is quicker at getting blocks than my other relays on average.

On my block producer I have this mainnet-topology.json:

{
  "LocalRoots": {
    "groups": [
      {
        "localRoots": {
          "accessPoints": [
            {
              "address": "relays",
              "port": 3000
            }
          ],
          "advertise": false
        },
        "valency": 3
      }
    ]
  },
  "PublicRoots": []
}

My DNS resolves “relays” to 3 IP addresses for my relays (hence valency 3).

Note that PublicRoots is empty and that “useLedgerAfterSlot” has been deleted altogether. If you set it up like this, then when the node starts you will see that useLedgerAfterSlot gets a default value of “-1”, which means the node never tries to connect to any external nodes using ledger registration records. In other words, it only connects with my localRoots. This is what you want for a block producer.
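As a quick sanity check that the DNS name really resolves to the expected number of addresses (so the valency matches), assuming dig is installed:

# "relays" is the internal DNS name from the topology above.
# Each output line is one A record; valency should equal this count.
dig +short relays | wc -l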

On one of my 3 relays I now use this mainnet-topology.json:

{
  "LocalRoots": {
    "groups": [
      {
        "localRoots": {
          "accessPoints": [
            {
              "address": "blockproducer",
              "port": 3000
            }
          ],
          "advertise": false
        },
        "valency": 1
      }
    ]
  },
  "PublicRoots": [
    {
      "publicRoots": {
        "accessPoints": [
          {
            "address": "relays-new.cardano-mainnet.iohk.io",
            "port": 3001
          }
        ],
        "advertise": true
      },
      "valency": 1
    }
  ],
  "useLedgerAfterSlot": 0
}

Note that the only localRoot it has is my block producer and it doesn’t advertise it. For publicRoots it has only an iohk relay and useLedgerAfterSlot 0, so that it will use the ledger to obtain relay information from slot 0. It also has “advertise”: true.

Note: The iohk relay may not be required here since my node already has enough blockchain data to access the ledger for relay information from slot 0. But I figured it couldn’t hurt so I just left it in.

One problem with P2P I noticed:

I have been actively monitoring the connections between my nodes since experimenting with P2P. I noticed that one of my relays running normal mode very occasionally drops its half-duplex (pull) connection to my block producer running in P2P, and it never re-establishes it, until restarted.

Here are some explanatory details:

  • Nodes running in normal mode connect to each other with half-duplex connections, using a random high port connected to the other relay’s cardano-node port (3000). You will see TCP connections like 47213 to 3000. If they connect to each other, you see two half-duplex connections (e.g. block producer connected to relay and relay connected to block producer, both from high ports to port 3000). This is normal.
  • Nodes running in P2P mode set up TCP connections directly between their cardano-node ports, i.e. 3000 to 3000. If two P2P nodes are connected with each other (e.g. block producer to relay and relay to block producer), there is only one connection, set up from port 3000 to 3000. The logs on both these P2P nodes label this single connection as full duplex.
  • Now if a normal node connects to a P2P node, and this P2P node also connects to the normal node, you get a half-duplex connection from the normal node to the P2P node (high port to 3000). You also get the connection from the P2P node to the normal node (3000 to 3000), but now the logs on the P2P node label this connection as half-duplex. You can verify all of this from the shell; see the sketch after this list.
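A minimal sketch of that check, assuming iproute2’s ss is available (3000 is the cardano-node port used throughout this thread):

# List established TCP connections involving the cardano-node port.
# Normal-mode peers appear as high-port-to-3000 pairs; two connected
# P2P nodes appear as a single 3000-to-3000 connection instead.
ss -tn state established '( sport = :3000 or dport = :3000 )'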

This last scenario is where I noticed the problem: Very occasionally, one of my normal nodes drops its half-duplex connection. The logs on this node show an “AsyncCancelled” exception. The P2P node maintains its connection with the normal node (3000 to 3000), but this connection is not upgraded to full duplex; it remains half-duplex. Most importantly, blocks are not pulled by the normal node from the P2P node (since the normal node no longer has its half-duplex connection). The normal node never tries to re-establish its half-duplex connection either!

Importantly: This problem could result in your relay failing to pull your block from your block producer!

Interestingly, I have only seen this happen with one of my normal mode relays and this particular relay is on the same subnet as my block producer. Another normal mode relay, on the other side of the world, has never dropped its half-duplex connection with my block producer in P2P mode. Weird because you would think that this more distant connection would be more likely to drop.

I am guessing the cause has something to do with the communication protocol between nodes regarding these connections. Maybe there is a conflict somehow whereby the normal node thinks it has this connection that the P2P node set up and so doesn’t try re-establishing the half-duplex connection.

I have never seen my relay running P2P drop its full-duplex connection with my block producer in P2P.

The bottom line is:
If you intend to experiment with P2P mode, make sure that you have other reliable relays that will pull your blocks from your block producer and be prepared to actively monitor the connections.

Furthermore, there could be other problems with P2P if everyone were to switch to it immediately.
After all, it is a big deal to rely on software to arrange all inter-node connections automatically for the entire Cardano network. What if some nodes arranged themselves into separate, disconnected cliques? I can definitely understand IOHK’s conservative approach before signing off on P2P mode and letting everyone rely on it completely.

I continue to run 2 of my 3 relays in normal mode.

1 Like

This is an incredibly interesting topic.

I’ve launched a new relay in P2P mode, but I’m running in docker/kubernetes, so I’m wondering how discovery works in this situation.

I know there is half- vs. full-duplex communication in place… anyway, just a quick thanks for the long reply; I will read it more carefully in a while.

Thanks!

1 Like

Okay, just had time to properly read your response. Loads of insight.

{
  "LocalRoots": {
    "groups": [
      {
        "localRoots": {
          "accessPoints": [
            {
              "address": "bp",
              "port": 3000
            },
            {
              "address": "relay1",
              "port": 3000
            },
            {
              "address": "relay2",
              "port": 3000
            }
          ],
          "advertise": false
        },
        "valency": 1
      }
    ]
  },
  "PublicRoots": [
    {
      "publicRoots": {
        "accessPoints": [
          {
            "address": "relays-new.cardano-mainnet.iohk.io",
            "port": 3001
          }
        ],
        "advertise": false
      },
      "valency": 2
    }
  ],
  "useLedgerAfterSlot": 0
}

Following your example, I’ve setup this P2P config on a new relay, in the same network as the other 2 relays and the BP.

BP and 2 relays are running normal mode.

  1. I’ve set valency 1 on the LocalRoots group, is that correct?
  2. My stake pool runs on Kubernetes (containers). As far as I can tell, the P2P node has no way to know on which IP/port it can receive incoming connections. Do you have any idea? Is there any doc I can read about this?
  3. You mention that your normal-mode relays drop their connection to the BP running P2P. I don’t feel like switching the BP to P2P, only a relay. Do you think I will still get a propagation benefit?
  4. Would you recommend connecting the relays to each other (P2P and normal mode)?

Thanks in advance.

1 Like

With the disclaimer that I am not totally sure about any of this, since the documentation is almost non-existent, here goes:

Valency should be 3 if you want the node to connect to all 3 of the localRoots accessPoints.

However, think about why you would want this. I could be wrong about this, but the way I view things is that you want blocks being pulled from the Cardano network in towards your block producer, and also blocks flowing the other way for blocks you produce. Thus, I don’t think you will get any advantage from having your relays talking to each other. The fewer unnecessary connections, the better. In general, having the relays interconnected is not going to help your block producer get new blocks faster, and it is also not going to help your blocks get out faster. It might actually slow your relays down by giving them too many connections.

I think it would be best to have each relay differently connected to the Cardano network, with each also having connections to/from your block producer, in a starburst-type pattern with your block producer at the centre.

If you agree with this logic, then I would delete relay1 and relay2 from your localRoots accessPoints and just leave the “bp” address. Then you can also leave the valency at 1.

I run my nodes as virtual machines (KVM), which is similar, but I believe Kubernetes containers get different IP addresses in a non-predictable way. I think you can configure an incoming IP address that the container manager will route, but I don’t know enough to help you there, sorry.

I was seeing the problem when I had a normal mode relay connecting to my block producer running in P2P mode. Thus, it is probably more risky to switch your block producer to P2P like I did. However, I can’t see much problem in running an extra relay in P2P mode because you still have your other relays and block producer running normally. You may find, like I did, that the relay running P2P seems to propagate the blocks fastest both ways.

Note that currently you still need to ping api.clio.one with your new relay’s data (IP, port) so that other cardano nodes get this new relay in their topology files. You do need other nodes connecting into each of your relays and currently the rest of the cardano network relies on topology files for this. They need to put your relay’s address in their topology files. If you don’t have any incoming connections to your new relay then it will be of no benefit for the blocks your block producer makes.

Answered above. Maybe I should delete the extra relays I put in my original post config under localRoots. It was a bit of an experiment and I was monitoring their connections with each other.

This technology being built by IOHK is very well designed and thought out. I wish I understood enough to be able to read the code properly.

Here is a script I use for monitoring block receipt delay on each of my relays and block producer. You can also use it to send tip information to pooltool.io. The script uses a config file “my-cardano-node-config.json”, which is just so that I don’t put the secret pooltool.io api_key and pool_id in the script. If you run the script in a terminal, it will just keep printing the block delays every time the node extends its tip. I also push the data to Prometheus pushgateway so that I can see it using Grafana. Just modify the script as needed if it is useful.

#!/bin/bash
set -o nounset
set -o errexit
set -o pipefail

CNODE_HOME='/opt/cardano' #FIXME
export CARDANO_NODE_SOCKET_PATH='/run/cardano/mainnet-node.socket' #FIXME
# Config must contain at least {poolId:, pooltoolApiKey:}
config="${CNODE_HOME}/etc/my-cardano-node-config.json" #FIXME
platform="monitor-block-delay"
[[ "$(cardano-cli version)" =~ cardano-cli\ ([\.0-9]+)\ .*rev\ ([a-f0-9]+)$ ]]
version="${BASH_REMATCH[1]}:${BASH_REMATCH[2]:0:5}"
shortname="$(hostname)" # Only sent to pushgateway (not pooltool.io)
slot0sec=1591566291 # Slot 0 seconds since 1970-01-01 00:00:00 UTC
nl=$'\n'
[[ -f "$config" ]] || { echo "Config: $config doesn't exist."; exit 1; }
pool_id="$(jq -r '.poolId' "$config" 2>/dev/null)"
api_key="$(jq -r '.pooltoolApiKey' "$config" 2>/dev/null)"
node_id="$(jq -r '.nodeId' "$config" 2>/dev/null)"
if [[ -z "${pool_id:-}" || "$pool_id" == 'null' || -z "${api_key:-}" || "$api_key" == 'null' ]]; then
  echo "Config: $config must contain poolId and pooltoolApiKey"
  exit 1
fi

function usage() {
  echo "Usage: $(basename "$0") [-p] [-s]"
  echo 'Log cardano-node block delay'
  echo '  -p     Push information to prometheus-pushgateway on localhost:9091'
  echo '  -s     Send information to pooltool.io'
  echo
}
optstring=':psh' # Leading colon: silent errors; include h so -h shows usage
while getopts ${optstring} arg; do
  case ${arg} in
    p) optpush='true';;
    s) optsend='true';;
    h) usage && exit 0;;
    ?) echo "Invalid options: -${OPTARG}."; echo; usage && exit 1;;
  esac
done

function write_log() {
  local logtime=$1; local logslot=$2; local delay=$3
  printf 'Received %s slot %d delayed %4dms\n' "$logtime" "$logslot" "$delay"
}

function push_prometheus() {
  local delay="$1"
  local data="itmedico_cardano_node_metrics_block_delay ${delay}${nl}"
  curl -X POST -H "Content-Type: text/plain" --data "$data" "http://localhost:9091/metrics/job/itmedico_cardano_node_metrics/instance/${shortname}"
}

function send_pooltool() {
  local logtime="$1"; local loghash="$2";
  local tipslot=''; local tiphash=''; local tipblock=''
  read -r -d '' tipslot tiphash tipblock < <(cardano-cli query tip \
    --mainnet 2>/dev/null | jq -r '.slot, .hash, .block' || true; printf '\0';)
  if [[ -z "$tipslot" || -z "$tiphash" || -z "$tipblock" ]]; then
    echo 'cardano-cli failed. Broken pipe?'
    return 0 # Sometimes cardano socket is recreated (broken pipe) - Just skip
  fi
  if [[ "$loghash" != "$tiphash" ]]; then
    echo "loghash $loghash != cardano-cli tiphash $tiphash. Skipping."
    return 0 # Skip if different tip
  fi
  json="$(jq -cn --arg apiKey "$api_key" --arg poolId "$pool_id" \
    --arg nodeId "$node_id" --arg version "$version" --arg at "$logtime" \
    --arg blockNo "$tipblock" --arg slotNo "$tipslot" \
    --arg blockHash "$tiphash" --arg platform "$platform" \
    '{apiKey: $apiKey, poolId: $poolId, data: {nodeId: $nodeId, version: $version, at: $at, blockNo: $blockNo | tonumber, slotNo: $slotNo | tonumber, blockHash: $blockHash, platform: $platform}}')"
  response="$(curl -s -H "Accept: application/json" -H "Content-Type:application/json" -X POST --data "$json" 'https://api.pooltool.io/v0/sendstats' || true)" # '' if curl fails
  success="$(echo "${response:-}" | jq -r '.success')"
  if [[ "$success" == 'true' ]]; then result=success; else result=failure; fi;
  printf 'Sent pooltool: %s Response: %s\n' "${logtime},block=${tipblock},slot=${tipslot},hash=${tiphash}" "$result"
}

echo "$(basename $0) started with config: poolId=${pool_id},apiKey=${api_key},nodeId=${node_id},version=${version},platform=${platform}"

# regex for log when new block received "Chain extended, new tip:"
regex='\[([-0-9]+)\ ([\.:0-9]+)\ UTC\]\ Chain\ extended,\ new\ tip:\ ([a-f0-9]+)\ at\ slot\ ([0-9]+)$'
lasthash=''
# follow journal unit cardano-node < <(journalctl -fn0 -u cardano-node)
while IFS= read -r line; do
  [[ "${line:-}" =~ $regex ]] || continue # skip logs not about new blocks
  logtime="${BASH_REMATCH[1]}T${BASH_REMATCH[2]}Z"
  loghash="${BASH_REMATCH[3]}"
  logslot="${BASH_REMATCH[4]}"
  [[ "${loghash}" != "${lasthash}" ]] || continue # skip logs about same block
  lasthash="$loghash"
  # Calculate delay = received time (log message) - actual slot time
  delay=$(( $(date +%s%3N -d "$logtime") - (( $logslot + $slot0sec ) * 1000 ) ))
  write_log "$logtime" "$logslot" "$delay"
  [[ "${optpush:-}" == 'true' ]] && push_prometheus "$delay"
  [[ "${optsend:-}" == 'true' ]] && send_pooltool "$logtime" "$loghash"
done < <(journalctl -fn0 -u cardano-node)
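For reference, a typical invocation (the script name is whatever you saved it as; both flags are optional):

# Print delays to the terminal, push them to the local pushgateway (-p)
# and send tip information to pooltool.io (-s):
./monitor-block-delay.sh -p -s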
1 Like

You’re an absolute legend! I do agree with you about the relays; I will improve my topology there accordingly.

I am still not sure how the P2P relay will know what its IP address/port number is if it’s behind a NAT-like configuration. I think this is a question for IOG.

Thanks for the script too, I will take a look!

Have a great day!

1 Like

When your relay connects to another, external relay, that relay will see the IP address of your router doing the NAT. This is no problem when your relay sets up the connection.

However, the problem is when an external relay wants to set up a connection into your relay. This is where the topology updater service at api.clio.one comes in.

This is why I said:

So, if your relay is behind a router doing NAT, then you need to forward the port on your router to that relay. For example, if cardano-node on your relay is using port 3000, then you need to forward port 3000 on your router to this relay’s IP, port 3000.
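If your router happens to be a Linux box doing the NAT, the forward looks something like this with iptables (eth0 and 192.168.1.10 are placeholders for your WAN interface and the relay’s internal IP):

# DNAT incoming TCP on the router's WAN interface, port 3000, to the relay.
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 3000 -j DNAT --to-destination 192.168.1.10:3000
iptables -A FORWARD -p tcp -d 192.168.1.10 --dport 3000 -j ACCEPT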

Here is a bash script you can use to ping api.clio.one from each relay with its details.

#!/bin/bash
set -o nounset
set -o errexit
set -o pipefail

CNODE_HOME='/opt/cardano' #FIXME
export CARDANO_NODE_SOCKET_PATH='/run/cardano/mainnet-node.socket' #FIXME
service_url='https://api.clio.one/htopology/v1/'
shelley_genesis="${CNODE_HOME}/etc/mainnet-shelley-genesis.json" #FIXME
nwmagic="$(jq -r '.networkMagic' "$shelley_genesis")"
hostname="$(hostname -f)"
port=3000 #FIXME

function send_topology() {
  # Sometimes cardano socket recreated causing broken pipe so try 5 times
  local count=0
  while
    if (( ++count > 5 )); then
      echo 'Error: send_topology(): cardano-cli failed' >&2
      return 1
    fi
    local blockNo="$(cardano-cli query tip --mainnet 2>/dev/null | jq -r '.block')"
    [[ -z "${blockNo:-}" ]]
  do sleep 1; done
  local response="$(curl -sf -4 "${service_url}?port=${port}&blockNo=${blockNo}&valency=1&magic=${nwmagic}&hostname=${hostname}")"
  echo "$response"
}

send_topology || exit 1
exit 0
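I trigger it periodically from cron; a hypothetical crontab entry, assuming the script is saved as /opt/cardano/bin/send-topology.sh (check the topology updater service’s own guidance for the expected interval):

# Run the topology ping at 17 minutes past every hour, logging output.
17 * * * * /opt/cardano/bin/send-topology.sh >> /var/log/send-topology.log 2>&1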

Note how the send_topology() function sends a port number but not an IP address. api.clio.one already knows your IP address because the curl command in the script set up a tcp connection to api.clio.one. But this IP is your router’s external IP address since it is doing NAT.

api.clio.one collates all this information from other relays doing the same. Then when people request topology.json files from api.clio.one the service will feed your relay’s IP and port to some of these people.

In turn these people will use the topology.json files received to configure their relays to connect to, and pull blocks from, your relay. These people will be attempting to connect to your external (router) IP address on the port you specified. This is why you had better forward that port to your internal relay otherwise their connection attempts will fail.

Remember everyone is still using half-duplex (pull only) connections between relays at present.

I haven’t tested this but I imagine that P2P mode relays will be able to pierce firewalls without needing port forwarding enabled and pull blocks both ways using full-duplex connections. BUT, only when the relay behind the router initiates the TCP connection out to the external relay. Without port forwarding the external relay would still be unable to initiate the connection into your internal relay.

So you do need to set up port forwarding on your router. And, you will still need port forwarding when everyone switches on P2P mode. This is because in P2P mode the relays will be setting up and tearing down connections themselves and you will still want other relays to be able to initiate connections into your relay.

1 Like

Right, I agree with you on everything said in this thread. Thanks.

I’ve re-enabled my topology updater. Updated my “local topology” according to the thread.

I’ve also noticed the node has internal metrics about block propagation, as perceived by my relays/BP.

Will keep an eye in the next couple of days and let you know.

1 Like

Nice graphs. Are you using these Prometheus metrics:
cardano_node_metrics_blockfetchclient_blockdelay_cdf(One|Three|Five)
which I think give the percentage of blocks received within 1|3|5 seconds?
Or are you graphing block delay times in seconds?

I have been graphing the actual block delay times using my script:
(screenshot: Grafana graph of block receipt delay times)

1 Like

I’m graphing this one: cardano_node_metrics_blockfetchclient_blockdelay_s, which I think is a gauge that should give the latency of the last block.

I’m also using cardano_node_metrics_blockfetchclient_blockdelay_cdfOne{namespace=~"$namespace", pod=~"$pod"} but I haven’t had time to properly write a graphing function.
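If you just want to eyeball these without Grafana, you can pull them straight off the node’s metrics endpoint (12798 is the default hasPrometheus port; adjust to your config):

# Dump just the blockfetch client delay metrics from the node's Prometheus endpoint.
curl -s http://localhost:12798/metrics | grep blockfetchclient_blockdelay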

1 Like

That is fantastic. I hadn’t noticed that new metric:
“cardano_node_metrics_blockfetchclient_blockdelay_s”

That metric is consistently slightly lower than what my script produces from when cardano-node logs the “Chain extended, new tip” message. There is approximately a 0.15 to 0.30 second difference, representing the time it takes for that log message to appear.

I am going to change over my graphing to use that new metric. Thanks.

But I would also like to use this metric for sending the tip delay information to pooltool.io. I know that most operators use cncli to send their tip information but I would prefer to use my own script since I understand what it is doing.

Do you know how I could grab that blockdelay value from the cardano-node somehow without having to keep hitting port 12798 with repeated curl requests? I want a method that can keep a file or socket open and follow it similar to tailing a log file.

Maybe it is possible to create an additional “scribe” under the “hasPrometheus” section in the mainnet-config.json file, and then cardano-node will send the information to a file???

No idea how to access that info; a scribe sounds like a viable way, but I’m not an expert in that area. AFAIK cncli speaks directly to the internals of the node, so while I know you prefer to use your own scripts, that could actually be the best way of going about it.

Also a quick finding: I just compared the CDFs of my relays, and the P2P one manages to pull twice as many blocks within 1 second as the other relays.

(Screenshots: blockdelay CDF graphs for the P2P relay and a non-P2P relay)

Yet the BP has only about 20% of blocks pulled within 1 second, so either the extra hop delays the blocks just enough to make them fall into the next bucket (>1 second), OR the non-P2P node can’t take enough advantage of P2P nodes (and that would be weird/a bummer).

1 Like

Maybe I am paranoid, but I didn’t like the idea of using cncli to send information to pooltool.io because I couldn’t understand what exact information it was sending. However, I do use cncli regularly to determine my leader-logs - but this information doesn’t leave my system.

That fits with what I am seeing too. The P2P relay does seem to get the blocks quicker.

Regarding your block producer not seeing much increase in the blockdelay-CDF-1s: I agree it is likely due to the extra hop. My block producer is delayed another 0.15 to 0.20 seconds after my quickest local relay. Nearly all of my block delays this epoch are well over 1 second, even on the relays, so my blockdelay-CDF-1s would be almost zero on both the relays and the block producer.

1 Like

Not paranoid at all. I think when there is money involved, caution is never enough.

I had some chats in the testnet chat and found out about these params:

https://docs.cardano.org/getting-started/guidelines-for-large-spos

# The maximum number of used peers during bulk sync.
MaxConcurrencyBulkSync: 1
# The maximum number of used peers when fetching newly forged blocks.
MaxConcurrencyDeadline: 2

In particular:

  • MaxConcurrencyDeadline should be set to 4 on relays
  • on the BP it should be set to the minimum of MaxConcurrencyDeadline and the number of relays (e.g. with 3 relays, set it to 3 on the BP)

The change to MaxConcurrencyDeadline led to this:

BP (yellow) times are now much closer to the fastest relay.

1 Like