Connection issues between relay and producer

Hi, I am running a stake pool in a Docker setup on mainnet, with both nodes on version 1.25.1. When I start the producer and relay nodes, I get some weird errors on both of them. This is what the producer logs:

[0bd3ff87:cardano.node.IpSubscription:Info:386] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Trying to connect to <PRIVATE_RELAY_IP>:3001
[0bd3ff87:cardano.node.IpSubscription:Info:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Connection Attempt Start, destination <PRIVATE_RELAY_IP>:3001
[0bd3ff87:cardano.node.IpSubscription:Notice:386] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Waiting 0.025s before attempting a new connection
[0bd3ff87:cardano.node.IpSubscription:Notice:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Connection Attempt End, destination <PRIVATE_RELAY_IP>:3001 outcome: ConnectSuccessLast
[0bd3ff87:cardano.node.ErrorPolicy:Warning:382] IP <PRIVATE_RELAY_IP>:35639 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError MuxBearerClosed "<socket: 26> closed when reading data, waiting on next header True"))) 20s 20s
[0bd3ff87:cardano.node.IpSubscription:Error:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Application Exception: <PRIVATE_RELAY_IP>:3001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (ServerAgency TokNext TokMustReply)
[0bd3ff87:cardano.node.IpSubscription:Info:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Closed socket to <PRIVATE_RELAY_IP>:3001

Using telnet <PRIVATE_RELAY_IP> 3001 inside the container works, so I don't think this is a general connectivity problem.
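The same reachability check can be scripted. Below is a minimal sketch in Python; `port_is_open` is a hypothetical helper name, not part of any Cardano tooling. Note that a successful TCP connect only shows the port is reachable, which matches the `ConnectSuccessLast` line in the log above. It says nothing about the ChainSync timeout that follows.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and unreachable hosts
        return False
```

For example, `port_is_open("<PRIVATE_RELAY_IP>", 3001)` run from inside the producer container should return `True` whenever the telnet test succeeds.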

My setup looks like this:
[Image: Architecture (1)]

I also double-checked the topology files. The private IPs of the Docker host containers are the same ones I can successfully connect to from inside the node container. Everything else seems fine in the logs for both nodes. Both are currently synced to the same blockNo within their separate databases:

{
    "blockNo": 5466262,
    "headerHash": "88e4bbb7d7244a2c89c60ed2ce10dc196b7441d21858b7b1d3ef41675aa14390",
    "slotNo": 24287155
}

Thank you for your help.

Hello,

What host address do you use when you start the nodes?

Cheers,

Hi, 0.0.0.0 for both

OK, and port 3001 is open for both the producer and the relay, right?

Can you also test from the relay?

telnet Producer_IP 3001
Does the producer accept the connection from the relay?

Cheers,

Yes, ports are open and telnet works from both nodes

OK, are the nodes synced?

Can you also add this to your topology and try to start the nodes again?

{
    "addr": "relays-new.cardano-mainnet.iohk.io",
    "port": 3001,
    "valency": 2
}

How do I make sure they are synced? Do you mean checking the output of:

cardano-cli query tip --mainnet
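One way to judge that output is to compare the reported `slotNo` with the slot the wall clock implies. The sketch below is only an illustration and rests on an assumption that does not come from this thread: that on mainnet the Shelley era began at slot 4492800 at Unix time 1596059091, with one slot per second since then. The function names are hypothetical.

```python
import json

# Assumption: mainnet Shelley era began at slot 4492800 (epoch 208) at
# Unix time 1596059091, and every slot since then lasts one second.
SHELLEY_START_SLOT = 4_492_800
SHELLEY_START_UNIX = 1_596_059_091


def expected_slot(now_unix: int) -> int:
    """Slot number implied by the wall clock at the given Unix time."""
    return SHELLEY_START_SLOT + (now_unix - SHELLEY_START_UNIX)


def slots_behind(tip_json: str, now_unix: int) -> int:
    """Slots (i.e. seconds) by which the node's tip lags the wall clock."""
    tip = json.loads(tip_json)  # output of `cardano-cli query tip --mainnet`
    return expected_slot(now_unix) - tip["slotNo"]
```

Calling `slots_behind(tip_output, int(time.time()))` on a synced node should give a small number. Since not every slot produces a block, a lag of some tens of seconds would be normal; a lag of hours would mean the node is still catching up.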

Also, just an additional question: how should I evaluate this error? Is it blocking, or is the node capable of doing its work despite these messages? If so, how do I make sure the producer and relay are working properly? Is there any specific message in the logs indicating this?

Only for the relay I suppose?

Try it for your relay first and see if it starts…

try to configure

It will show you the status of your nodes…

Awesome, adding relays-new.cardano-mainnet.iohk.io to the relay topology file works. I get no more errors on either node. @Alexd1985 Can you maybe explain why the IOHK relay is necessary?

Using simpleLiveView on a host to view stuff in a Docker container is going to be tricky, especially when it comes to shared access to /proc.

Instead, I’d recommend using an image that has topology updater and gLiveView baked in. Some other issues with the IOHK upstream image are fixed too.

You can spin up a relay node like this …

$ docker run --detach \
    --name=relay \
    -p 3001:3001 \
    -e CARDANO_UPDATE_TOPOLOGY=true \
    -v node-data:/opt/cardano/data \
    nessusio/cardano-node run    

Connecting the block producer to the relay is an afterthought. First, make sure the relay is running, reachable and fully synced.

When your container is running, do …

$ docker exec -it relay gLiveView

You will not have to compile/install anything on your host. In those docs, you’ll find scripts for Docker Compose and Kubernetes as well, if you need them.


@tomdx thanks for the hint. My Prometheus/Docker setup works properly again since updating the topology, so I think I won’t need LiveView. Also, the official docs state that it is deprecated: "Monitoring a Node: LiveView Mode" (cardano-node documentation).

Sure, it is still useful to have a lightweight monitoring facility baked into the image to quickly check whether stuff is running smoothly.

[Image: relay-glview]

gLiveView does nothing (i.e. consumes zero resources) if you don’t look at it. That’s not true for Prometheus.

Prometheus + Grafana can always be added to the mix later as additional Docker containers. How are you doing your topology updates? If this is an external process that your container relies on, it should make you wonder why this thing is not self-sufficient.

Sure makes sense. I will definitely have a look.

What do you mean by self-sufficient? I provide the topology.json to the docker container and consume it in the container like this:

```
cardano-node run \
--topology ~/config/mainnet-topology.json \
--database-path ~/data/db \
--socket-path ${CARDANO_NODE_SOCKET_PATH} \
--host-addr ${NODE_IP} \
--port ${NODE_PORT} \
--config ~/config/${MAIN_CONFIG}
```

Topology updater is a process that has to run once per hour, otherwise your node will not find any friends. What you show above is the initial topology configuration, which needs to be updated regularly. This will change with Alonzo later this year, when the P2P module becomes part of the node.

As far as I know, the nodes (producer and relays) will not connect to each other until they are synced. That’s why you need to wait until the nodes are 100% synced, and only then connect them to each other.
By adding the IOHK relays, I believe the nodes get the information from them.

OK, this is totally new to me. Sorry, I’m still a rookie. So, as I understand from the Guild Operators Documentation, I need to update the topology regularly so that my node is not dependent only on the IOHK node?

You must run the topology updater script on your relay to announce your relay to the mainnet network.
This script should run once per hour.

On the Producer you will use static connections with your relays.
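As a rough sketch of that split, assuming the legacy topology format (a top-level `Producers` list of `addr`/`port`/`valency` entries) used by node 1.25.x: the block producer lists only its own relays, while the relay lists the producer plus public peers such as the IOHK relay mentioned earlier in the thread. The helper names and the example IPs are hypothetical, and in practice the topology updater would add further public peers to the relay's list.

```python
import json
from typing import List


def peer(addr: str, port: int = 3001, valency: int = 1) -> dict:
    """One entry of the "Producers" list in a legacy topology file."""
    return {"addr": addr, "port": port, "valency": valency}


def producer_topology(relay_ips: List[str]) -> dict:
    """Block producer: static connections to its own relays only."""
    return {"Producers": [peer(ip) for ip in relay_ips]}


def relay_topology(producer_ip: str) -> dict:
    """Relay: the block producer plus the public IOHK relay entry."""
    return {
        "Producers": [
            peer(producer_ip),
            peer("relays-new.cardano-mainnet.iohk.io", valency=2),
        ]
    }


if __name__ == "__main__":
    # 10.0.0.2 is a made-up private IP used purely for illustration
    print(json.dumps(relay_topology("10.0.0.2"), indent=2))
```

The printed JSON can be saved as the file passed to `--topology`, with the real private IPs substituted.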

That seems strange to me. The nodes were 100% synced (compared with https://explorer.cardano.org). Only adding relays-new.cardano-mainnet.iohk.io resolved the error. But what’s the impact of that relay on my nodes?