Connection issues between relay and producer

seveho · 16 March 2021 09:36

Hi, I am running a stake pool Docker setup on mainnet, with both nodes on version 1.25.1. When I start the producer and relay nodes, I get some weird errors on both of them. This is what the producer logs:

[0bd3ff87:cardano.node.IpSubscription:Info:386] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Trying to connect to <PRIVATE_RELAY_IP>:3001
[0bd3ff87:cardano.node.IpSubscription:Info:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Connection Attempt Start, destination <PRIVATE_RELAY_IP>:3001
[0bd3ff87:cardano.node.IpSubscription:Notice:386] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Waiting 0.025s before attempting a new connection
[0bd3ff87:cardano.node.IpSubscription:Notice:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Connection Attempt End, destination <PRIVATE_RELAY_IP>:3001 outcome: ConnectSuccessLast
[0bd3ff87:cardano.node.ErrorPolicy:Warning:382] IP <PRIVATE_RELAY_IP>:35639 ErrorPolicySuspendPeer (Just (ApplicationExceptionTrace (MuxError MuxBearerClosed "<socket: 26> closed when reading data, waiting on next header True"))) 20s 20s
[0bd3ff87:cardano.node.IpSubscription:Error:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Application Exception: <PRIVATE_RELAY_IP>:3001 ExceededTimeLimit (ChainSync (Header (HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (Tip HardForkBlock (': * ByronBlock (': * (ShelleyBlock (ShelleyEra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Allegra StandardCrypto)) (': * (ShelleyBlock (ShelleyMAEra 'Mary StandardCrypto)) ('[] *))))))) (ServerAgency TokNext TokMustReply)
[0bd3ff87:cardano.node.IpSubscription:Info:654] IPs: 0.0.0.0:0 [<PRIVATE_RELAY_IP>:3001] Closed socket to <PRIVATE_RELAY_IP>:3001

Using telnet <PRIVATE_RELAY_IP> 3001 inside the container works, so I don't think this is a general connectivity problem.
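
For containers that do not ship telnet, the same reachability check can be sketched in bash. The helper name `check_port` is hypothetical, and it relies on bash's `/dev/tcp` pseudo-device and the coreutils `timeout` command being available. Note that a successful TCP connect only shows the port is reachable; the log above reports `ConnectSuccessLast` and still times out in ChainSync afterwards, so this check cannot rule out a problem at the protocol level.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a TCP connection to host:port can be opened.
check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP socket;
  # timeout keeps the check from hanging on filtered ports.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage (placeholder address, as in the logs above):
#   check_port "<PRIVATE_RELAY_IP>" 3001 && echo "port reachable"
```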

My setup looks like this:
[Image: Architecture (1)]

I also double-checked the topology files. The private IPs of the Docker host containers are the same ones I can successfully connect to from inside the node container. Everything else seems fine in the logs for both nodes. Both are currently synced to the same blockNo in their separate databases:

{
    "blockNo": 5466262,
    "headerHash": "88e4bbb7d7244a2c89c60ed2ce10dc196b7441d21858b7b1d3ef41675aa14390",
    "slotNo": 24287155
}

Thank you for your help.

Alexd1985 · 16 March 2021 09:40

Hello,

What host address (--host-addr) do you use when you start the nodes?

Cheers,

seveho · 16 March 2021 09:51

Hi, 0.0.0.0 for both

Alexd1985 · 16 March 2021 10:10

OK, and port 3001 is open for both the producer and the relay, right?

Can you also test from the relay?

telnet Producer_IP 3001
Does the producer accept connections from the relay?

Cheers,

seveho · 16 March 2021 10:21

Yes, ports are open and telnet works from both nodes

Alexd1985 · 16 March 2021 10:29

OK, are the nodes synced?

Alexd1985 · 16 March 2021 10:31

Can you also add this to your topology and try to start the nodes again?

{
"addr": "relays-new.cardano-mainnet.iohk.io",
"port": 3001,
"valency": 2
}
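
In the topology file format used by node 1.25.1, an entry like this sits inside a top-level `Producers` array. A minimal sketch of a complete relay topology, written with a heredoc, might look as follows. The output file name and `<PRIVATE_PRODUCER_IP>` are placeholders, not values from this thread, and the valency of 1 for the producer entry is an assumption.

```shell
# Assumption: the file is written to the current directory for review;
# <PRIVATE_PRODUCER_IP> is a placeholder for your own block producer address.
cat > mainnet-topology.json <<'EOF'
{
  "Producers": [
    { "addr": "<PRIVATE_PRODUCER_IP>", "port": 3001, "valency": 1 },
    { "addr": "relays-new.cardano-mainnet.iohk.io", "port": 3001, "valency": 2 }
  ]
}
EOF
```

The block producer's own topology would normally list only your relays, so the IOHK entry belongs in the relay file.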

seveho · 16 March 2021 10:31

How do I make sure they are synced? Do you mean checking the output of:

cardano-cli query tip --mainnet
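
One way to compare the two nodes is to extract `blockNo` from that output on each node and check that the values match. The sketch below assumes the JSON shape shown earlier in this thread; the helper name `block_no` is hypothetical, and it uses `sed` so that no extra tools are needed inside the container.

```shell
# Hypothetical helper: reads `cardano-cli query tip` JSON on stdin
# and prints the numeric blockNo field.
block_no() {
  sed -n 's/.*"blockNo":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage (needs a running node and CARDANO_NODE_SOCKET_PATH set):
#   cardano-cli query tip --mainnet | block_no
# Run it on the relay and on the producer, then compare the two numbers
# with each other and with a public explorer.
```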

Also, just an additional question: how should I evaluate this error? Is it blocking, or is the node capable of doing its work despite these messages? If so, how do I make sure the producer and relay are working properly? Is there any specific message in the logs indicating this?

seveho · 16 March 2021 10:32

Only for the relay I suppose?

Alexd1985 · 16 March 2021 10:33

Try it on your relay first and see if it starts…

try to configure

It will show you the status of your nodes…

seveho · 16 March 2021 10:53

Awesome, adding relays-new.cardano-mainnet.iohk.io to the relay topology file works. I no longer get errors on either node. @Alexd1985 Can you maybe explain why the IOHK relay is necessary?

tomdx · 16 March 2021 10:55

Using simpleLiveView on a host to view stuff in a Docker container is going to be tricky, especially when it comes to shared access to /proc.

Instead, I’d recommend using an image that has topology updater and gLiveView baked in. Some other issues with the IOHK upstream image are fixed too.

You can spin up a relay node like this …

$ docker run --detach \
    --name=relay \
    -p 3001:3001 \
    -e CARDANO_UPDATE_TOPOLOGY=true \
    -v node-data:/opt/cardano/data \
    nessusio/cardano-node run

Connecting the block producer to the relay is an afterthought. First, make sure the relay is running, reachable and fully synced.

When your container is running, do …

$ docker exec -it relay gLiveView

You will not have to compile or install anything on your host. In those docs, you’ll also find scripts for Docker Compose and Kubernetes, if you need them.

seveho · 16 March 2021 10:57

@tomdx thanks for the hint. My Prometheus/Docker setup works properly again since updating the topology, so I think I won’t need LiveView. Also, the official docs state that it is deprecated: Monitoring a Node: LiveView Mode (cardano-node documentation).

tomdx · 16 March 2021 11:03

Sure, but it is still useful to have a lightweight monitoring facility baked into the image to quickly check if stuff is running smoothly.

[Image: relay-glview]

gLiveView does nothing (i.e. consumes zero resources) if you don’t look at it. That’s not true for Prometheus.

Prometheus+Grafana can always be added to the mix later as additional Docker containers. How are you doing your topology updates? If this is an external process that your container relies on, it should make you wonder why this thing is not self-sufficient.

seveho · 16 March 2021 11:08

Sure makes sense. I will definitely have a look.

What do you mean by self-sufficient? I provide the topology.json to the docker container and consume it in the container like this:

```
cardano-node run \
--topology ~/config/mainnet-topology.json \
--database-path ~/data/db \
--socket-path ${CARDANO_NODE_SOCKET_PATH} \
--host-addr ${NODE_IP} \
--port ${NODE_PORT} \
--config ~/config/${MAIN_CONFIG}
```

tomdx · 16 March 2021 11:12

Topology updater is a process that has to run once per hour; otherwise your node will not find any friends. What you show above is the initial topology configuration, which needs to be updated regularly. This will change with Alonzo later this year, when the p2p module becomes part of the node.

Alexd1985 · 16 March 2021 11:26

As far as I know, the nodes (producer and relays) will not connect to each other until they are synced… That’s why you need to wait until the nodes are 100% synced, and then you can connect them to each other…
By adding the IOHK relays, I believe the nodes get the information from them…

seveho · 16 March 2021 11:28

OK, this is totally new to me. Sorry, I'm still a rookie. So as I understand from the Guild Operators Documentation, I need to update the topology regularly so that my node does not depend only on the IOHK node?

Alexd1985 · 16 March 2021 11:29

You must run the topology updater script on your relay to announce your relay to the mainnet network.
This script should run once per hour.

On the producer, you will use static connections to your relays.
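
An hourly schedule like this is commonly set up with cron. The following is only a sketch: the script path is a placeholder for wherever your topology updater script lives, and minute 33 is an arbitrary choice.

```shell
# Assumption: placeholder path, adjust to your own installation.
UPDATER="/opt/cardano/scripts/topologyUpdater.sh"

# Cron fields: minute hour day-of-month month day-of-week command.
# "33 * * * *" runs the script once per hour, at minute 33.
CRON_LINE="33 * * * * ${UPDATER}"
echo "${CRON_LINE}"

# To install it for the current user on the relay host (not run here):
#   ( crontab -l 2>/dev/null; echo "${CRON_LINE}" ) | crontab -
```

With an image that runs the updater itself, such as the one started with CARDANO_UPDATE_TOPOLOGY=true earlier in the thread, a host-side cron entry should not be needed.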

seveho · 16 March 2021 11:30

That seems strange to me. The nodes were 100% synced (compared with https://explorer.cardano.org). Only adding relays-new.cardano-mainnet.iohk.io resolved the error. But what's the impact of that relay on my nodes?
