Strange gLiveView behavior after 1.26.1

Hello -

So far I have updated six servers to 1.26.1, and I am noticing something strange on all six after the update. I like to leave gLiveView open on the servers, and now when I go back and check on them after a period of time, I see that gLiveView has closed with the following error:

COULD NOT CONNECT TO A RUNNING INSTANCE, 3 FAILED ATTEMPTS IN A ROW!

When I re-open gLiveView, the “Uptime” does not show that the node went down, and there are no strange errors in the journal log.
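In case anyone wants to verify the same on their own nodes, a minimal sketch of what I mean by checking, assuming the service is named cnode as below:

journalctl -u cnode --since "24 hours ago" | grep -Ei "started|stopped|exited"   # service start/stop events
systemctl show cnode -p ActiveEnterTimestamp   # when the service last entered the active state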

Anyone experiencing the same or have any ideas why this is happening?

What messages do you see? Post the output of:
sudo systemctl status cnode

Are you using CNTools?

No messages at all. gLiveView just stops with the error message above.

Yes, I am using CNTools, but I have experienced this on all nodes, both the block producer and the relays.

OK, run sudo systemctl status cnode and post the output.

cnode.service - Cardano Node
Loaded: loaded (/etc/systemd/system/cnode.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2021-04-07 23:53:34 UTC; 18h ago
Main PID: 3295573 (cnode.sh)
Tasks: 16 (limit: 4682)
Memory: 3.5G
CGroup: /system.slice/cnode.service
├─3295573 /bin/bash /opt/cardano/cnode/scripts/cnode.sh
└─3295654 cardano-node run --topology /opt/cardano/cnode/files/topology.json --config /opt/cardano/cnode/files/config.json --database-path /opt/cardano/cnode/db →

Apr 07 23:53:34 *****-relay-2 systemd[1]: Started Cardano Node.
Apr 07 23:53:35 *****-relay-2 cnode[3295573]: Failed to query protocol-parameters from node, not yet fully started?
Apr 07 23:53:35 *****-relay-2 cnode[3295573]: WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.
Apr 07 23:53:36 *****-relay-2 cnode[3295654]: Listening on http://0.0.0.0:12798

And what is the output from gLiveView? It says could not connect…?

Yes. It will run fine and stay open for hours, and then all of a sudden it closes to the command prompt with this message present:

COULD NOT CONNECT TO A RUNNING INSTANCE, 3 FAILED ATTEMPTS IN A ROW!

But when I reload gLiveView, all is fine and there was no reset of the “Uptime” counter or anything…
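One way to tell whether the node side is still answering when this happens is to poll the metrics endpoint by hand; a rough sketch, assuming the Prometheus listener on port 12798 shown in the status output above:

curl -s http://127.0.0.1:12798/metrics | head -n 5   # if this returns metrics, the node itself is up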

cardano-node run --topology /opt/cardano/cnode/files/topology.json --config /opt/cardano/cnode/files/config.json --database-path /opt/cardano/cnode/db →

On this command line, do you see the actual full path for the socket?

Try to run the node manually from the command line, and let us know.
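Something like this, just a sketch assuming the CNTools layout from your status output:

sudo systemctl stop cnode          # stop the service first so the port and socket are free
cd /opt/cardano/cnode/scripts
./cnode.sh                         # run the same wrapper in the foreground and watch its output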

Ah, understood… I don’t know what to say if it works for hours, then stops, and then runs fine again when you restart it…

@tsipou I also believe it to be related to my socket, but I do not know why… How do I see the full run command that is executed when I use sudo systemctl start cnode.service?

Press the right arrow key on the keyboard; the systemctl status pager truncates long lines.

Hahahaha, duh!! Thank you… here’s the remainder of the parameters:

--config /opt/cardano/cnode/files/config.json --database-path /opt/cardano/cnode/db --socket-path /opt/cardano/cnode/sockets/node0.socket --port 3001 --host-addr 0.0.0.0
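For future reference, either of these also shows the untruncated command, assuming the unit is named cnode:

systemctl status cnode -l --no-pager    # -l (--full) disables the line truncation
ps -eo args | grep '[c]ardano-node'     # or read the full running command straight from ps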

I think you are fine…

I have seen an increase of this on my 1.26.1 upgraded nodes as well, more so than with any past node version. I wonder if the node is disconnecting from the network ever so briefly but then reconnecting almost immediately. I did lose one block today at almost the exact same time that the “could not connect” message occurred: I was scheduled to mint a block, but the block was not minted. Strange. Maybe others will notice this as well, and then we’ll see if it is a common problem.

Yes, this is exactly my concern… Is there a way to raise this issue with the developers?

Quick update: a few hours ago I set “TraceMempool” to false in the node config on one of my servers where this was happening frequently, and it has not happened since. I will let it run overnight and report back…
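For anyone who wants to try the same change, a minimal sketch, assuming the default CNTools config path and that jq is installed:

cp /opt/cardano/cnode/files/config.json /opt/cardano/cnode/files/config.json.bak   # back up first
jq '.TraceMempool = false' /opt/cardano/cnode/files/config.json.bak > /opt/cardano/cnode/files/config.json
sudo systemctl restart cnode    # restart so the node picks up the change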

Confirmed. I’ve observed that a couple of times.

Guys, just for your info:

When you use the new topologyUpdater script, please go and edit CUSTOM_PEERS= and use a comma ( , ) instead of a colon ( : ) between the IP and the port.

Read the description of the parameter.
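For example, a sketch with placeholder addresses, based on my reading of the parameter description (comma between IP and port, pipe between peers):

# In topologyUpdater.sh
CUSTOM_PEERS="relay1.example.com,3001|10.0.0.5,3002"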


Another update: the node where I changed the TraceMempool setting has run all night with zero interruptions. Looks like this might be the solution; I’ll be making the update on all my nodes this morning.

@tsipou I have not made this change, but my custom peers appear to be populating as expected?
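In case it helps anyone else check theirs, the peers the updater actually wrote can be inspected in the generated topology file; a sketch, assuming the default CNTools path:

jq '.Producers' /opt/cardano/cnode/files/topology.json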
