Relay restarts every few hours, with some log errors I don't understand, e.g. "IpSubscription:Error"

hi,

I notice that my relays (but not the BP) restart quite often. The logs show things like this:

    Oct 14 16:06:17 ubuntu cnode[1637733]:
        cardano.node.IpSubscription:Error:275693] [2021-10-14 14:06:17.25 UTC]
        IPs: 0.0.0.0:0 [... long list of IPs] Application Exception: 192.46.XXX.XX:6001 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = "SubscriptionWorker exiting", seStack = []}
    Oct 14 16:06:17 ubuntu cnode[1637733]:
        cardano.node.IpSubscription:Error:280705] [... long list of IPs] Application Exception: 18.119.XX.XX:9201 SubscriberError {seType = SubscriberWorkerCancelled, seMessage = "SubscriptionWorker exiting", seStack = []}
    Oct 14 16:06:17 ubuntu cnode[1637733]:
        cardano.node.DiffusionInitializationTracer:Info:5] [2021-10-14 14:06:17.25 UTC] DiffusionErrored user interrupt
    Oct 14 16:06:18 ubuntu systemd[1]: Stopped Cardano Node.

Any ideas? Tnx!

Try:

    journalctl -e -f -u cardano-node

Do you see any kill message?
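
If nothing obvious shows up while following the journal, one quick way to filter for kill/OOM-related lines is something like this (the service name cardano-node is an assumption; adjust it if your unit is named e.g. cnode):

    journalctl -u cardano-node --since "2 hours ago" | grep -iE "kill|oom|out of memory"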

Hi @hamish, I’d ask a few additional questions:

  • Was this pool running ok before?
  • What version are you running?
  • Can you confirm the topology file is correct (or maybe paste it here while masking the BP address)? A rough example is sketched after this list.
  • Can you confirm the required ports are accessible and not blocked?
  • Do you have the latest config files (Cardano Configurations)?
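
In case it helps with the topology question above, a legacy (non-P2P) relay topology.json looks roughly like this; the address, ports, and valency values are placeholders, not your actual peers (x.x.x.x would be your BP or other relay):

    {
      "Producers": [
        { "addr": "x.x.x.x", "port": 6000, "valency": 1 },
        { "addr": "relays-new.cardano-mainnet.iohk.io", "port": 3001, "valency": 2 }
      ]
    }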

hi Alex,
no, there’s nothing about a kill signal…

hi!

  • the pool appears fine in all respects that I can figure, except that the relay systemd services restart every few hours
  • it is on 1.30.1 now
  • topology: I’m using the guild updater on the relays, which seems to work fine (see the quick checks below)
  • config: I updated for 1.30.1
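
For completeness, the version and the generated topology can be double-checked with something like the following (the path assumes the default guild/CNTools layout, adjust to yours):

    # confirm the binary version the service is actually running
    cardano-node version

    # confirm the updater left a non-empty peer list in the generated topology
    jq '.Producers | length' /opt/cardano/cnode/files/topology.json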

best, h

Please also check the syslog for an OOM kill, or more generally for errors.

    sudo tail -n 1000 /var/log/syslog | grep -i kill | more

or just inspect, e.g., the last 300 lines:

    sudo tail -n 300 /var/log/syslog

Also check RAM consumption with:

    cat /proc/meminfo
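
A couple of quick ways to eyeball memory and swap (just standard tools):

    # overall memory and swap at a glance
    free -h

    # or pull the relevant fields out of /proc/meminfo
    grep -E 'MemTotal|MemAvailable|SwapTotal|SwapFree' /proc/meminfo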

yes, no memory kills

I’ve got 16 GB RAM and cnode usage doesn’t seem to run above 10 GB

still puzzled :slight_smile:

Are you starting the node with --host-addr 0.0.0.0, right?
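
For reference, a typical relay start command looks roughly like this (paths and port are placeholders; cnode.sh should be assembling something similar):

    cardano-node run \
      --topology /opt/cardano/cnode/files/topology.json \
      --database-path /opt/cardano/cnode/db \
      --socket-path /opt/cardano/cnode/sockets/node0.socket \
      --host-addr 0.0.0.0 \
      --port 6000 \
      --config /opt/cardano/cnode/files/config.json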

yes, via cnode.sh

I can’t find anything related… no other messages if you type journalctl -e -f -u cnode?

There is a huge number of log messages from the node (see the examples above).

The last ones before the restart included “IpSubscription:Error”… does that indicate something that would kill the node?

This was fixed by adding a 1GB swap file :slight_smile:

I’m unsure why, as the machine has 16 GB RAM and reports plenty of memory spare, but it seems that something somewhere expects swap to exist.
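
For anyone hitting the same issue, creating the swap file is just the standard Ubuntu procedure, roughly as follows (size and path as I used them; adjust as needed):

    sudo fallocate -l 1G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # make it persistent across reboots
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab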

(Thanks to Stefan of CO2 pool for the suggestion!)

Thanks all, this forum is extremely useful! Have a good one.