DB corruptions

Several times I have had my db corrupted on a node, usually (but not always) after the server shut down while the node was running. I have seen other people post about this as well.

This results in a warning in the logs, “WARN: A prior running Cardano node was not cleanly shutdown, socket file still exists. Cleaning up.”, and then the node shows “starting…” in gLiveView in perpetuity.

The fix I have been using is to delete the db folder and then either 1. do the long and painful resynchronization process or 2. copy a db from another node.

I have seen it occur on both CoinCashew setups and CNTools setups.

I am just wondering if anyone has any insight into what is happening when this occurs. Why is it not shutting down cleanly? Most of my nodes have no problem shutting down, starting back up, and retaining the db and its synchronization, but some seem unable to shut down properly.

Any ideas or inputs?

I have this problem now and again myself. I notice it when I force one of my nodes to shut down while the cardano-node services are in the middle of transactions.

I usually make sure to run the proper commands to stop the processes before I shut down the node or do any upgrading. Software usually doesn’t like being updated while it’s accessing files. It can cause file corruption.

I remember having TimeoutStopSec in my Cardano-related systemd configuration set to a low number of seconds. That made the OS kill the process every time it was restarted, because the process could not finish shutting down in that time frame. The end result was a long startup, as the node needed to repair the db.

Check if that applies to your config too. Maybe you can explicitly set that value to something like 60 seconds. The default should be 90 seconds but I’m not sure if that gets overridden somewhere.
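
If you want to check a unit file for this quickly, here is a minimal Python sketch, not an official tool. It reads the text of a systemd unit file and reports the TimeoutStopSec value from the [Service] section in seconds. The function names, the sample unit text, and the ExecStart path are all made up for illustration. It only understands simple whole-number time spans like "60", "90s" or "1min 30s", and it does not look at drop-in overrides or the combined TimeoutSec setting, so treat the result as a hint only.

```python
import re

# Unit suffixes handled by this sketch. systemd accepts more (see systemd.time(7)).
_UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "hr": 3600}


def parse_timespan(value):
    """Convert a simple time span such as '60', '90s' or '1min 30s' to seconds."""
    value = value.strip()
    if value == "infinity":
        return float("inf")
    # Accept only whole numbers, each followed by an optional lowercase unit.
    if not re.fullmatch(r"(?:\s*\d+\s*[a-z]*)+", value):
        raise ValueError(f"unsupported time span: {value!r}")
    total = 0
    for number, unit in re.findall(r"(\d+)\s*([a-z]*)", value):
        if unit not in _UNIT_SECONDS:
            raise ValueError(f"unsupported unit: {unit!r}")
        total += int(number) * _UNIT_SECONDS[unit]
    return total


def stop_timeout(unit_text):
    """Return TimeoutStopSec (in seconds) from the [Service] section, or None if unset."""
    section = None
    timeout = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "TimeoutStopSec":
                timeout = parse_timespan(val)  # the last assignment wins
    return timeout


# Hypothetical unit file contents, for illustration only.
SAMPLE_UNIT = """\
[Unit]
Description=Example node service

[Service]
ExecStart=/usr/local/bin/example-node-start.sh
TimeoutStopSec=2
"""

print(stop_timeout(SAMPLE_UNIT))  # prints 2
```

If this returns None, the unit does not set the value itself and the system-wide default should apply. A very small number like the 2 in the sample is the kind of setting that would get the process killed before it finishes shutting down.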
