Recently I rebuilt cardano-node from source on my block producers. The resulting binary was broken, which led to missed slots and lost rewards. It took me days to find and fix this.
I’m trying to understand what happened and how it is possible that a seemingly successful build could have gone so horribly wrong.
The official documentation offers two ways to build: using cabal and nix. I chose the nix variant. The resulting binary looked okay at first sight, but after deploying it to my block producers, I noticed that blocks were produced but did not show up on the main chain. Instead, each minted block ended up on a fork and after a few blocks the node switched back to the main chain without the minted block.
Of course I tried everything else for a few days before suspecting the build. Then I built using the other method with cabal, as described here. Lo and behold, between the instructions to install cabal and how to compile ghc, there is this little remark:
Note: We no longer provide supported `stack` or `nix` installer packages. We recommend using `cabal` instead.
Further down on the same page there is another note (emphasis mine), which reminds me that we still rely on a specific, almost three-year-old `libsodium` fork.
Note, that for a development build you can avoid installing the custom `libsodium` library and add the following lines to the local project file:
echo "package cardano-crypto-praos" >> cabal.project.local
echo " flags: -external-libsodium-vrf" >> cabal.project.local
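One practical wrinkle with those two `echo` lines is that running them again appends the stanza a second time. Below is a minimal sketch of applying them only once. The function name `add_dev_libsodium_flag` is my own invention for illustration and is not part of the official instructions. Per the note, this is meant for development builds; whether it belongs in a production build is exactly what I am asking below.

```shell
#!/bin/sh
# Hypothetical helper (not from the official docs): append the two lines
# from the note to a cabal project file, unless the flag is already there.
add_dev_libsodium_flag() {
    # Default to the file name used in the note; a path can be passed instead.
    project_file="${1:-cabal.project.local}"

    # -q: no output, -s: stay quiet if the file does not exist yet.
    # "--" stops option parsing because the pattern starts with a dash.
    if grep -qs -- '-external-libsodium-vrf' "$project_file"; then
        echo "flag already present in $project_file"
        return 0
    fi

    # Same two lines as in the note, written in one append.
    {
        echo "package cardano-crypto-praos"
        echo " flags: -external-libsodium-vrf"
    } >> "$project_file"
}
```

Calling `add_dev_libsodium_flag` with no argument in the project directory should have the same effect as the two commands from the note on a fresh checkout, and do nothing on later runs.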
At this point, my confusion is nearly complete. I have questions:
- Is the `nix` build a supported way to build `cardano-node` or is it not? Edit: The answer is no. The nix build is broken. See below in this thread.
- How is it possible that a broken `cardano-node` forks the chain on every minted block? Why is this happening?
- Are we supposed to add `cabal.project.local` for a production build or not? What does `flags: -external-libsodium-vrf` actually do?
- Why are we still relying on an ancient, unsupported `libsodium` fork, how long do we expect this situation to persist and what are the risks? In another thread, @Elysium asked about this months ago and there doesn’t seem to be an adequate answer yet.
- What did I do wrong when building the node?
- Did other pool operators run into the same problem?
- Are other pool operators running into the same problem and don’t yet know about it, because their pools didn’t mint a block yet?