How did I screw up building cardano-node?

Recently I rebuilt cardano-node from source on my block producers. The resulting binary was broken, which led to missed slots and lost rewards. It took me days to find and fix the problem.

I’m trying to understand what happened and how it is possible that a seemingly successful build could have gone so horribly wrong.

The official documentation offers two ways to build: using cabal and nix. I chose the nix variant. The resulting binary looked okay at first sight, but after deploying it to my block producers, I noticed that blocks were produced but did not show up on the main chain. Instead, each minted block ended up on a fork and after a few blocks the node switched back to the main chain without the minted block.

Of course I tried everything else for a few days before suspecting the build. Then I built using the other method with cabal, as described here. Lo and behold, between the instructions to install cabal and how to compile ghc, there is this little remark:

Note: We no longer provide supported stack or nix installer packages. We recommend using cabal instead.

Further down on the same page there is another note (emphasis mine), which reminds me that we still rely on a specific, almost three-year-old libsodium fork.

Note, that for a development build you can avoid installing the custom libsodium library and add the following lines to the local project file:

echo "package cardano-crypto-praos" >>  cabal.project.local
echo "  flags: -external-libsodium-vrf" >>  cabal.project.local
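For reference, the two echo commands above append a two-line stanza to cabal.project.local, and running them a second time appends it again. Below is a small sketch of an idempotent variant. The function name add_dev_libsodium_flag is hypothetical, and as the note says, this applies to development builds only.

```shell
# Append the stanza from the note only if it is not already present,
# so that re-running the step does not duplicate it.
# add_dev_libsodium_flag is a made-up helper name, not part of any tool.
add_dev_libsodium_flag() {
  # $1: project file to edit (defaults to cabal.project.local)
  file="${1:-cabal.project.local}"
  if ! grep -qs '^package cardano-crypto-praos' "$file"; then
    printf 'package cardano-crypto-praos\n  flags: -external-libsodium-vrf\n' >> "$file"
  fi
}
```

Calling add_dev_libsodium_flag with no argument from the cardano-node checkout edits cabal.project.local in the current directory.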

At this point, my confusion is nearly complete. I have questions:

  1. Is the nix build a supported way to build cardano-node or is it not? Edit: The answer is no. The nix build is broken. See below in this thread.
  2. How is it possible that a broken cardano-node forks the chain on every minted block? Why is this happening?
  3. Are we supposed to add cabal.project.local for a production build or not? What does flags: -external-libsodium-vrf actually do?
  4. Why are we still relying on an ancient, unsupported libsodium fork? How long do we expect this situation to persist, and what are the risks? In another thread, @Elysium asked about this months ago and there doesn’t seem to be an adequate answer yet.
  5. What did I do wrong when building the node?
  6. Did other pool operators run into the same problem?
  7. Are other pool operators running into the same problem without knowing about it yet, because their pools haven’t minted a block yet?

What git revision was your build based on?

git log --oneline

8fe46140a (HEAD, tag: 1.27.0)

The same was used for the cabal build, which does not show this problem.

Another observation:

$ nix-build -A scripts.mainnet.node -o mainnet-node-local

nix-build: /nix/store/a6rnjp15qgp8a699dlffqj94hzy1nldg-glibc-2.32/lib/libc.so.6: version `GLIBC_2.33' not found (required by /usr/local/lib/libsodium.so.23)

$ unset LD_LIBRARY_PATH
<build starts>

In other words, the nix build of cardano-node 1.27.0 has dependencies outside nix. I haven’t completely wrapped my head around nix yet, but so far my understanding is that its primary raison d’être is precisely to avoid this type of dependency hell.

So it looks like I just answered my own question #1 above.
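A quick way to spot this kind of leak before starting a nix build is to look for LD_LIBRARY_PATH entries that live outside /nix/store. The following is only a minimal sketch: the function name non_nix_lib_dirs is made up for illustration, and an empty result does not prove the build is otherwise sound.

```shell
# Print every LD_LIBRARY_PATH entry that does not live under /nix/store.
# non_nix_lib_dirs is a hypothetical helper name, not part of nix.
non_nix_lib_dirs() {
  # $1: colon-separated path list (defaults to the current LD_LIBRARY_PATH)
  printf '%s\n' "${1-${LD_LIBRARY_PATH-}}" \
    | tr ':' '\n' \
    | grep -v -e '^/nix/store/' -e '^$' \
    || true   # grep exits non-zero when nothing is left; that is not an error here
}

# Warn if anything outside the nix store could be picked up by the linker.
leaks="$(non_nix_lib_dirs)"
if [ -n "$leaks" ]; then
  echo "LD_LIBRARY_PATH contains non-nix directories:" >&2
  echo "$leaks" >&2
fi
```

Running the build as `env -u LD_LIBRARY_PATH nix-build -A scripts.mainnet.node -o mainnet-node-local` has the same effect as the unset above, but only for that one command.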

Hello.
I’m just someone who has been delegating my little wallet to your WMOPS pool since 8 May 2021.

I track the results every epoch and I’m happy with them. I hope you are still happy with your experience too.
I think one epoch will not hurt the long-term results at all. :+1:

I was just curious (not worried) about that epoch, so I searched for you and found this post :smiley:

Nice to see that you are motivated and methodical about maintaining the pool.

Greetings, and sorry for my English :wink:

Thank you for the nice words. In general I believe in taking responsibility for things I screw up. Unfortunately, I still have no indication of what exactly went wrong.