I am contemplating setting up a block producer server at home.
Is it possible to clone the configs and quickly swap from home to the cloud if it breaks?
If I created a secondary block producer at home, what conflicts would it cause with the currently running one if they're both registered with the same payment/stake keys?
Yes, it is. I would run the second node not as a BP but as a passive node. If your BP is not reachable, you can restart one of your passive nodes as a BP.
Of course, all the keys and certificates have to be present on your passive node. Your relays should already include your passive node in their topology files; that makes things easier.
If you need more guidance, I can explain the idea in more detail.
A passive node is defined by its topology file and is started the same way as a relay node.
A topology file suitable for your idea (passive node topology) is:
```json
{
  "Producers": [
    {
      "addr": "relays-new.cardano-mainnet.iohk.io",
      "port": 3001,
      "valency": 2
    }
  ]
}
```
Now you can run a script that tests whether your BP is online. Once your BP dies, you simply relaunch your passive node with the start script for a block-producing node.
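A minimal sketch of such a check, assuming the passive node runs it from cron and that both node roles are managed as systemd units. The host, port, state directory, failure threshold and the unit names `cnode-passive`/`cnode-bp` are hypothetical. It only fails over after several consecutive failed checks, so a short outage does not trigger a switch, and it writes a marker file so the promotion happens only once.

```shell
#!/usr/bin/env bash
# Sketch of a failover watchdog for the passive node (run from cron every minute).
# BP_HOST, BP_PORT, STATE_DIR, MAX_FAILURES and the systemd units
# "cnode-passive"/"cnode-bp" are hypothetical; adapt them to your setup.
BP_HOST="${BP_HOST:-192.0.2.10}"
BP_PORT="${BP_PORT:-6000}"
STATE_DIR="${STATE_DIR:-/tmp/bp-watchdog}"
MAX_FAILURES="${MAX_FAILURES:-5}"   # consecutive failed checks before failover

# Succeeds if a TCP connection to host:port opens within 5 seconds
# (bash /dev/tcp pseudo-device plus coreutils timeout).
bp_reachable() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Update the consecutive-failure counter and print its new value.
# $1 is 0 when the check succeeded, anything else when it failed.
count_failures() {
  local file="$STATE_DIR/failures" count=0
  mkdir -p "$STATE_DIR"
  if [[ $1 -ne 0 ]]; then
    count=$(( $(cat "$file" 2>/dev/null || echo 0) + 1 ))
  fi
  echo "$count" > "$file"
  echo "$count"
}

# Restart this machine as the block producer; the marker file prevents repeats.
promote_to_bp() {
  [[ -e "$STATE_DIR/promoted" ]] && return 0
  sudo systemctl stop cnode-passive &&
    sudo systemctl start cnode-bp &&
    touch "$STATE_DIR/promoted"
}

# One check: count the result and fail over once the threshold is reached.
watchdog_step() {
  local rc=0 failures
  bp_reachable "$BP_HOST" "$BP_PORT" || rc=$?
  failures=$(count_failures "$rc")
  if (( failures >= MAX_FAILURES )); then
    promote_to_bp
  fi
}

# Only act when called as "watchdog.sh run"; otherwise just define the functions.
if [[ "${1:-}" == run ]]; then watchdog_step; fi
```

Note that a TCP check from a single machine cannot tell a dead BP from a network problem between the two machines. If the old BP is actually still alive, this would leave two BPs running with the same keys, so stopping the old BP still has to be ensured separately.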
Your valency values look correct. The following topology, with Relay 1 and Relay 2 as the two entries, would fit your passive node:
```json
{
  "Producers": [
    {
      "addr": "x.x.x.x",
      "port": 6000,
      "valency": 1
    },
    {
      "addr": "x.x.x.x",
      "port": 6000,
      "valency": 1
    }
  ]
}
```
Your topology file is perfect. Now make sure you have two different start scripts: one for the BP and one for the passive node (which is the same as the relay's).
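A minimal sketch of how the two start scripts can share everything except the BP-only arguments, assuming the node is started directly with `cardano-node run`. The mode argument, the `NODE_HOME` layout and the key/certificate file names are hypothetical; only the socket path matches the one used later in this thread. The BP mode adds `--shelley-kes-key`, `--shelley-vrf-key` and `--shelley-operational-certificate`; without them the same node runs as a relay/passive node.

```shell
#!/usr/bin/env bash
# Sketch: one start script with a "passive" and a "bp" mode.
# All paths are hypothetical; copy them from your existing relay/BP scripts.
NODE_HOME="${NODE_HOME:-/opt/cardano/cnode}"
PORT="${PORT:-6000}"

# Print the cardano-node arguments for the given mode, one per line.
node_args() {
  local mode=$1
  printf '%s\n' run \
    --topology "$NODE_HOME/files/topology.json" \
    --database-path "$NODE_HOME/db" \
    --socket-path "$NODE_HOME/sockets/node0.socket" \
    --host-addr 0.0.0.0 \
    --port "$PORT" \
    --config "$NODE_HOME/files/config.json"
  if [[ $mode == bp ]]; then
    # Only the block producer gets the pool's hot keys and operational certificate.
    printf '%s\n' \
      --shelley-kes-key "$NODE_HOME/priv/kes.skey" \
      --shelley-vrf-key "$NODE_HOME/priv/vrf.skey" \
      --shelley-operational-certificate "$NODE_HOME/priv/node.cert"
  fi
}

# Replace the current shell with cardano-node in the requested mode.
start_node() {
  local args
  mapfile -t args < <(node_args "$1")
  exec cardano-node "${args[@]}"
}

# Usage: startNode.sh passive|bp  (without an argument it only defines the functions)
if [[ $# -gt 0 ]]; then start_node "$1"; fi
```

The passive node would normally be started with `startNode.sh passive`, and relaunched with `startNode.sh bp` when it has to take over.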
Similar to what was already explained, I also just run it as a relay, but with the configuration required to run it as a validator already in place. It is still secured like a validator (not exposed to the public, no topology updater).
Could you give some more details about how this script works? I currently have monitoring/alerting that verifies whether the tip is current. If it gets older than 5 minutes, I trigger an alert. This could be used as the trigger for the secondary node to come up, but there is a risk that the original producer was only temporarily blocked (e.g. not updated by the other relays in the topology). This happened a few times in the last weeks, mostly at around 10:40-10:50 CET.
Concrete questions:
How do you check whether the original BP died?
If it died, how do you make sure it is also stopped, so that it does not come back up and leave you with two BPs running?
Is there a reference script around?
My current setup just sends me the alert. Switching is currently manual but I’d love to automate that as well.
A short explanation of the script: it is executed by cron every minute.
It sends OK pings to healthchecks.io. If the tip difference is too high, it does not send the ping, and healthchecks.io alerts if no valid ping comes in for 5 minutes.
This way I notice that something is wrong in every case (including when the machine is down, has crashed, or cannot execute the check) without exposing anything to the outside world, as would be the case with a cloud monitoring agent.
Remarks:
The script is currently fairly hardcoded, so you will need to customize it.
Also, if the Cardano config/parameters change, the calculation may become invalid, because I'm just subtracting the constant 1591566291 from the current time. It could be improved by calculating this static value from the Cardano config parameters.
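A sketch of that improvement, assuming the mainnet values listed below. The constant is the Shelley start time minus the number of Byron slots: Byron ran 208 epochs of 21600 slots at 20 seconds each, starting at Unix time 1506203091, and Shelley slots last 1 second. The hard-fork epoch is not stored in any config file, so it is hardcoded here, and reading the other values from the genesis files is left out.

```shell
#!/usr/bin/env bash
# Sketch: derive the constant used in pingTipCheck.sh instead of hardcoding it.
# Mainnet values (assumptions; check them against your genesis files).
BYRON_START_TIME=1506203091    # byron-genesis.json "startTime" (Unix seconds)
BYRON_SLOT_LENGTH=20           # Byron slot duration in seconds ("slotDuration": "20000" ms)
BYRON_EPOCH_LENGTH=21600       # Byron slots per epoch (10 * k, with k = 2160)
SHELLEY_START_EPOCH=208        # first Shelley epoch on mainnet (hard-fork point)
SHELLEY_SLOT_LENGTH=1          # shelley-genesis.json "slotLength" (seconds)

# Slot number the chain tip should have at Unix time $1.
expected_slot() {
  local byron_slots=$(( SHELLEY_START_EPOCH * BYRON_EPOCH_LENGTH ))
  local shelley_start=$(( BYRON_START_TIME + byron_slots * BYRON_SLOT_LENGTH ))
  echo $(( byron_slots + ($1 - shelley_start) / SHELLEY_SLOT_LENGTH ))
}

# With 1-second Shelley slots, expected_slot(now) = now - offset; print that offset.
slot_offset() {
  local byron_slots=$(( SHELLEY_START_EPOCH * BYRON_EPOCH_LENGTH ))
  echo $(( BYRON_START_TIME + byron_slots * BYRON_SLOT_LENGTH - byron_slots ))
}
```

In pingTipCheck.sh, the reference slot could then be computed as `customRefSlotNo=$(expected_slot "$(printf '%(%s)T' -1)")`.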
Please customize the following parts of the script:
Change USERNAME to your user name.
Change the "all good sending ping" branch to your own handler, or add a non-success branch that triggers something in that case.
Script (pingTipCheck.sh):
```shell
#!/usr/bin/env bash
# shellcheck disable=SC2034,SC2086,SC2230,SC2009,SC2206,SC2062,SC2059
export CARDANO_NODE_SOCKET_PATH=/opt/cardano/cnode/sockets/node0.socket
# Slot number of the local node's tip (empty if the node does not answer)
customCurrentSlotNo=$(/home/USERNAME/.cabal/bin/cardano-cli shelley query tip --mainnet | grep -Po '"slotNo": \K[0-9]+')
# Slot number expected for the current wall-clock time (mainnet, 1-second Shelley slots)
customRefSlotNo=$(( $(printf '%(%s)T' -1) - 1591566291 ))
# How many slots the tip lags behind; a missing tip counts as slot 0, so no ping is sent
customDiff=$(( customRefSlotNo - ${customCurrentSlotNo:-0} ))
if (( customDiff <= 50 )); then
  echo "all good sending ping"
  curl -m 10 --retry 5 https://hc-ping.com/YOURPINGENDPOINT
  exit 0
fi
# No ping: healthchecks.io raises the alert once no ping has arrived for 5 minutes
```
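One possible shape for the non-success branch mentioned above, sketched as functions. It assumes the healthchecks.io `/fail` ping endpoint; `on_stale_tip` is a hypothetical hook that only writes to syslog via `logger`, and the cron line in the comment is just an example of the every-minute schedule.

```shell
#!/usr/bin/env bash
# Sketch: pingTipCheck.sh logic with an explicit failure branch.
# Example crontab entry (every minute):
#   * * * * * /home/USERNAME/pingTipCheck.sh
# PING_URL and MAX_LAG mirror the original script; on_stale_tip is hypothetical.
PING_URL="https://hc-ping.com/YOURPINGENDPOINT"
MAX_LAG=50   # tolerated lag in slots (seconds on mainnet Shelley)

# Succeeds if the tip ($2) is at most MAX_LAG slots behind the expected slot ($1).
# An empty tip (node not answering) is treated as slot 0, i.e. stale.
tip_is_current() {
  local expected=$1 current=${2:-0}
  (( expected - current <= MAX_LAG ))
}

# Hypothetical handler: log locally; replace with your own alerting/failover action.
on_stale_tip() {
  logger -t pingTipCheck "tip is stale: expected slot $1, got ${2:-none}"
}

# Ping healthchecks.io on success, signal an explicit failure otherwise.
report() {
  if tip_is_current "$1" "$2"; then
    curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL"
  else
    on_stale_tip "$1" "$2"
    # healthchecks.io treats a ping to <url>/fail as an immediate failure signal
    curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL/fail"
  fi
}
```

The main script would then end with `report "$customRefSlotNo" "$customCurrentSlotNo"`. Sending `/fail` alerts right away instead of waiting for the missing ping, so leave it out if short stalls should still be tolerated.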
I think that if you decide to go to the data center anyway, which makes sense for the BP, then running a second BP for redundancy is, in principle, overkill. I would simply suggest monitoring the BP in the data center, as suggested by zwirny.
Yes, it is. It is just a call from the server to healthchecks.io to let it know that everything is still fine. This approach is often used for monitoring completely internal servers that are not allowed to accept any incoming connections from outside networks.
No ICMP is involved, just an HTTP request to the ping URL.