Redundancy architecture I have in mind -- please critique it

Hi,

I’m exploring the Cardano staking pool universe and plan to build my own pool; it seems I already have one on testnet (you are probably aware of this, given the high number of messages from me in other threads :slight_smile: ). One requirement for a successful pool is redundancy, IMHO. As we all know, any equipment will eventually fail, so we should be ready for that case. I have the following redundancy/failover solution in mind.

So I plan to have 2 relay nodes, which are very straightforward. Those relays will point to the DNS name of the producer node; let’s call this instance “producer”. Meanwhile I will have one more producer, let’s call it “backup-producer”, running with the same keys and otherwise identical to “producer”. As far as I understand, this is not an issue as long as the relays are connected to only 1 producer at a time, which in normal operation is “producer”.
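For reference, the relay side of this is just a topology entry that uses the DNS name instead of an IP. A minimal sketch in the legacy (non-P2P) topology format, with “producer.example.com” and port 3001 as placeholder values:

```json
{
  "Producers": [
    { "addr": "producer.example.com", "port": 3001, "valency": 1 }
  ]
}
```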

For redundancy I plan to configure the DNS hostnames in AWS. AWS can health-check remote systems over TCP, so if “producer” stops answering TCP connections from AWS, Route 53 will fail the DNS name over: the record then points to “backup-producer”, the relays can connect again, and no restart or config change is needed on the relays, “producer” or “backup-producer”. When TCP towards “producer” works again, the DNS name switches back.
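A rough sketch of how that could look with boto3, assuming a hypothetical hosted zone, example IPs (203.0.113.10/.20) and node port 3001; treat it as an illustration of the TCP health check plus failover record pair, not a finished script:

```python
# Rough sketch (boto3): a TCP health check on the primary producer plus a
# Route 53 failover record pair. The zone ID, hostnames, IPs and port are
# placeholders for illustration only.
import boto3

route53 = boto3.client("route53")

# Health check: probe the primary producer's node port over plain TCP.
hc = route53.create_health_check(
    CallerReference="producer-tcp-check-1",  # must be unique per health check
    HealthCheckConfig={
        "IPAddress": "203.0.113.10",   # primary "producer" (example IP)
        "Port": 3001,
        "Type": "TCP",
        "RequestInterval": 30,         # seconds between probes
        "FailureThreshold": 3,         # consecutive failures before "unhealthy"
    },
)

# Failover pair: relays resolve producer.example.com; Route 53 answers with the
# primary while the health check passes, otherwise with the backup.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000EXAMPLE",
    ChangeBatch={"Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "producer.example.com",
            "Type": "A",
            "SetIdentifier": "primary",
            "Failover": "PRIMARY",
            "TTL": 60,  # keep low so relays can re-resolve quickly
            "ResourceRecords": [{"Value": "203.0.113.10"}],
            "HealthCheckId": hc["HealthCheck"]["Id"],
        }},
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "producer.example.com",
            "Type": "A",
            "SetIdentifier": "secondary",
            "Failover": "SECONDARY",
            "TTL": 60,
            "ResourceRecords": [{"Value": "203.0.113.20"}],  # "backup-producer"
        }},
    ]},
)
```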

What I don’t like is that AWS only does a TCP connection check. I would rather have cncli ping the remote servers and, if 3 or 5 pings in a row get no response, do the switchover. This would probably need one more server, e.g. Zabbix, which can run the checks and, on failure, fire a script to change the DNS on the AWS side. But that is probably too much for the beginning.
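Something like the following could serve as that probe. It assumes a cncli binary on the PATH whose ping subcommand accepts --host/--port and prints JSON with a “status” field; check your cncli version’s actual flags and output:

```python
# Sketch of a cncli-based probe: the producer counts as down only after several
# consecutive failed pings. Assumes `cncli ping --host ... --port ...` prints
# JSON with a "status" field; verify against your cncli version.
import json
import subprocess
import time

def cncli_ping_ok(host: str, port: int, timeout: int = 10) -> bool:
    try:
        out = subprocess.run(
            ["cncli", "ping", "--host", host, "--port", str(port)],
            capture_output=True, text=True, timeout=timeout,
        )
        return json.loads(out.stdout).get("status") == "ok"
    except (subprocess.TimeoutExpired, json.JSONDecodeError, OSError):
        return False

def producer_down(host: str, port: int, attempts: int = 3, wait: int = 30) -> bool:
    """True only if `attempts` pings in a row fail, `wait` seconds apart."""
    for _ in range(attempts):
        if cncli_ping_ok(host, port):
            return False
        time.sleep(wait)
    return True
```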

Additionally, it is possible to get the source IP addresses of the AWS servers that perform the health checks, but that means quite a lot of firewall entries.
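Those health-checker ranges can at least be generated automatically instead of maintained by hand. A small sketch, assuming AWS’s published ip-ranges.json and its ROUTE53_HEALTHCHECKS service tag (worth verifying against the current file):

```python
# Sketch: pull the current Route 53 health-checker source ranges from AWS's
# published ip-ranges.json so firewall entries can be generated rather than
# typed in by hand.
import json
import urllib.request

with urllib.request.urlopen("https://ip-ranges.amazonaws.com/ip-ranges.json") as resp:
    data = json.load(resp)

checker_ranges = sorted(
    p["ip_prefix"] for p in data["prefixes"] if p["service"] == "ROUTE53_HEALTHCHECKS"
)
for cidr in checker_ranges:
    # feed these into ufw/iptables/nftables rules for the node port, e.g.:
    print(f"allow tcp from {cidr} to any port 3001")
```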

We also need to make sure no flapping happens, so we should probably switch to “backup-producer” only after “producer” has been unavailable for 5 minutes, and switch back only after “producer” has been available for at least 30 minutes, or if “backup-producer” itself goes down.
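The flapping protection is basically a small state machine around whatever probe is used. A sketch with the 5-minute/30-minute thresholds, where check_primary, check_backup and point_dns_at are placeholders for the ping probe above and a Route 53 record update:

```python
# Sketch of the anti-flapping logic: fail over only after the primary has been
# down for 5 minutes; fail back only after it has been up for 30 minutes, or
# immediately if the backup itself dies.
import time

DOWN_GRACE = 5 * 60    # seconds the primary must be down before failing over
UP_GRACE = 30 * 60     # seconds the primary must be up before failing back
POLL = 30              # seconds between checks

def failover_loop(check_primary, check_backup, point_dns_at):
    active = "primary"
    primary_down_since = primary_up_since = None
    while True:
        now = time.time()
        if check_primary():
            primary_down_since = None
            primary_up_since = primary_up_since or now
        else:
            primary_up_since = None
            primary_down_since = primary_down_since or now

        if active == "primary" and primary_down_since and now - primary_down_since >= DOWN_GRACE:
            point_dns_at("backup")
            active = "backup"
        elif active == "backup" and (
            (primary_up_since and now - primary_up_since >= UP_GRACE) or not check_backup()
        ):
            point_dns_at("primary")
            active = "primary"
        time.sleep(POLL)
```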

I did a quite similar setup in another project, but it was a while ago, so I might be mixing something up; generally, though, this should work in my opinion.

I personally like this approach because not much configuration is needed, just a simple config on the AWS side, and no restart of any node is required - relay, “producer” or “backup-producer”.

What do you guys think? Any comments or advice are highly welcome.

I tested this setup a bit, and everything seems to work well in terms of failover to the backup producer. The only problem I have now is that the relays do not want to switch back to the main “producer” after it comes back online.

It seems the relays save the resolved IP and never make another DNS request to update it. So if the main producer is down and traffic is switched over to backup-producer, everything works OK. But when the main producer comes back online, the relays are still connected to backup-producer and don’t even bother to reconnect to the main producer, probably because the connection is already in place and they have no reason to make another DNS request.

I even noticed that after leaving it like this for more than a day, I had one relay connected to the main producer and the other relay connected to backup-producer, which seems quite bad.

So the solution would be to restart the backup producer once the main producer is online again. Alternatively, we can manually close the firewall on backup-producer to force the relays to connect back to the main producer, and then remove those FW rules afterwards. The best way would probably be some kind of automation script triggered when the main producer goes offline and when it comes back online.
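The restart part of that script could be as small as this sketch, assuming a systemd unit named cardano-node on the backup host and SSH access to it (both names are placeholders):

```python
# Sketch of the failback action: once the main producer answers again, bounce
# the backup producer's node service so the relays drop their cached connection
# and go back through DNS to the main producer.
import subprocess

def force_relays_back_to_main(backup_host: str = "backup-producer.example.com") -> None:
    # Restarting the service closes the long-lived TCP sessions the relays hold.
    subprocess.run(
        ["ssh", backup_host, "sudo", "systemctl", "restart", "cardano-node"],
        check=True,
    )
```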

For now I have decided to go with HAProxy.

So I have 2 HAProxies and behind them there are 2 BPs. The pool of relays connects to those HAProxies, and HAProxy then forwards the connections to the BPs. All traffic should go to main-BP, and if main-BP is down, HAProxy should fail over to backup-BP.
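For reference, the backend definition looks roughly like this (IPs, ports and timings are placeholders; check the option names against your HAProxy version). The `backup` keyword keeps backup-BP idle while main-BP passes its check, and `on-marked-up shutdown-backup-sessions`, if your version supports it, should kick sessions off the backup once main-BP is marked up again, which is relevant to the reconnection behaviour described below:

```
global
    # stats socket enables the runtime queries used by the split-brain check below
    stats socket /var/run/haproxy.sock mode 600 level admin

defaults
    mode tcp
    timeout connect 5s
    timeout client  1h
    timeout server  1h

frontend bp_in
    bind *:3001
    default_backend block_producers

backend block_producers
    option tcp-check
    # all traffic goes to main-bp while its check passes; backup-bp only gets
    # connections while main-bp is marked down
    server main-bp   10.0.0.10:3001 check inter 2s fall 3 rise 2 on-marked-up shutdown-backup-sessions
    server backup-bp 10.0.0.20:3001 check inter 2s fall 3 rise 2 backup
```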

That setup seems to work fine, but sometimes a split brain happens, where the 1st HAProxy connects to main-BP but the second to backup-BP. That issue must be addressed. I personally have monitoring in Grafana (HAProxy exports metrics to Prometheus, which is super useful), and if such an issue happens I receive an alert. In an ideal world it would be nice to automatically restart the second HAProxy in such a case, so the connection is re-established to main-BP without human intervention; that is probably my next task.
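A sketch of that automation, assuming the stats socket is enabled on each HAProxy (the “stats socket” line in the config above) and the backend/server names from that config; the socket path is a placeholder:

```python
# Sketch of a split-brain check: ask an HAProxy instance, via its stats socket,
# whether backup-bp still holds sessions while main-bp is UP. If so, that
# instance can be restarted so it re-establishes the connection to main-bp.
import csv
import io
import socket

def show_stat(socket_path: str) -> list:
    """Return HAProxy 'show stat' output (CSV) as a list of dict rows."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall(b"show stat\n")
        raw = b""
        while chunk := s.recv(4096):
            raw += chunk
    # The CSV header line starts with "# ".
    return list(csv.DictReader(io.StringIO(raw.decode().lstrip("# "))))

def stuck_on_backup(socket_path: str = "/var/run/haproxy.sock") -> bool:
    rows = {r["svname"]: r for r in show_stat(socket_path) if r["pxname"] == "block_producers"}
    main_up = rows["main-bp"]["status"].startswith("UP")
    backup_busy = int(rows["backup-bp"]["scur"] or 0) > 0
    return main_up and backup_busy
```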

One more point to remember: the connections to the BPs are long-lived. If, for example, HAProxy sees that main-BP is down, it opens a connection to backup-BP, and that connection stays open indefinitely, so no switchover back to main-BP happens until we restart HAProxy, for example.

I got inspired by this git issue:

Everything is a trade-off:

  • How much stake is your pool likely to control?
  • How many blocks is your pool likely to produce per epoch?

If your pool will likely only produce a few blocks per epoch or less then maybe some restart time can be factored in. Cardano-node software is very stable and runs for many days without any noticeable memory leaks or crashes. I only restart my nodes to load new topology files and I can easily pick a window between blocks to do this. Furthermore, my internet connection has something like >99.9% uptime. You will likely lose more blocks to slot battles than internet downtime or server failures.

Do you think the extra redundancy you are designing and its real world effectiveness will be worth it?

It is definitely worth it, IMHO. Just recently my hosting company (OVH) had downtime for a couple of hours. Without a proper HA setup I probably would have been super stressed and maybe even lost a block.

I have no problem with the Cardano software at all, but hardware will fail at some point, and even 99.9% is not good enough, at least for me. Keep in mind that the smaller your stake, the fewer blocks you get, and each of those blocks becomes more and more valuable. Imagine getting one block in a couple of months and the internet disappearing exactly at that time. If a big pool loses 1 block, it just means their return for the epoch is slightly lower than expected. Now imagine your stake is 100k and you lose the block you waited 10 epochs for.

I have some stake in my pool, big thanks to the Cardano Foundation for that.


Everything is relative. The risk of a lost block due to a slot battle or a “propagation delay battle” is around 2-5%. Sometimes I think the fear of missing a block is higher than the actual statistics of how much downtime you get with your internet / hardware. I truly understand how valuable each block is for a small pool, but what percentage downtime do you actually get? What is your expected hardware failure rate? What is your expected internet failure rate?

Regarding internet failure:
You can use a backup network connection continuously. For example, you can even get a usb device like this: USB 4G LTE-Advanced Modem for GNU/Linux (TPE-USB4GLTE) | ThinkPenguin.com

This can circumvent most of your internet failure risk with zero switching time loss. I have a relay connecting to my block producer over two different network links in case one goes down.

Depending on how you do things, if you use firewalling techniques to switch between backups you will still need some sort of recognition system to trigger the change. This “recognition” will likely have a delay of a minute or two. It won’t be as fast as a continuous backup network connection. Even the simple usb mobile broadband device from thinkpenguin might be better.

Regarding hardware failure:
At a pinch, you can always re-purpose a relay to be a block producer. This will only cause a couple of minutes re-start delay before it will be ready. This will require manual intervention from you, but how often will you need to do it?

I don’t think it is a good idea to bring up slot battles here, since they can’t really be mitigated; I’m more concerned with things I can fix rather than things I can’t.

In any case, I personally have experienced issues with my internet provider and my hosting before, at the same time as I was scheduled for a block, so in my experience you should be ready.

HAProxy has almost instant failover: it tracks the TCP connection all the time, and when TCP is down it switches over. It is definitely faster than re-starting a relay as a BP.

For me personally, HAProxy provides a much simpler, easier and cheaper solution than an additional 4G modem or a relay that can be repurposed as a BP; if those work for you, I have no problem with that.

Additionally, I would like to add that for me it is important to have full infrastructure redundancy in several different locations. Imagine the house with your BP and relays catches fire, or all your equipment is stolen; your 4G modem will not help with that, so you might want to put that in your plan too.