Why do we need the topology updater?

For memory reasons, I restart my BP about every 24 hours unless blocks are scheduled, and relays about every 48 hours unless blocks are scheduled. Maybe unnecessary, but I wouldn’t take the chance of missing a block because of the node becoming too unresponsive.

@brouwerQ
Of course I looked at the explorer :slight_smile: Everything works as expected. Blocks are minted and adopted. Rewards are paid. No issues with someone finding me or me finding peers. So this part is sorted …

On a separate note, regarding bi-directional vs uni-directional.
I don’t have a reason to disbelieve you but, at the same time, I don’t see a reason why on Earth it would be designed like that in the first place. Just doesn’t make any sense to me…

Can you point me to any link that would state that it is currently uni-directional and p2p will make it bi-directional?

I was reading quite a few docs about the p2p changes yesterday. I could not find any mention of this. It is all about self-discovery. There will be several modules involved with p2p, but in short, it will build lists of “hot”, “warm” and “cold” relays, reading them off the blockchain. The algorithm will promote relays from one group to another depending on an algorithmic score (proximity, dead/alive, performance, etc.). All of the above will simply allow us to get rid of the static topology file. The relays will self-discover each other, making connections only to the “best” (per the score) relays. All that is well understood and clear. Nothing about uni-directional vs bi-directional.
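Just to make the promotion idea concrete, here is a toy sketch in bash (my own illustration only, not the actual ouroboros-network algorithm; the relay names and scores are invented): each known relay gets a score, and the top N are “promoted” to hot peers while the rest stay warm/cold until rescored.

```shell
#!/usr/bin/env bash
# Toy sketch: each known relay has a score (think proximity, liveness,
# performance combined); the best HOT_N are promoted to "hot" peers.
# Relay names and scores below are made up for illustration.
declare -A score=( [relayA]=12 [relayB]=87 [relayC]=45 [relayD]=63 )
HOT_N=2

# Sort relays by score (descending), keep the top HOT_N names.
hot=$(for p in "${!score[@]}"; do
        printf '%s %s\n' "${score[$p]}" "$p"
      done | sort -rn | head -n "$HOT_N" | awk '{print $2}' | sort | tr '\n' ' ')

echo "hot peers: $hot"   # the others would remain warm/cold until rescored
```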

You can find how it will work at https://docs.cardano.org/explore-cardano/cardano-architecture/overview or, for more details, in https://hydra.iohk.io/build/6955704/download/2/network-spec.pdf. I don’t have a direct link for how it currently works; I guess that’s barely documented, if at all…

Just reading up on this and appreciate your comments and info, very helpful! Quick question: around this point in the discussion the convo seems to have taken a turn after “don’t use it then”, but from my reading up to that point you were not saying “it doesn’t matter”, which is how the OP seems to have interpreted it. So, just for clarity: from what I’ve read everywhere, it does matter and makes a difference for SPOs to run the script at this time (pre-p2p). Is that a correct general understanding?

Second quick question: I do a service restart after I pull the topology, regardless of any 24-hour restart, and often when I see in my LiveView that the last pull ended up with some very slow or non-pinging nodes. Doing this manual pull/restart has typically given me a “better” list of nodes, with better pings etc. Is there a downside to, say, scripting this to occur more frequently, like every 3 hours (not overlapping the hourly updater cron job)?

Hey, let me comment, but these are my personal opinions. Always do your own research, because everyone sets up their infra in a different way. There is a reason the IOG docs state that DevOps/engineering knowledge is required to set up and run the node.

Point 1: You don’t have to run the script. It is just a process that someone created. You don’t have to follow it. Do it your own way if you want and can. Your goal is to have enough established connections with “live” peers. How you do it is up to you.

Point 2: I wouldn’t run it on a schedule (see point 4). Just set up your topology generation to run when the node restarts. If your node is not containerised, run your topology generation from your systemd unit: make a small two-line bash script that generates the topology and set it to execute with “ExecStartPre=”. If your node is containerised (as it should be, IMHO), then it is even easier: just generate your topology in your “entrypoint”. Every time your node restarts, you’ll have a “fresh” topology file. Your little generation script can look something like this:

TLMT=25
GTOP=$(curl -s "https://a.adapools.org/topology?limit=$TLMT")
cat "$HOME/tplg/custom-topology.json" | jq ".Producers += $(echo "$GTOP" | jq .Producers)" > "$HOME/conf/topology.json"
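For the non-containerised case, the systemd hook mentioned above could look something like this (unit name and paths are assumptions, adjust to your own layout):

```ini
# /etc/systemd/system/cardano-node.service (fragment, illustrative paths)
[Service]
# Regenerate the topology file before every node start
ExecStartPre=/home/cardano/bin/gen-topology.sh
ExecStart=/home/cardano/bin/start-node.sh
```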

Point 3: Pings/latency are important. But if we all start “passing” blocks just to our local neighbours, it defeats the point of decentralisation. If I only connect to peers in Sydney (because it is close to me), and London only connects to London, then who is going to propagate blocks from London to Sydney? Someone else? Not you? Not cool :slight_smile:
Just get enough random peers from everywhere. Don’t filter by region. I inject 25 random peers per relay. Works for me.

Point 4: Try not to restart your node unless there is a bug, you need to patch/upgrade, or there is a technical issue. It should run 24/7, 365. Just monitor it (set up email/SMS alerts etc.) and restart only if you encounter a problem.

Point 5: If the API you pull your relays from gives you lots of “dead” relays, change it. Get them elsewhere; curl a different API.

Just my 50 cents…

it does matter and makes a difference for SPOs to run the script at this time (pre-p2p), is that a correct general understanding?

Should be, otherwise why would the script have been developed?

Is there a downside to, say, scripting this to occur more frequently like every 3 hours (not to overlap the hourly updater cron job)?

You would also need to restart the node every 3 hours… not a good practice… you have enough peers to “survive” 24 hours.

@Alexd1985 Great, thank you so much for the clarification on those two points.

Is there a roadmap for the P2P functionality that will obsolete the topology-updater? I assume this is being tested and developed independently from the upcoming HFC event in September?

P2P is independent from HFC.
There is no date; at least I couldn’t find one. All I could find was this quote: “The full P2P deployment will happen later in 2021”. Everything is on hold for the HFC.

P2P obsoletes the static topology, not the updater.
With P2P, the nodes will self-discover each other.

Got it, thanks!

It’s the other way around. The topology updater will be obsolete after p2p. A static topology will remain so you can set your own relays and other trusted relays you want to connect to; it’ll work side by side with p2p. And also the BPs will use a static topology.


Have you got any source for this information? I need to know for sure whether it’s true or not, and, if so, how often the process happens.

It is inexcusable that there isn’t a way to signal the node to reread the topology file. Even if this horribly centralised p2p discovery service is “going away”, it would still be good to be able to tell your node to reread the KES keys when they’ve been rotated without needing to restart it. On top of that, all of the metrics get reset on a restart, making it impossible to detect that the validator “should” be validating until the forged-blocks counter finally goes non-zero.

KES rotation happens like 4 times a year; not a big problem to restart your BP 4 times a year, I think…

It’s approximately once a month at the maximum allowed KES periods, currently. It’s inexcusable, considering every other kind of service that reads certificates that have to be rotated (e.g. web servers) has had this functionality from the beginning.

This is a non-starter, since a new node won’t get a valid topology file until it has been pushing for some time, which requires some sort of polling schedule, which then has to be tweaked to stop polling once it has a valid list. More work/hacks for something that should have been in Cardano’s design from the start. Literally a single simple signal handler.

Topology updater is made by people from the community, not by IOG. Until p2p is deployed, a static topology file is used that’s read on node startup. Topology updater just changes that static file completely independently.

The things you want are not important, only a gimmick. Eventually p2p will replace the topology updater, and I don’t see the problem with restarting the node to load new KES keys. There are other, more important things right now where the focus should be.


The point is this should have been built into Cardano itself from the very beginning and not left to the “community” to implement with a pile of hacky scripts and dodgy workarounds. Key rotation is well understood, as is signalling a process to reread a file, or using inotify to detect that files have changed. It isn’t new, it isn’t difficult; it is literally standard for pretty much everything that reads keys or has to read configuration files that might change on the fly.
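For what it’s worth, the reload pattern being described is a few lines in any language. Here is a bash sketch of the idea (purely illustrative, nothing to do with cardano-node internals; the file stands in for a topology file or KES cert):

```shell
#!/usr/bin/env bash
# Illustration of the "signal the process to reread a file" pattern:
# the worker traps SIGHUP and rereads its config instead of restarting.
cfg=$(mktemp)   # stand-in for topology.json / a rotated KES cert
log=$(mktemp)
echo 'version-1' > "$cfg"

worker() {
  # On SIGHUP, reread the file; no restart, no lost metrics.
  trap 'echo "reloaded: $(cat "$cfg")" >> "$log"' HUP
  echo "loaded: $(cat "$cfg")" >> "$log"
  for _ in $(seq 10); do sleep 0.2; done   # stand-in for real work
}

worker &
pid=$!
sleep 0.4
echo 'version-2' > "$cfg"   # rotate the file while the worker runs
kill -HUP "$pid"            # ask the worker to reread it
wait "$pid"
cat "$log"
```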

No they shouldn’t.

Yup, it’s a bit nonsensical the way things currently are. All the information needed to connect to all the other relays is present on the blockchain, but folks are still using hacky scripts that store and retrieve relay lists from third-party services. This lovely decentralised system isn’t so decentralised after all.

Few people know why they’re running the scripts, other than “everyone else does it”. Even fewer have bothered to write down their purpose anywhere. Documentation is abysmal; chat rooms are king. Unfortunately, I’ve seen the quality of info that comes out of Telegram, and for every 2 pieces of useful info that come out, they’re tainted with 6 that are outdated or irrelevant and 2 that are actually wrong.

Not only are there a bunch of hacky scripts around, but there’s next to no useful documentation. If you follow the ‘topology updater’ script instructions, then (direct from the source) “it’s expected that you also add a scheduled restart of the relay”, and due to the amount of time it takes a relay to start up, if all your relays do a topology update at the same time, you’re guaranteed a few minutes around the restart time when block production will be impossible.

Someone will follow this up with ‘but of course your relays shouldn’t do a topology update at the same time’, and they’ll be right, but it doesn’t change the fact that topology info should be pulled from the blockchain, not from a random third party with a hacky script.
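If you do schedule restarts, one way around the simultaneous-restart window is simply to stagger the cron entries per relay, e.g. (times and paths are illustrative, not a recommendation of the schedule itself):

```crontab
# relay1 crontab: refresh topology, then restart, at 02:10
10 2 * * * /home/cardano/bin/gen-topology.sh && sudo systemctl restart cardano-node
# relay2 crontab: same job, offset by six hours so at least one relay is always up
10 8 * * * /home/cardano/bin/gen-topology.sh && sudo systemctl restart cardano-node
```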

P2P discovery cannot arrive soon enough. The current situation is ridiculous.
