Now we put this in front of the node startup script, in order to generate a fresh op.cert valid for 5 more days on every node start.
Note: this is not a permanent solution. First, because for security reasons the cold keys should not be accessible on the node this way. Second, because we expect to soon be able to inject new op.certs through CLI commands without restarting the node.
I would not advise this as a solution. The real reason we have set such a short KES period is so that we build practices and tooling to manage the cold keys.
Get a protected USB, get a cold laptop/computer and build some healthy habits.
Right.
But the first step (with no clear documentation available) is to figure out how to find and set the right KES period.
Then, clearly, the next step is to develop a remote request that receives the new certificate calculated offline and restarts the node.
Posted on TG, but thought it would be ok to post it here too:
Ok, I got my head around the KES Evolution process.
They are using the MMM forward-secure scheme, which is a tree-like digital signature scheme. Signatures from the evolving key are only valid for a limited number of time periods, which is currently set to 120 in fnf.
The time period is based on slots, and it is set to 3600 slots.
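Assuming one-second slots (the slot length is not stated in this thread), the 120 periods of 3600 slots add up to the "5 more days" of op.cert validity mentioned at the start:

```latex
120 \text{ periods} \times 3600 \,\frac{\text{slots}}{\text{period}}
  = 432\,000 \text{ slots}
  = 432\,000 \text{ s}
  = \frac{432\,000}{86\,400} \text{ days}
  = 5 \text{ days}
```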
It starts with 0 (I think) and increases by one in each time/KES period, i.e. every 3600 slots.
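A minimal Python sketch of the slot-to-period mapping described above, assuming (as the post itself guesses) that periods are counted from 0 at slot 0. The constant and function names are illustrative, not taken from cardano-node:

```python
SLOTS_PER_KES_PERIOD = 3600  # fnf value quoted above


def kes_period(slot: int) -> int:
    """Return the KES period containing `slot`, assuming period 0 starts at slot 0."""
    if slot < 0:
        raise ValueError("slot must be non-negative")
    # Integer division, same as the shell's $(( slot / 3600 ))
    return slot // SLOTS_PER_KES_PERIOD


if __name__ == "__main__":
    print(kes_period(241587))  # 67, same as echo $(( 241587 / 3600 ))
```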
But some log message to check this against would be nice.
The verifying key is constant during these 120 time periods, but the signing key evolves incrementally in each (n-th) time/KES period from the initial KES signing key, which is loaded when the node starts, together with the operational certificate that contains the start kesPeriod.
So here is how it works: when we create an ops cert that contains the KES verifying key and specify the kesPeriod (for example 67, from echo $(( 241587 / 3600 ))), it simply means that the operational certificate is valid only from this KES period up to 67+120-1 = 186.
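The validity window in that example can be written as a small Python sketch. It assumes the window is inclusive at both ends and exactly 120 periods long, as the arithmetic above says; the names are hypothetical:

```python
from __future__ import annotations

MAX_KES_EVOLUTIONS = 120  # fnf value quoted above


def opcert_window(start_period: int) -> tuple[int, int]:
    """First and last KES period (both inclusive) in which an opcert is usable."""
    return start_period, start_period + MAX_KES_EVOLUTIONS - 1


def is_valid_in(start_period: int, current_period: int) -> bool:
    """True if an opcert issued with `start_period` is usable in `current_period`."""
    first, last = opcert_window(start_period)
    return first <= current_period <= last


print(opcert_window(67))  # (67, 186)
```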
And if we restart the node any time between those kesPeriods (67..186), that won't be an issue: we have the original KES signing key and the ops certificate containing kesPeriod 67, so we can evolve the n-th signing key from the original one, where n is the difference between the current kesPeriod and the start period 67. E.g. at kesPeriod 157, n = 157 - 67 = 90, i.e. the 90th evolution.
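The restart logic can be sketched in Python as "compute n, then apply the key update n times to the original key". The `evolve` argument is a hypothetical placeholder for the real MMM key update, which is not reproduced here; the sketch only shows the bookkeeping:

```python
from typing import Callable, TypeVar

K = TypeVar("K")
MAX_KES_EVOLUTIONS = 120  # fnf value quoted above


def evolutions_needed(start_period: int, current_period: int) -> int:
    """Number of evolutions from the original KES signing key to the current period."""
    n = current_period - start_period
    if not 0 <= n < MAX_KES_EVOLUTIONS:
        raise ValueError(
            f"period {current_period} is outside the cert window starting at {start_period}"
        )
    return n


def catch_up(original_key: K, start_period: int, current_period: int,
             evolve: Callable[[K], K]) -> K:
    """Evolve the original key forward n times; `evolve` is a hypothetical stand-in."""
    key = original_key
    for _ in range(evolutions_needed(start_period, current_period)):
        key = evolve(key)
    return key


print(evolutions_needed(67, 157))  # 90
```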
Also, it does not matter if I create a new KES key pair and an opcert, signed with the cold key, for the 172nd kesPeriod: I can restart the node (or submit the opcert and KES signing key online through an API) any time between that 172nd period and the previous cert's last one, the 186th. The node will happily sign blocks with the new KES keys, and the old ones will be revoked, as the cold counter is bigger in the new cert than in the old one.
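A simplified Python model of that revocation behaviour, assuming the protocol remembers the highest cold counter seen for a pool and only rejects certs with a lower counter (or outside their 120-period window). This is a toy illustration, not the actual ledger rule implementation:

```python
from dataclasses import dataclass

MAX_KES_EVOLUTIONS = 120  # fnf value quoted above


@dataclass(frozen=True)
class OpCert:
    counter: int       # cold counter value the cert was issued with
    start_period: int  # the --kes-period the cert was issued for


def in_window(cert: OpCert, current_period: int) -> bool:
    """True if current_period lies inside the cert's 120-period validity window."""
    return cert.start_period <= current_period < cert.start_period + MAX_KES_EVOLUTIONS


class PoolCounterTracker:
    """Toy per-pool state: the highest opcert counter seen so far."""

    def __init__(self) -> None:
        self.highest_seen = -1

    def accept(self, cert: OpCert, current_period: int) -> bool:
        if not in_window(cert, current_period):
            return False  # expired, or not valid yet
        if cert.counter < self.highest_seen:
            return False  # superseded by a cert with a higher counter
        self.highest_seen = cert.counter
        return True


tracker = PoolCounterTracker()
old = OpCert(counter=0, start_period=67)
new = OpCert(counter=1, start_period=172)
print(tracker.accept(old, 170))  # True
print(tracker.accept(new, 175))  # True
print(tracker.accept(old, 176))  # False: counter 0 < 1, old cert is revoked
```

Under this "lower counter is rejected" assumption, a second cert reusing counter 1 would also be accepted, which is the loophole raised in the next message.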
However, you could try to trick the system by reusing the same counter. I am not sure how it would behave, but if the protocol recognises it as a violation, that pool should be punished very hard.
What was unclear to me: do we have to generate a new KES key pair when we are about to renew the validity period of the previous KES key?
As it turned out, you can keep the same KES key, provided you change the starting period. It works, and after some thought, it makes sense. What matters is the new --kes-period and that the node.skey signs it.
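To see why reusing the KES key works, here is a toy Python sketch assuming the cold key signs the tuple (KES verification key, counter, start KES period). HMAC-SHA256 stands in for the real Ed25519 signature made with node.skey purely to keep the sketch stdlib-only, and the byte encoding and key values are hypothetical. It shows that the same KES verification key with a new --kes-period yields a new, valid cert, while the old signature does not cover the new period:

```python
import hashlib
import hmac


def opcert_payload(kes_vkey: bytes, counter: int, kes_period: int) -> bytes:
    """Bytes the cold key signs. Illustrative fixed-width encoding, not the real format."""
    return kes_vkey + counter.to_bytes(8, "big") + kes_period.to_bytes(8, "big")


def cold_sign(cold_skey: bytes, payload: bytes) -> bytes:
    # STAND-IN ONLY: HMAC-SHA256 replaces the real Ed25519 cold-key signature.
    return hmac.new(cold_skey, payload, hashlib.sha256).digest()


def cert_matches(cold_skey: bytes, kes_vkey: bytes, counter: int,
                 kes_period: int, sig: bytes) -> bool:
    """Check a signature against the (vkey, counter, period) it claims to cover."""
    expected = cold_sign(cold_skey, opcert_payload(kes_vkey, counter, kes_period))
    return hmac.compare_digest(expected, sig)


cold = b"hypothetical-cold-key"
kes_vkey = b"same-kes-vkey-reused"
first = cold_sign(cold, opcert_payload(kes_vkey, 0, 67))
renewed = cold_sign(cold, opcert_payload(kes_vkey, 1, 172))
print(cert_matches(cold, kes_vkey, 1, 172, renewed))  # True
print(cert_matches(cold, kes_vkey, 1, 172, first))    # False
```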
But security-wise, is there a valid reason not to do it like that? We discussed it a bit with Marek, and I am not sure how to proceed. If KES.skey is compromised, it means your server has been breached somehow anyway, so the incident might occur again and again…
It depends on your use case. I can imagine an enterprise-grade stake pool that is just a preformatted black box with the keys on it and no physical access, where the only thing not on the read-only file system is the opcert on a USB key that is picked up on boot. Just an example.
Because if you think about it, using the same channel to transfer your KES keys as well as your opcert is just another angle of attack.
But in most cases, I guess it is safe to generate new KES keys too. Again, it depends on the individual use case.
But you can fire up a new standby node within seconds or hours with a new cold counter, and the compromised node is shut out as soon as your new one starts. Also, you would only lose some blocks, and therefore some rewards.
Also, signing the opcert needs cold.skey, therefore the cert must be transferred to the node anyway; if that's the case, I would be much happier with a new KES key than with reusing the old one.
The cold key is all about preventing pool identity theft.