Problems after Alonzo Hard Fork: about to miss a block

Ok, do you have Teams?


I can send mine in an hour or so. Just coming back home :pray:

Yes, TraceMempool is set to false on the relay node.
The block producer node’s gLive output is what I posted above. The peer in/out only shows my non-functioning node, just like NicoM’s.
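
For anyone checking their own relay, that setting is the TraceMempool entry in the node’s config JSON, e.g.:

"TraceMempool": false,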

show me the topology file for your producer

Yeah I do, I can talk later when I’m home. :+1:

{
  "Producers": [
    {
      "addr": "192.168.1.2",
      "port": 6000,
      "valency": 1
    },
    {
      "addr": "192.168.1.41",
      "port":6001,
      "valency": 1
    }
  ]
}

Hmmm, can you go to the config file and share the hash for the Alonzo genesis file?

Is it the same as this?


7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874

https://hydra.iohk.io/build/7370192/download/1/mainnet-config.json

yes, all Alonzo configs are updated with:

"AlonzoGenesisFile": "mainnet-alonzo-genesis.json",
"AlonzoGenesisHash": "7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874",
"ApplicationName": "cardano-sl",
"ApplicationVersion": 1,
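
For anyone wanting to double-check, you can recompute the hash locally from the genesis file itself (a quick sketch, assuming cardano-cli is on your PATH and you run it from the config directory):

cardano-cli genesis hash --genesis mainnet-alonzo-genesis.json

The output should match the AlonzoGenesisHash value above.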


Here are my gLive outputs for the BP and relays. As you can see, Relay 1 is fine while the other one is not, and the BP is syncing with the relay that is starting up.

identical to what is happening to me


Read this

It looks like you need to resync the DB.
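
For reference, a resync usually means stopping the node, deleting the chain database, and starting it again so it replays from genesis (a rough sketch; the systemd unit name and $NODE_HOME layout are assumptions about your setup):

sudo systemctl stop cardano-node
rm -rf $NODE_HOME/db                 # remove the chain database
sudo systemctl start cardano-node    # node rebuilds the db from genesis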


So even a DB snapshot download and resync didn’t work for the down node and core. I ended up converting my Alonzo Purple machine into a new mainnet relay, then made that the core connected to my one working relay node. Then I wiped the old core (completely reinstalled Ubuntu) and set up a 2nd relay. All looks good now, but without a DB to copy, this process would have taken 4-5 days rather than 24 hours. Epoch salvaged.

I would still love to know what actually happened. We were due to mint a block about 20 mins into the epoch change, and with the hardfork transition delay, I don’t think that was ever gonna happen. But the nodes never recovered. I’d be interested to hear how you fix your issue, @NicoM! Thanks for your help @Alexd1985
-Sully

It was so much chaos… so I am asking again: don’t you have at least one relay up? Because you can start it as a producer temporarily to mint the block.

Someone fixed it:

Okay, figured out my issue: I was using the CoinCashew guide and somehow pulled the wrong build number: 6510764.

This then pulled an incorrect Alonzo genesis file and the wrong hash in the mainnet-config.

Pulling the correct genesis and hash fixed my issues, and everything is syncing now. The correct build is 7578887.

You can find the right files here:

https://hydra.iohk.io/build/7370192/download/1/index.html
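
For anyone repeating this, the config set can be pulled straight from that build page (a sketch; the file list assumes the standard mainnet set published alongside mainnet-config.json):

cd $NODE_HOME   # your config directory (assumption)
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-config.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-byron-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-shelley-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-alonzo-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-topology.json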

Thanks Alexd1985. I was having the same issue and did replace the genesis file as well as the config file a few times. I think I didn’t have the right version of them. Now my node is finally syncing again.

That’s basically what I did, Alex. I created an extra working relay and then converted that relay to the BP node. After I proved it was processing Tx, I built another relay and got the whole thing working like it should.

Until early this AM, that is, when I was supposed to mint a block. My leader slot passed me by without adoption. With some help, I found a KES number discrepancy in the journal log at the time of the missed block, despite everything looking up to date on gLive.

So something about just moving kes.skey, vrf.skey, and node.cert to a new machine didn’t work for me. This morning I refreshed my KES keys and node.cert, and I finally minted a block successfully this epoch. Not sure that’s a mandatory requirement, but based on what happened to me, if I ever have to move my BP core again, I’ll likely refresh my KES before starting it up.
I’m not sure of any way to check its validity without just seeing what happens when your block time comes.
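
If anyone needs to do the same, a KES refresh is roughly this (a sketch, assuming node.skey is your cold signing key and node.counter your issue counter file; on mainnet slotsPerKESPeriod is 129600):

# generate a fresh KES key pair
cardano-cli node key-gen-KES \
  --verification-key-file kes.vkey \
  --signing-key-file kes.skey

# current KES period = current slot / slotsPerKESPeriod,
# then issue a new operational certificate with the cold key
cardano-cli node issue-op-cert \
  --kes-verification-key-file kes.vkey \
  --cold-signing-key-file node.skey \
  --operational-certificate-issue-counter node.counter \
  --kes-period <current-kes-period> \
  --out-file node.cert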

Anyway, the saga is over (I think). If it holds, we should still have a decent epoch. I learned a lot about troubleshooting in the process, but I still never figured out exactly why everything went down. We were due for a block 20 mins into the hardfork, and I wonder if the delay in the epoch transition messed something up. A complete rebuild was the inelegant brute-force solution, but it was the only one I was pretty sure would work, so after a day of uncertainty, I just went for it.

Do you have an update, @NicoM?

Read here

Downloading an updated cardano-node binary and the latest IOHK config files was one of my first troubleshooting steps. For me, it didn’t work, though. I would love to have tinkered more to figure it out, but as I didn’t seem to be making progress, I felt a brute-force rebuild would get me back to block-minting the quickest…

Hey, since our pool is not minting blocks, I’m taking a slower approach with the help of @Alexd1985. It was weird that both of my non-working nodes failed to compile with cabal 3.0.4.0 at the time of upgrading to 1.29.0. So I downloaded the latest mainnet IOHK DB snapshot, uploaded it to my BP, and extracted it, and it is nearly synced up. I will let you know later today whether that worked.
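
For reference, a snapshot restore looks roughly like this (a sketch; <snapshot-url> stands in for whatever snapshot link you use, and the archive name is hypothetical):

sudo systemctl stop cardano-node
curl -O <snapshot-url>/db-mainnet.tar.gz     # hypothetical archive name
rm -rf $NODE_HOME/db                         # drop the old chain database
tar -xzvf db-mainnet.tar.gz -C $NODE_HOME    # extract the snapshot as the new db
sudo systemctl start cardano-node            # node syncs the remainder from the tip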


After all of this you will need to resync the nodes.

Hi Alex,
I am facing the same issue after the hard fork.

TL;DR: The relay node works fine. The core node is unable to sync with the relay node.

I have tried reinstalling the latest binary files as suggested above.
I replaced the DB files with the latest DB files.
I tried syncing with the IOHK node; the core node is able to sync with the IOHK node.
I switched back to the private relay addr; however, it is unable to sync, and the attachment shows the problem.

Telnet works fine both ways.
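
If it helps to narrow this down, you can compare where each node actually is on the chain (assuming cardano-cli is installed on both boxes; the socket path is an assumption about your layout):

export CARDANO_NODE_SOCKET_PATH=$NODE_HOME/db/socket
cardano-cli query tip --mainnet   # run on both relay and core, compare slot numbers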