Problems after Alonzo Hard Fork: about to miss a block

Ok, do you have Teams?


I can send mine in an hour or so. Just coming back home :pray:

Yes, TraceMempool is set to false on the relay node.
The block producer node’s gLive output is what I posted above. The peer in/out only shows my non-functioning node, just like NicoM’s.
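
For anyone checking their own relay, that setting is the TraceMempool entry in the node’s config JSON, e.g.:

"TraceMempool": false,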

show me the topology file for your producer

Yeah I do, I can talk later when I’m home. :+1:

{
  "Producers": [
    {
      "addr": "192.168.1.2",
      "port": 6000,
      "valency": 1
    },
    {
      "addr": "192.168.1.41",
      "port":6001,
      "valency": 1
    }
  ]
}

Hmmm, can you go to the config file and share the hash for the Alonzo genesis file?

Is it the same as this?


7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874

https://hydra.iohk.io/build/7370192/download/1/mainnet-config.json

yes, all Alonzo configs are updated with:

"AlonzoGenesisFile": "mainnet-alonzo-genesis.json",
"AlonzoGenesisHash": "7e94a15f55d1e82d10f09203fa1d40f8eede58fd8066542cf6566008068ed874",
"ApplicationName": "cardano-sl",
"ApplicationVersion": 1,
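
For anyone wanting to double-check, you can recompute the hash locally from the genesis file itself (a quick sketch, assuming cardano-cli is on your PATH and you run it from the config directory):

cardano-cli genesis hash --genesis mainnet-alonzo-genesis.json

The output should match the AlonzoGenesisHash value above.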


Here are my gLive outputs for the BP and relays. As you can see, Relay 1 is fine while the other one is not, and the BP is syncing with the relay that is starting up.

identical to what is happening to me


Read this

It looks like you need to resync the DB.
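
For reference, a resync usually means stopping the node, deleting the chain database, and starting it again so it replays from genesis (a rough sketch; the systemd unit name and $NODE_HOME layout are assumptions about your setup):

sudo systemctl stop cardano-node
rm -rf $NODE_HOME/db                 # remove the chain database
sudo systemctl start cardano-node    # node rebuilds the db from genesis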


So even a DB snapshot download and resync didn’t work for the down node and core. I ended up converting my Alonzo Purple machine into a new mainnet relay, then made that the core connected to my one working relay node. Then I wiped the old core (completely reinstalled Ubuntu) and set up a 2nd relay. All looks good now, but without a DB to copy, this process would have taken 4-5 days rather than 24 hours. Epoch salvaged.

I would still love to know what actually happened. We were due to mint a block about 20 mins into the epoch change, and with the hardfork transition delay, I don’t think that was ever gonna happen. But the nodes never recovered. I’d be interested to hear how you fix your issue, @NicoM! Thanks for your help @Alexd1985
-Sully

It was so much chaos… so I am asking again: don’t you have at least one relay up? Because you can start it as a producer temporarily to mint the block.

Someone fixed it:

Okay, figured out my issue: I was using the CoinCashew guide and somehow pulled the wrong build number: 6510764.

This then pulled an incorrect Alonzo genesis file and the wrong hash in the mainnet-config.

Pulling the correct genesis and hash fixed my issues, and everything is syncing now. The correct build is 7578887.

You can find the right files here:

https://hydra.iohk.io/build/7370192/download/1/index.html
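
For anyone repeating this, the config set can be pulled straight from that build page (a sketch; the file list assumes the standard mainnet set published alongside mainnet-config.json):

cd $NODE_HOME   # your config directory (assumption)
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-config.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-byron-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-shelley-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-alonzo-genesis.json
wget -N https://hydra.iohk.io/build/7370192/download/1/mainnet-topology.json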

Thanks Alexd1985. I was having the same issue and did replace the genesis file as well as the config file a few times. I think I didn’t have the right version of them. Now my node is finally syncing again.

That’s basically what I did, Alex. I created an extra working relay and then converted that relay to the BP node. After I proved it was processing Tx, I built another relay and got the whole thing working like it should.

Until early this AM, that is, when I was supposed to mint a block. My leader slot passed me by without adoption. With some help, I found a KES number discrepancy in the journal log at the time of the missed block, despite everything looking up to date on gLive.

So something about just moving kes.skey, vrf.skey, and node.cert to a new machine didn’t work for me. This morning I refreshed my KES keys and node.cert, and I finally minted a block successfully this epoch. Not sure that’s a mandatory requirement, but based on what happened to me, if I ever have to move my BP core again, I’ll likely refresh my KES before starting it up.
I’m not sure of any way to check its validity without just seeing what happens when your block time comes.
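
If anyone needs to do the same, a KES refresh is roughly this (a sketch, assuming node.skey is your cold signing key and node.counter your issue counter file; on mainnet slotsPerKESPeriod is 129600):

# generate a fresh KES key pair
cardano-cli node key-gen-KES \
  --verification-key-file kes.vkey \
  --signing-key-file kes.skey

# current KES period = current slot / slotsPerKESPeriod,
# then issue a new operational certificate with the cold key
cardano-cli node issue-op-cert \
  --kes-verification-key-file kes.vkey \
  --cold-signing-key-file node.skey \
  --operational-certificate-issue-counter node.counter \
  --kes-period <current-kes-period> \
  --out-file node.cert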

Anyway, the saga is over (I think). If it holds, we should still have a decent epoch. I learned a lot about troubleshooting in the process, but I still never figured out exactly why everything went down. We were due for a block 20 mins into the hardfork, and I wonder if the delay in the epoch transition messed something up. A complete rebuild was the inelegant brute-force solution, but it was the only one I was pretty sure would work, so after a day of uncertainty, I just went for it.

Do you have an update, @NicoM?

Read here

Downloading an updated cardano-node binary and the latest IOHK config files was one of my first troubleshooting steps. For me, it didn’t work, though. I would love to have tinkered more to figure it out, but as I didn’t seem to be making progress, I felt a brute-force rebuild would get me back to block-minting the quickest…

Hey, since our pool is not minting blocks, I’m taking a slower approach with the help of @Alexd1985. It was weird that both of my non-working nodes failed to compile with cabal 3.0.4.0 at the time of upgrading to 1.29.0. So I downloaded the latest mainnet IOHK DB snapshot, uploaded it to my BP, and extracted it, and it is nearly synced up. I will let you know later today whether that worked.
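
For reference, a snapshot restore looks roughly like this (a sketch; <snapshot-url> stands in for whatever snapshot link you use, and the archive name is hypothetical):

sudo systemctl stop cardano-node
curl -O <snapshot-url>/db-mainnet.tar.gz     # hypothetical archive name
rm -rf $NODE_HOME/db                         # drop the old chain database
tar -xzvf db-mainnet.tar.gz -C $NODE_HOME    # extract the snapshot as the new db
sudo systemctl start cardano-node            # node syncs the remainder from the tip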


After all of this you will need to resync the nodes.

Hi Alex,
I am facing the same issue after the hard fork.

TL;DR: The relay node works fine. The core node is unable to sync with the relay node.

I have tried reinstalling the latest binary files as suggested above.
I replaced the DB files with the latest DB files.
I tried syncing with the IOHK node; the core node is able to sync with the IOHK node.
I switched back to the private relay addr; however, it is unable to sync, and the attachment shows the problem.

Telnet works fine both ways.
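
If it helps to narrow this down, you can compare where each node actually is on the chain (assuming cardano-cli is installed on both boxes; the socket path is an assumption about your layout):

export CARDANO_NODE_SOCKET_PATH=$NODE_HOME/db/socket
cardano-cli query tip --mainnet   # run on both relay and core, compare slot numbers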