BP is missing slots since upgrading to 1.27.0

Dear Cardano Community,

My BP is currently missing slots at an average rate of 2.5 slots per hour since I upgraded to 1.27.0.

I already went through several threads on the Cardano forum, but I couldn’t find a conclusive solution to this issue. As suggested in many related topics, I moved my BP from a VPS to a bare metal server and increased its RAM from 8GB to 16GB. However, I’m still missing slots at a similar rate as before.

My current setup is:

BP → dedicated server at home: 4 cores, 16GB RAM, 15GB swap and 256GB SSD (Zurich, Switzerland)
Relay 1 → dedicated server at home: 4 cores, 8GB RAM, 7GB swap and 256GB SSD (Zurich, Switzerland)
Relay 2 (Backup) → VPS on data center: 4 cores, 8GB RAM, 7GB swap and 80GB SSD (Western Germany)
Internet connection: 1 Gbit/s up/down

If I run the command journalctl -e -f -u cnode.service on my BP and relays, I don’t see any signs of insufficient resources, such as cnode.service being terminated with signal SIGKILL.
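For anyone who wants to check the journal for forced kills without watching it live, here is a minimal sketch. It assumes the usual systemd wording for a process ended by SIGKILL (status=9/KILL) and the kernel's out-of-memory message. The sample lines are made up for illustration, and the patterns may need adjusting to what your system actually logs.

```python
import re

# Patterns assumed to indicate a forced kill; adjust to your own log format.
KILL_PATTERNS = [
    re.compile(r"status=9/KILL"),                  # systemd: main process ended by SIGKILL
    re.compile(r"Out of memory: Killed process"),  # kernel OOM killer message
    re.compile(r"oom-kill", re.IGNORECASE),        # other OOM-related entries
]

def find_kill_events(log_lines):
    """Return the log lines that match any of the kill patterns."""
    return [line for line in log_lines
            if any(p.search(line) for p in KILL_PATTERNS)]

# Hypothetical sample lines, not taken from a real node.
sample = [
    "Jun 01 10:00:00 bp systemd[1]: Started Cardano Node.",
    "Jun 01 12:31:07 bp systemd[1]: cnode.service: Main process exited, code=killed, status=9/KILL",
    "Jun 01 12:31:07 bp kernel: Out of memory: Killed process 1234 (cardano-node)",
]

kill_events = find_kill_events(sample)
for line in kill_events:
    print(line)
```

To use it on real data, save the output of journalctl -u cnode.service --no-pager (and journalctl -k for kernel messages) to a file and pass its lines to find_kill_events. An empty result only means none of these patterns matched, not that the node is healthy.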

I’m an SPO and my pool is extremely small (85k live stake), so I want to minimize the chance of missing any of the few blocks that will be assigned to my pool.

Is this a known issue? Am I missing something in my setup? Do I need to change any ENV parameters to improve my BP’s performance?

Many thanks in advance for the help!

Hi, we have a very similar pool setup and stake, and the same issues. I partially solved them by upgrading the BP’s CPU from 2C to 4C.
My missed slots dropped a lot, but they’re still present. Since I was not scheduled for any blocks last week, I monitored my setup closely, and I found that the old 2C BP (now acting as a relay) is sometimes missing from the new BP’s connections, and this happens quite frequently. When it does, the chain tip difference on the BP starts increasing.
I’ll try switching off this relay to see if it’s responsible for all the old and new issues.

The next thing I’ll try is connecting the BP directly to my router instead of to the “stakepool switch” shared with the relays, to minimize latency as much as possible and to keep the relays from stealing bandwidth from the BP through the honestly-quite-old switch they’re all connected to now…

To save RAM, I set TraceMempool to false on the BP; you might consider doing the same if you haven’t already. I was considering increasing the RAM, but judging from your experience and others’, that is unlikely to be the issue. I’ll let you know!
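For reference, here is a minimal sketch of that change, assuming the node's config.json is plain JSON with a top-level TraceMempool key. The sample config below is a trimmed-down stand-in for illustration, not a complete node configuration.

```python
import json

def disable_trace_mempool(config_text):
    """Return the config as JSON text with TraceMempool set to false."""
    config = json.loads(config_text)
    config["TraceMempool"] = False  # all other keys are left untouched
    return json.dumps(config, indent=2)

# Trimmed-down stand-in for a real config.json (illustrative only).
sample_config = '{"TraceMempool": true, "TraceBlockFetchClient": false}'

updated = disable_trace_mempool(sample_config)
print(updated)
```

On a real node you would read the text from your own config.json, write the result back after making a backup copy, and restart the node so the new setting is picked up.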

Hi! Thank you for your answer. I was also considering upgrading the RAM of my data center relay to 16GB to check if that helps. My CPU activity looks quite stable on all of my nodes (between 3% and 8% over the last 5 days), but for the sake of the investigation I’ll set the TraceMempool option to false in the config.json of my BP and of one of my relays and monitor whether that helps. How are you currently checking the latency on your nodes? I also had the idea of connecting my home servers directly to my router, but unfortunately that didn’t help at all. Approximately how many slots are you currently missing per hour?
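One rough way to compare latency between nodes is to time how long a TCP connection to a relay's node port takes to open. This is only a sketch: it measures the TCP handshake, not anything specific to the Cardano protocol, and the host name and port in the usage note are placeholders.

```python
import socket
import time

def tcp_connect_ms(host, port, timeout=3.0):
    """Time one TCP connection attempt to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection opened; close it again immediately
    return (time.perf_counter() - start) * 1000.0

def median_connect_ms(host, port, samples=5):
    """Take several measurements and return the middle one to dampen outliers."""
    times = sorted(tcp_connect_ms(host, port) for _ in range(samples))
    return times[len(times) // 2]
```

For example, median_connect_ms("relay1.example.com", 3001) run from the BP gives a rough figure for that link, where both the host and the port are hypothetical. Running it before and after a cabling change makes the two setups easier to compare.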

I might have found the solution, or at least a solution that’s perfectly working for me, and that involved my network architecture.
Originally, all stakepool traffic was prioritized on my router and sent to the unmanaged stakepool switch.
So that was my clue: the BP has 3 connections to my 3 relays (2 bare metal, 1 cloud), while the 2 bare metal relays have tens of connections each. The router prioritizes all of them above all other internet traffic, but once they reach the switch, it has to handle around 70 connections with no prioritization capability. The BP’s connections get queued somewhere in the switch hardware and the BP’s priority is eventually lost, causing missed slots.

So I made a new cable from BP to router and restarted the node just before epoch change. Results? 0 missed slots until epoch change, 182 on epoch change (that’s fine from what I read), all of them missed in the first hour after epoch change, and no more since then!! Finally!!!

So my two cents for a bare metal setup are:

  • connect the BP to proven managed network hardware that can handle prioritization properly, or
  • connect the BP to the router on a dedicated cable, with no switches in between
  • relays don’t seem to be affected by where in the network you plug them in

I hope this is useful for someone else. Let me know if it is!

Thank you for sharing, and I’m glad that you found a solution! I’ll try this out. Setting the TraceMempool option to false definitely improved the situation for me as well. However, I’ll probably have to upgrade my current router because it doesn’t allow me to prioritize the traffic on my ports. I’ll update this thread as soon as I have relevant results.