Cardano RPC node - GraphQL API setup

Hello,

We have started exploring how to set up a Cardano RPC node so that we can fetch blockchain data via an API.

According to the Cardano architecture, we only need a few components. We can use GraphQL to query the data that DB Sync collects from the full node and stores in a PostgreSQL database.

App --> GraphQL API --> PostgreSQL <-- DB Sync <-- Node
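Once the stack is up, an application talks to the GraphQL API over plain HTTP. As a sketch (assuming the API listens on the default port 3100 used later in this guide, and that the schema exposes the cardanoDbMeta sync-status query), a minimal request might look like:

```shell
# Ask cardano-graphql whether its backing database is initialized and how far it has synced.
# Port 3100 and the cardanoDbMeta field are assumptions - adjust to your deployment.
curl -s -X POST http://localhost:3100/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ cardanoDbMeta { initialized syncPercentage } }"}'
```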

We also found information about cardano-node-ogmios, but it does not look like the solution we need.

The easiest way to run all of this is with Docker Compose.

We have already started:

  1. Postgres - works
  2. Node - in sync
  3. DB-Sync - following the node and syncing data
  4. GraphQL - doesn’t work (Connection refused)
  5. Hasura - started, and we can access the console, but it is not clear why we need it
  6. Ogmios - not used, because it is not shown on the architecture diagram

Issue

The GraphQL container is up and running, and we have also enabled debug logs:

graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"module":"Db","msg":"pgSubscriber: Connected","time":"2022-05-25T11:30:23.065Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":30,"module":"Server","msg":"Initializing","time":"2022-05-25T11:30:23.066Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":30,"module":"HasuraClient","msg":"Initializing","time":"2022-05-25T11:30:23.466Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"module":"HasuraClient","msg":"{\"level\":\"info\",\"msg\":\"Help us improve Hasura! The cli collects anonymized usage stats which\\nallow us to keep improving Hasura at warp speed. To opt-out or read more,\\nvisit https://hasura.io/docs/1.0/graphql/manual/guides/telemetry.html\\n\",\"time\":\"2022-05-25T11:30:23Z\"}\n{\"level\":\"info\",\"msg\":\"Applying migrations...\",\"time\":\"2022-05-25T11:30:24Z\"}\n{\"level\":\"info\",\"msg\":\"nothing to apply\",\"time\":\"2022-05-25T11:30:24Z\"}\n","time":"2022-05-25T11:30:24.355Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":40,"module":"Db","msg":"pgSubscriber: cardano-db-sync-extended starting, schema will be reset","time":"2022-05-25T11:31:12.982Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":30,"module":"CardanoNodeClient","msg":"Initializing. This can take a few minutes...","time":"2022-05-25T11:31:13.469Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"msg":"Establishing connection to cardano-node: Attempt 1 of 101, retrying...","time":"2022-05-25T11:31:13.481Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"module":"DataFetcher","instance":"ServerHealth","msg":"Initial value fetched","time":"2022-05-25T11:31:14.483Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"msg":"Establishing connection to cardano-node: Attempt 2 of 101, retrying...","time":"2022-05-25T11:31:14.487Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"module":"DataFetcher","instance":"ServerHealth","msg":"Initial value fetched","time":"2022-05-25T11:31:15.687Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"msg":"Establishing connection to cardano-node: Attempt 3 of 101, retrying...","time":"2022-05-25T11:31:15.690Z","v":0}
graphql  | {"name":"cardano-graphql","hostname":"65f028378a05","pid":1,"level":20,"module":"DataFetcher","instance":"ServerHealth","msg":"Initial value fetched","time":"2022-05-25T11:31:17.132Z","v":0}
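To narrow down the "Connection refused", it can help to check whether the container keeps restarting and whether anything is actually listening on the mapped host port. A sketch (container name and port match the compose file below):

```shell
# Is the graphql container running, or stuck in a restart loop?
docker-compose ps graphql

# Is anything listening on the host port mapped to the API?
sudo ss -tlnp | grep 3100

# Tail the most recent logs for errors
docker-compose logs --tail=50 graphql
```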

Questions

We can’t connect to the GraphQL playground, the port is not opening, and we have the following questions:

  1. How long will GraphQL take to initialize?

  2. Which node is it trying to connect to? According to the architecture, it should only need a connection to Postgres.

  3. According to the architecture diagram, we need Node / DB-Sync / Postgres / GraphQL. What about Hasura and Ogmios - do we need them, and why?
    Their configuration is included in the docker-compose for cardano-graphql.

  4. Does the Cardano node store all blockchain data?
    By analogy with an Ethereum archive node: will we be able to inspect all transactions, starting from genesis?

  5. If the node stores all the data, is there a way to prune old data?

  6. DB-Sync has started syncing data into PostgreSQL, and we see 64G node-db / 60G postgres.
    Does that mean we have two copies of the data, one on the node and a second in PostgreSQL?
    Is there a way to keep just one copy? This may relate to Q5 - pruning.

  7. Should we wait for the node to be fully synced before running DB-Sync, or will it handle sync, pauses, delays, and so on by itself?

  8. Is DB-Sync extended the same Docker image, with the EXTENDED=true variable activating extended mode?
    If we initially started in simple mode (without the variable), can we change it and have the epoch table added, or do we need a full re-sync?
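Regarding question 8, one way to check whether the epoch table exists and is being populated is to query Postgres directly. A sketch (container name, user, and database name are taken from the compose setup below):

```shell
# List db-sync tables matching 'epoch' and count rows in the epoch table
docker exec -it postgres psql -U postgres -d cexplorer -c '\dt epoch*'
docker exec -it postgres psql -U postgres -d cexplorer -c 'SELECT count(*) FROM epoch;'
```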


We discovered the following about the Cardano RPC API configuration:

  1. We should run Postgres / Ogmios / DB-Sync / Hasura / GraphQL
  2. GraphQL uses Postgres, Hasura, and Ogmios
  3. The Postgres configuration should be tuned
  4. Whether we sync from scratch or from a snapshot, we end up with all the data from the genesis block
  5. We will have partially duplicated data - on the node and in Postgres - and it is not clear how to prune the node data (for now it is probably not possible)

We started our journey by running four nodes with different configurations, varying the disk performance and the sync method.

Node               | Instance    | Performance     | Disk                                          | Sync type | Sync duration
Cardano-ebs-gen    | r5a.xlarge  | 2 vCPU / 16 GiB | EBS 400 GB / GP3 / 16,000 IOPS / 1,000 MiB/s  | Genesis   | > 180 days (estimate)
Cardano-ebs-snap   | r5a.2xlarge | 4 vCPU / 32 GiB | EBS 400 GB / GP3 / 16,000 IOPS / 1,000 MiB/s  | Snapshot  | ~ 11 days
Cardano-ssd-snap   | i4i.xlarge  | 4 vCPU / 32 GiB | SSD 937 GB / AWS Nitro / 55,000 IOPS          | Snapshot  | ~ 4 days
Cardano-ssd-2x-gen | i4i.2xlarge | 8 vCPU / 64 GiB | SSD 1,875 GB / AWS Nitro / 110,000 IOPS       | Genesis   | ~ 9 days

Note: Cardano-ebs-gen was stopped because the sync speed was too low. That early experiment may not have been entirely fair, so we will try to repeat it later.

Disk space

2022-06-20 - epoch 346

volumes/db-sync  -  12G
volumes/node-db  -  65G
volumes/node-ipc -   0G
volumes/postgres - 265G
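The sizes above can be reproduced with a simple du over the bind-mounted volume directories (the path assumes the /data/cardano layout used later in this guide):

```shell
cd /data/cardano && du -sh volumes/*
```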

Sync from snapshot

  1. The OS partition should have enough space for the snapshot download, ~ 25 GB (archive size)
  2. The snapshot is unpacked into the db-sync container’s /tmp directory, ~ 70 GB (snapshot size); we can use volumes to mount it on the OS /data disk
  3. After the snapshot restore finishes, we should restart the whole container stack; db-sync will then continue syncing, following the node’s blocks
  4. After db-sync finishes syncing from the node, we should restart the whole stack, and then graphql will open its interface
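The restore itself is driven by the RESTORE_SNAPSHOT environment variable that the db-sync service reads in the compose file below. A sketch, with the actual snapshot URL left as a placeholder - take a current link from the cardano-db-sync release notes:

```shell
# Point db-sync at a snapshot archive before the first start.
# <snapshot-archive-url> is a placeholder - substitute a current db-sync snapshot link.
export RESTORE_SNAPSHOT="<snapshot-archive-url>"
docker-compose up -d
```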

Sync from genesis

  1. We should use a disk with high IOPS, ~ 60,000 IOPS; see the db-sync System Requirements
  2. After db-sync finishes syncing from the node, we should restart the whole stack, and then graphql will open its interface
  3. We should copy all data from the SSD disk to an EBS volume and then start the whole stack from it

Guide for AWS

Strategy

  1. Run an instance with a fast instance-store SSD - i4i.2xlarge
  2. Perform the sync from genesis on the SSD
  3. Mount an EBS volume
  4. Copy the data from the SSD to the EBS volume
  5. Change the instance type to one without an instance store - r6i.xlarge
  6. Update the postgres container memory settings

Instance type: i4i.2xlarge
OS: Amazon Linux 2
Performance: 8 vCPU / 64 GiB
Disks: / = 20 GB / GP3 | /data = 1,875 GB AWS Nitro SSD / 200,000 Read IOPS / 110,000 Write IOPS | /data2 = 500 GB GP3 / 16,000 IOPS / 1,000 MiB/s

# Instance store

# List
lsblk

# Format (assumes the instance store appears as /dev/nvme1n1 - confirm with lsblk first)
sudo mkfs -t xfs /dev/nvme1n1

# Mount
sudo mkdir /data
sudo mount /dev/nvme1n1 /data
# Docker
sudo amazon-linux-extras install docker -y
sudo systemctl enable docker
sudo systemctl start docker
# Docker compose
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Volumes directories
directory="/data/cardano"

mkdir -p $directory/volumes/{db-sync,postgres,node-db,node-ipc}

cd $directory
# Get configs
sudo yum install -y git

git clone \
  --single-branch \
  --branch 6.2.0 \
  --recurse-submodules \
  https://github.com/input-output-hk/cardano-graphql.git
# PostgreSQL secrets
mkdir secrets

echo -n cexplorer >secrets/postgres_db
echo -n postgres >secrets/postgres_user
grep -ao '[A-Za-z0-9]' < /dev/urandom | head -30 | tr -d '\n' >secrets/postgres_password
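The password pipeline above always yields exactly 30 alphanumeric characters with no trailing newline (Docker secrets pass file contents through verbatim, so a stray newline would become part of the password). A quick self-contained sanity check:

```shell
# Generate a candidate password the same way and verify its length and alphabet
password=$(grep -ao '[A-Za-z0-9]' < /dev/urandom | head -30 | tr -d '\n')
echo "${#password}"                                   # prints 30
printf '%s' "$password" | tr -d 'A-Za-z0-9' | wc -c   # prints 0 (no other characters)
```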
# Docker compose
vi docker-compose.yml
version: "3.9"

services:
  postgres:
    container_name: postgres
    image: postgres:11.5-alpine
    shm_size: 10g
    command: -c min_wal_size=2GB -c max_wal_size=8GB -c work_mem=4GB -c max_worker_processes=8 -c shared_buffers=16GB -c effective_cache_size=48GB
    environment:
      - POSTGRES_LOGGING=true
      - POSTGRES_DB_FILE=/run/secrets/postgres_db
      - POSTGRES_USER_FILE=/run/secrets/postgres_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/postgres_password
    secrets:
      - postgres_db
      - postgres_password
      - postgres_user
    ports:
      - ${POSTGRES_PORT:-5432}:5432
    volumes:
      - postgres:/var/lib/postgresql/data
    restart: on-failure
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"

  ogmios:
    container_name: ogmios
    image: cardanosolutions/cardano-node-ogmios:${CARDANO_NODE_OGMIOS_VERSION:-v5.4.0}-${NETWORK:-mainnet}
    environment:
      - NETWORK=${NETWORK:-mainnet}
    ports:
      - ${OGMIOS_PORT:-1337}:1337
    restart: on-failure
    healthcheck:
      test: ["CMD-SHELL", "curl -f 127.0.0.1:12788 || exit 1"]
      interval: 60s
      timeout: 10s
      retries: 5
    volumes:
      - node-db:/db
      - node-ipc:/ipc
    logging:
      driver: "json-file"
      options:
        max-size: "400k"
        max-file: "20"

  db-sync:
    container_name: db-sync
    image: inputoutput/cardano-db-sync:12.0.2
    command: [
      "--config", "/config/cardano-db-sync/config.json",
      "--socket-path", "/node-ipc/node.socket"
    ]
    environment:
      - EXTENDED=true
      - POSTGRES_HOST=postgres
      - POSTGRES_PORT=5432
      - RESTORE_SNAPSHOT=${RESTORE_SNAPSHOT:-}
      - RESTORE_RECREATE_DB=N
    secrets:
      - postgres_db
      - postgres_user
      - postgres_password
    depends_on:
      ogmios:
        condition: service_healthy
      postgres:
        condition: service_healthy
    volumes:
      - ./cardano-graphql/config/network/${NETWORK:-mainnet}:/config
      - db-sync:/var/lib/cdbsync
      - node-ipc:/node-ipc
    restart: on-failure
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"

  hasura:
    container_name: hasura
    image: inputoutput/cardano-graphql-hasura:${CARDANO_GRAPHQL_VERSION:-6.2.0}
    ports:
      - ${HASURA_PORT:-8090}:8080
    depends_on:
      postgres:
        condition: service_healthy
    restart: on-failure
    environment:
      - HASURA_GRAPHQL_ENABLE_CONSOLE=true
      - HASURA_GRAPHQL_CORS_DOMAIN=http://localhost:9695
    secrets:
      - postgres_db
      - postgres_password
      - postgres_user
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"

  graphql:
    container_name: graphql
    image: inputoutput/cardano-graphql:${CARDANO_GRAPHQL_VERSION:-6.2.0}-${NETWORK:-mainnet}
    environment:
      - OGMIOS_HOST=ogmios
      - ALLOW_INTROSPECTION=true
      - CACHE_ENABLED=true
      - LOGGER_MIN_SEVERITY=${LOGGER_MIN_SEVERITY:-debug}
    secrets:
      - postgres_db
      - postgres_password
      - postgres_user
    depends_on:
      postgres:
        condition: service_healthy
    expose:
      - ${API_PORT:-3100}
    ports:
      - ${API_PORT:-3100}:3100
    restart: on-failure
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"

secrets:
  postgres_db:
    file: ./secrets/postgres_db
  postgres_password:
    file: ./secrets/postgres_password
  postgres_user:
    file: ./secrets/postgres_user

volumes:
  postgres:
    driver: local
    driver_opts:
        type: none
        device: /data/cardano/volumes/postgres
        o: bind
  node-db:
    driver: local
    driver_opts:
        type: none
        device: /data/cardano/volumes/node-db
        o: bind
  node-ipc:
    driver: local
    driver_opts:
        type: none
        device: /data/cardano/volumes/node-ipc
        o: bind
  db-sync:
    driver: local
    driver_opts:
        type: none
        device: /data/cardano/volumes/db-sync
        o: bind
# Run
docker-compose up -d

# Check logs
docker-compose logs -f

Migrate from SSD to EBS

  1. Mount the EBS volume to /data2
  2. Rsync /data to /data2, ~ 4 hours
    rsync -av --delete --progress --stats /data/ /data2/

  3. Stop the whole stack
  4. Perform a final rsync, ~ 1 hour
  5. Unmount the SSD from /data and mount the EBS volume to /data
  6. Start the whole stack on /data
  7. Wait for node chunk validation and for db-sync and graphql to start, ~ 1 hour
  8. After some time, db-sync will finish syncing from the node, and graphql will open its port
  9. Now we can change the instance type to one without an instance store
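The steps above, expressed as commands (a sketch - the device name /dev/nvme2n1 is an assumption, so confirm yours with lsblk before formatting anything):

```shell
# 1-2. Format and mount the EBS volume, then do the bulk copy while the stack still runs
sudo mkfs -t xfs /dev/nvme2n1
sudo mkdir -p /data2
sudo mount /dev/nvme2n1 /data2
sudo rsync -av --delete --progress --stats /data/ /data2/

# 3-4. Stop the stack and copy the final delta
cd /data/cardano && docker-compose down
sudo rsync -av --delete --progress --stats /data/ /data2/

# 5-6. Swap the mounts and start the stack from the EBS copy
sudo umount /data2
sudo umount /data
sudo mount /dev/nvme2n1 /data
cd /data/cardano && docker-compose up -d
```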

I’ve never set up the full Cardano API stack, but for other application stacks on AWS I’ve seen an astonishing difference in reliability and performance after moving the DB part of the application to an instance of Amazon RDS.

Good point, and it should probably be tested. We could also consider Amazon Aurora, and maybe Serverless v2. We should pay attention to the price because of the high I/O, at least during the initial sync.

DB Sync best practices

Having all software on the same machine

The recommended configuration is to have node, DB Sync, and PostgreSQL servers on the same machine. During syncing (getting historical data from the blockchain), there is a large amount of data traffic between db-sync and the database. Traffic to a local database is significantly faster than traffic to a database on the LAN or other remote location.
