## 1. Why blockchain data

Blockchain is a distributed digital ledger technology that enables secure and transparent transactions without a central authority.

Blockchain ensures data integrity and immutability. Because each block in the chain is linked to the previous one through a cryptographic hash, it is practically impossible to alter or delete data once it has been added to the chain.
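The hash-linking described above can be sketched in a few lines of Python. The block contents here are invented strings; a real chain hashes a structured block header rather than raw text:

```python
import hashlib

def block_hash(prev_hash: str, data: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    return hashlib.sha256((prev_hash + data).encode()).hexdigest()

# Build a tiny three-block chain.
genesis = block_hash("0" * 64, "genesis")
b1 = block_hash(genesis, "tx: alice -> bob")
b2 = block_hash(b1, "tx: bob -> carol")

# Tampering with an earlier block changes every later hash,
# which is why the history cannot be silently rewritten.
tampered_b1 = block_hash(genesis, "tx: alice -> eve")
assert block_hash(tampered_b1, "tx: bob -> carol") != b2
```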

Blockchain also benefits data analysis through its transparency. Everyone on the network sees the same information, and the full history of transactions is stored on the chain, which gives analysts a complete and reliable dataset.

## 2. NFT traits and their relation to price

NFTs (Non-Fungible Tokens) are stored on a blockchain in the form of a JSON (JavaScript Object Notation) file that contains the metadata associated with the NFT. The metadata provides information about the asset represented by the NFT, such as its name, description, image, and other attributes.

In an NFT's JSON file, the `attributes` field typically describes the visual characteristics of the NFT image. For example, the image below shows a clay man wearing a flower necklace, so there is a corresponding entry

`"accessories":"Flower Necklace"`

inside metadata.
The `image` field in the JSON file typically contains an IPFS hash that links to the actual image file, which is usually stored off-chain. IPFS (InterPlanetary File System) is a decentralized, peer-to-peer network protocol designed to store and share files in a distributed manner.
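As an illustration, here is a hypothetical metadata file in the shape described above, and how it can be parsed. All field values are invented; real collections vary in their exact field names and layout:

```python
import json

# Hypothetical NFT metadata; names, hash, and traits are made up.
raw = """
{
  "name": "Clay Man #1234",
  "description": "A clay man wearing a flower necklace.",
  "image": "ipfs://QmExampleHashOnly",
  "attributes": {
    "accessories": "Flower Necklace",
    "background": "Sky Blue"
  }
}
"""

metadata = json.loads(raw)
# The attributes map trait categories to trait values.
print(metadata["attributes"]["accessories"])
# The image field points into IPFS rather than embedding the file.
print(metadata["image"].startswith("ipfs://"))
```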

Typically, an NFT image consists of multiple layers (background, clothes, face, hair, handhold, body). The artist assigns a probability to each trait, and the generation process combines the layers into a final image by randomly choosing one trait per layer according to those probabilities. As a result, different traits have different rarities.
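The generation process amounts to weighted random sampling per layer. The layers, trait names, and probabilities below are invented for illustration:

```python
import random

# Hypothetical per-layer trait probabilities set by the artist.
LAYERS = {
    "background": {"Sky Blue": 0.6, "Sunset": 0.3, "Galaxy": 0.1},
    "accessories": {"None": 0.7, "Flower Necklace": 0.25, "Gold Chain": 0.05},
}

def generate_nft(rng: random.Random) -> dict:
    """Pick one trait per layer according to the artist's probabilities."""
    nft = {}
    for layer, traits in LAYERS.items():
        names = list(traits)
        weights = list(traits.values())
        nft[layer] = rng.choices(names, weights=weights, k=1)[0]
    return nft

rng = random.Random(42)
collection = [generate_nft(rng) for _ in range(10_000)]

# Rare traits appear less often: "Galaxy" lands in roughly 10% of NFTs.
galaxy = sum(nft["background"] == "Galaxy" for nft in collection)
print(galaxy / len(collection))
```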

## 3. Data Pre-processing

### Dynamic pricing

Market conditions can have a significant impact on NFT prices. In a bull market, demand for NFTs rises and drives prices up, and vice versa. This makes it difficult to accurately analyze the relationship between traits and price, because the price may be influenced more by market conditions than by the traits themselves.

Below is the **average sale price** over time. Even the average price fluctuates strongly with market conditions, whereas it should stay roughly steady if prices were driven by traits alone.

To mitigate the influence of market conditions on the analysis of NFT prices versus traits, we need to remove such external factors. We therefore apply a time-normalization method.

$\hat{p}(t) = p(t) \cdot \frac{price_{avg}}{\bar{p}(t)}, \qquad \bar{p}(t) = \frac{1}{2n+1} \sum_{i=t-n}^{t+n} p(i)$

- $p(t)$: the raw sale price at time $t$
- $\hat{p}(t)$: the time-normalized price at time $t$
- $\bar{p}(t)$: the moving average of prices over the window $[t-n, t+n]$
- $price_{avg}$: the average price of all sales
- $n$: the half-width of the normalization window

Applying this time normalization reduces the effect of external factors and makes the price series steadier.
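A minimal sketch of the time-normalization step, under the reading that each price is rescaled by the ratio of the global average to a centered moving average over a window of ±n sales (windows are truncated at the edges of the series). The prices and window size are illustrative:

```python
import numpy as np

def time_normalize(prices: np.ndarray, n: int) -> np.ndarray:
    """Rescale each price by (global average / centered moving average)."""
    avg = prices.mean()
    normalized = np.empty_like(prices, dtype=float)
    for t in range(len(prices)):
        lo, hi = max(0, t - n), min(len(prices), t + n + 1)
        moving_avg = prices[lo:hi].mean()  # truncated near the edges
        normalized[t] = prices[t] * avg / moving_avg
    return normalized

# A steadily rising market: raw prices trend upward,
# but the normalized interior points come out flat.
raw = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(time_normalize(raw, n=1))
```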

### Rarity score

As mentioned above, NFT traits are expressed as text, and text is complex to use directly as training data. Text features are also difficult to generalize when applying a trained model to new or unseen data.

To overcome these challenges, we transform the text into numerical data using the NFT generation principle described above. The artist typically generates a fixed number of NFTs from the trait layers (10,000 in this project) according to their probability settings, so we can traverse the blockchain, collect all 10,000 NFTs, and determine the rarity score of each trait.

We apply the following rarity transformation: for a collection containing $N$ NFTs, the trait rarity score $R_t$ for a trait $t$ shared by $r$ NFTs is defined as $R_t = \left(\frac{r}{N}\right)^{-1} = \frac{N}{r}$.

Therefore, we can transform the trait description to a rarity score.
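The transformation can be sketched on a toy collection (trait names invented); each (layer, trait) pair gets the score $N/r$:

```python
from collections import Counter

def rarity_scores(collection: list[dict]) -> dict:
    """Compute R_t = (r/N)^-1 = N/r for every (layer, trait) pair."""
    N = len(collection)
    counts = Counter(
        (layer, trait) for nft in collection for layer, trait in nft.items()
    )
    return {key: N / r for key, r in counts.items()}

# Toy collection: "Galaxy" appears in 1 of 4 NFTs, so its score is 4.0,
# while "Sky Blue" appears in 3 of 4, so its score is 4/3.
collection = [
    {"background": "Sky Blue"},
    {"background": "Sky Blue"},
    {"background": "Sky Blue"},
    {"background": "Galaxy"},
]
print(rarity_scores(collection)[("background", "Galaxy")])  # 4.0
```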

## 4. Data training and prediction

### Guess Price (Linear Regression)

- Dataset
  - 6,060 rows of NFT sales data
  - Features
    - accessories_score
    - background_score
    - body_score
    - brows_score
    - clothes_score
    - eyes_score
    - hats_and_hair_score
    - wings_score
    - Price
  - Time frame: December 1, 2022 ~ April 30, 2023

- Iterations: 100
- Learning rate: 0.01
- Training data: 80%
- Testing data: 20%
- Loss function
```python
import numpy as np

def compute_cost(sample, target, weight):
    # Mean squared error with the conventional 1/2 factor.
    predictions = sample.dot(weight)
    errors = np.subtract(predictions, target)
    sqr_errors = np.square(errors)
    return 1 / (2 * sample.shape[0]) * np.sum(sqr_errors)
```

- Gradient Descent function
```python
def gradient_descent(X, y, weight, alpha, iterations):
    cost_history = np.zeros(iterations)
    weight_history = np.empty((iterations,), dtype=np.ndarray)
    for i in range(iterations):
        predictions = X.dot(weight)
        errors = np.subtract(predictions, y)
        # Batch gradient step on the squared-error cost.
        weight = weight - (alpha / X.shape[0]) * X.transpose().dot(errors)
        cost_history[i] = compute_cost(X, y, weight)
        weight_history[i] = weight
    return weight, cost_history, predictions, weight_history
```
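To illustrate how the cost and gradient-descent pieces fit together, here is a self-contained run of the same update rule on synthetic data. The features, true weights, and noise level are invented for demonstration and are not the project's dataset:

```python
import numpy as np

# Synthetic stand-in for the rarity-score features: the "price" is a
# linear function of the scores plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
X = np.hstack([np.ones((200, 1)), X])          # bias column
true_w = np.array([2.0, 0.5, 1.5, -0.3])       # made-up ground truth
y = X @ true_w + rng.normal(0.0, 0.1, size=200)

# Same batch update rule as the text's gradient_descent.
w = np.zeros(4)
alpha, iterations = 0.01, 100
costs = []
for _ in range(iterations):
    errors = X @ w - y
    w -= (alpha / X.shape[0]) * X.T @ errors
    costs.append(np.sum(errors ** 2) / (2 * X.shape[0]))

# The cost should fall monotonically with this small learning rate.
print(costs[0], costs[-1])
```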

- Result

- Accuracy: 71%

### Guess whether the NFT can sell at this price (Logistic Regression)

- Dataset
  - Sales data (sold NFTs, labeled 1)
    - 6,060 rows of NFT sales data
    - Features
      - accessories_score
      - background_score
      - body_score
      - brows_score
      - clothes_score
      - eyes_score
      - hats_and_hair_score
      - wings_score
      - Price (sold price)
    - Time frame: December 1, 2022 ~ April 30, 2023
  - Listing data (NFTs listed for over a week without selling, labeled 0)
    - 546 rows of NFT listing data
    - Features
      - accessories_score
      - background_score
      - body_score
      - brows_score
      - clothes_score
      - eyes_score
      - hats_and_hair_score
      - wings_score
      - Price (listing price)
    - Time frame: December 1, 2022 ~ April 30, 2023
- Iterations: 100
- Learning rate: 0.01
- Training data: 80%
- Testing data: 20%
- Loss function
```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost(X, y, theta):
    # Cross-entropy (negative log-likelihood) cost.
    z = np.dot(X, theta)
    p = sigmoid(z)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

- Gradient Descent function
```python
def gradient_descent(X, y, theta, alpha, iterations):
    cost_history = np.zeros(iterations)
    weight_history = np.empty((iterations,), dtype=np.ndarray)
    for i in range(iterations):
        # Apply the sigmoid so the errors are predicted
        # probabilities minus the 0/1 labels.
        predictions = sigmoid(X.dot(theta))
        errors = np.subtract(predictions, y)
        theta = theta - (alpha / X.shape[0]) * X.transpose().dot(errors)
        cost_history[i] = compute_cost(X, y, theta)
        weight_history[i] = theta
    return theta, cost_history, predictions, weight_history
```
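A self-contained sketch of the same logistic update rule on synthetic data, including the 0.5 threshold used to turn predicted probabilities into sold/unsold labels. The features, weights, and hyperparameters below are invented for demonstration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in: label 1 ("sold") where a linear score is positive.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((400, 1)), rng.normal(size=(400, 2))])
true_theta = np.array([-0.5, 2.0, -1.0])       # made-up ground truth
y = (X @ true_theta > 0).astype(float)

# Batch logistic-regression gradient descent.
theta = np.zeros(3)
for _ in range(500):
    p = sigmoid(X @ theta)                     # probabilities, not raw scores
    theta -= (0.1 / X.shape[0]) * X.T @ (p - y)

# Classify with a 0.5 probability threshold.
preds = (sigmoid(X @ theta) > 0.5).astype(float)
print((preds == y).mean())  # training accuracy
```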

- Result

- Accuracy: 68%
