How gender and race labels are applied to NFT data analysis

1 Abstract and Introduction

2 Methodology

3 Results

3.1 CryptoPunks

3.2 Total NFT Market

4 Discussion

5 Conclusion/Acknowledgments/References

Appendix

A.1 Implementation details

A.2 Detailed NFT information and A.3 Map of NFT searches on Google

2. Methodology

We describe methods for analyzing gender and racial biases in the prices of NFTs. We first summarize our data collection process (Sections 2.1 and 2.2) and then describe how we statistically identify gender and racial biases between different NFTs (Section 2.3). The steps are shown in Figure 2. A more detailed description of the methods and implementation can be found in the Appendix.

Figure 2: Experimental flow chart.

2.1 Primary data collection

Our dataset consists of NFTs transacted on OpenSea, the main marketplace for NFTs on Ethereum. We queried the OpenSea “v1/collections” endpoint at the end of November 2022 [15] to retrieve NFT metadata and last sale prices. We selected 790 collections from the Kaggle Ethereum NFTs dataset [11] and NFTs from the top of the OpenSea 30-day and all-time volume leaderboards around November 2021. After data collection, we ended up with ∼2.5 million NFTs, each of which has been transacted.

Table 1: Summary of dataset collection

2.2 Retrieval of race and gender labels

Many NFT collections do not depict humans and therefore cannot be studied directly through the lens of race and gender. We select collections whose metadata contains the words “male” and “female”, ending up with 44 such collections, whose gender labels apply to different avatars. Statistics on these 44 collections and their gender labels can be found in Table 6 in Appendix A.2.
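The selection step above can be illustrated with a minimal sketch. The metadata layout (each collection as a list of per-token trait dictionaries), the helper names, and the sample data are all hypothetical, and the requirement that both labels appear in a collection is our reading of the text rather than a confirmed detail of the authors' pipeline.

```python
def gender_labels(token_traits):
    """Return the set of gender labels found among a token's trait values."""
    labels = set()
    for value in token_traits.values():
        # Exact, case-insensitive match: a substring test would wrongly
        # count "female" as "male".
        normalized = str(value).strip().lower()
        if normalized in ("male", "female"):
            labels.add(normalized)
    return labels


def has_both_genders(collection):
    """True if the collection's tokens include both "male" and "female"."""
    seen = set()
    for token_traits in collection:
        seen |= gender_labels(token_traits)
    return seen == {"male", "female"}


# Hypothetical metadata: each collection is a list of per-token trait dicts.
collections = {
    "avatars": [{"Type": "Male", "Hat": "Cap"}, {"Type": "Female"}],
    "landscapes": [{"Biome": "Desert"}, {"Biome": "Tundra"}],
    "all_female": [{"Gender": "female"}, {"Gender": "Female"}],
}
selected = [name for name, tokens in collections.items() if has_both_genders(tokens)]
print(selected)
```

Matching on whole trait values rather than substrings matters here because the string "female" contains "male".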

To our knowledge, this is the first NFT dataset to contain gender labels across multiple collections. Race labels, however, are often absent from the metadata, so we restrict the race analysis to the collections that carry race labels: CryptoPunks, Avastar, and Dynamic Duelers.

2.3 Statistical tools for analyzing gender and race bias

To test the hypothesis that female NFTs sell for less than male NFTs, we conduct one-sided paired and unpaired Student’s t-tests [22].

Unpaired versus paired t-test: For the unpaired t-test, we compare the average of all male NFT sale prices with the average of all female NFT sale prices. For the paired t-test, we compute the t-statistic on the paired differences between the corresponding male and female prices, using the average daily price and the average weekly price of the NFTs. The paired t-test isolates the price difference between male and female NFTs by controlling for price variation over time.
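To make the difference between the two tests concrete, here is a minimal sketch that computes both t-statistics with the Python standard library. The function names and the daily prices are made up for illustration, and the pooled-variance form of the unpaired test is an assumption based on the mention of Student's t-test. A negative statistic points in the direction of the one-sided hypothesis that female prices are lower.

```python
import math
import statistics


def unpaired_t(female, male):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    n_f, n_m = len(female), len(male)
    pooled_var = (
        (n_f - 1) * statistics.variance(female)
        + (n_m - 1) * statistics.variance(male)
    ) / (n_f + n_m - 2)
    std_err = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))
    return (statistics.mean(female) - statistics.mean(male)) / std_err, n_f + n_m - 2


def paired_t(female_by_period, male_by_period):
    """Paired t statistic on per-period (female - male) average price differences."""
    # Only periods in which both groups traded can be paired.
    periods = sorted(set(female_by_period) & set(male_by_period))
    diffs = [female_by_period[p] - male_by_period[p] for p in periods]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err, len(diffs) - 1


# Made-up average daily prices, keyed by day; the market rises over time.
female_daily = {"d1": 1.0, "d2": 2.0, "d3": 4.0, "d4": 8.0}
male_daily = {"d1": 1.2, "d2": 2.3, "d3": 4.5, "d4": 9.0}

t_unpaired, df_unpaired = unpaired_t(list(female_daily.values()), list(male_daily.values()))
t_paired, df_paired = paired_t(female_daily, male_daily)
print(t_unpaired, df_unpaired)
print(t_paired, df_paired)
```

In this toy data the day-to-day price swings are much larger than the male/female gap, so the unpaired statistic stays close to zero while the paired statistic is far more negative. Converting either statistic into a one-sided p-value requires the lower tail of the Student's t distribution with the returned degrees of freedom, which the standard library does not provide; SciPy's `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` accept `alternative="less"` for this purpose.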

Log transformation: The t-test assumes that the data are normally distributed, so we apply a log transformation to the prices. Because rare NFTs are worth far more than common ones, NFT price distributions tend to follow a power law [14]. Inspired by the observation that stock prices follow a lognormal distribution [1], we apply the same transformation and find that the log-price distribution is closer to normal. We refer to a t-test performed on the log of prices as the log t-test.
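The effect of taking logs can be checked on synthetic data. The sketch below is an illustration only: it draws heavy-tailed "prices" from a lognormal distribution (an assumption, not the paper's data) and uses sample skewness as a rough, hypothetical stand-in for a normality check.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


rng = random.Random(0)
# Synthetic heavy-tailed "prices"; real NFT sale prices would replace this.
prices = [rng.lognormvariate(0.0, 1.5) for _ in range(5000)]
# The log is only defined for strictly positive prices.
log_prices = [math.log(p) for p in prices]

print(skewness(prices), skewness(log_prices))
```

The raw prices are strongly right-skewed, while the log prices have skewness near zero, as expected for a symmetric, roughly normal sample. A log t-test then amounts to running the t-test on `log_prices` instead of `prices`.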

Outlier removal: Outliers can arise from very high sale prices for rare NFTs or from very low sale prices caused by human error during listing. We handle outliers either by Winsorization [19] or by trimming outliers beyond a given percentile. We report results for the 0.1%, 1%, 2.5%, and 5% percentiles.
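The two ways of handling outliers mentioned above can be sketched as follows. This is a simplified illustration: the function names are ours, the cutoffs are applied symmetrically to both tails (an assumption suggested by the mention of both very high and very low prices), and the percentile is computed by simple index arithmetic rather than interpolation.

```python
def percentile_bounds(values, pct):
    """Lower and upper cutoffs leaving roughly pct percent in each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * pct / 100)  # number of points treated as outliers per tail
    return ordered[k], ordered[len(ordered) - 1 - k]


def winsorize(values, pct):
    """Clamp values beyond the cutoffs to the cutoffs; sample size is unchanged."""
    low, high = percentile_bounds(values, pct)
    return [min(max(v, low), high) for v in values]


def trim(values, pct):
    """Drop values beyond the cutoffs; sample size shrinks."""
    low, high = percentile_bounds(values, pct)
    return [v for v in values if low <= v <= high]


# Made-up prices with one mistaken low listing and one very rare item.
prices = [0.001, 1, 2, 3, 4, 5, 6, 7, 8, 900]
print(winsorize(prices, 10))  # extremes replaced by the nearest cutoff
print(trim(prices, 10))       # extremes removed
```

Winsorization keeps every observation but caps its influence, whereas trimming discards the extreme observations entirely, so the two can give slightly different t-statistics on the same data.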

The approach described above is also used to compare the prices of light and dark CryptoPunks. We report results from combinations of the different t-tests and outlier-handling methods to show that our conclusions hold regardless of how the statistical significance test is performed. For the numbers and statistics reported in this paper, unless otherwise stated, we remove outliers at the 2.5% percentile.
