Sources

I collected my data from the SEC’s 2024 Q1 Metrics by Individual Security dataset, which provides day-to-day metrics for over 4,800 US securities from January 1st 2024 to March 28th 2024. I used this site to assist in interpreting logarithmic transformations, and used Chat GPT to help with cleaning and plot creation.

Data Cleaning

In the market capitalization rank and volatility rank columns, the SEC assigns each security a ranking between 1 - 10, with 1 being the lowest outcome and 10 being the highest. For readability purposes, I sorted each rank into three categories: low, middle, and high. I further cleaned this dataset by renaming columns to in an R-friendly format, filtering out other securities, and using the sample function to randomly select 10,000 out of 85,516 rows to use in my analysis.