Alternative Data Regressor Framework: Draft 1

A framework for linear regression of alternative data against financial asset prices

What is Alternative Data?

Alternative data is defined as non-traditional data that can provide an indication of future performance of a company outside of traditional sources, such as company filings, broker forecasts, and management guidance. This data can be used as part of the pre-trade investment analysis, as well as helping investors monitor the health of a company, industry, or economy.

Source: LSEG

Examples of alternative data include social media sentiment, web traffic, credit card transaction data, satellite imagery, car parking data, and mobile app usage.

What is an "Alternative Data Regressor"?

This is not a standard term but a phrase I coined. My goal is to use various types of alternative data (the regressors) to test for correlation with the market values of financial assets (stocks and stock indices), hence the name "Alternative Data Regressor."

Conceptual Framework

We need a scientific framework to test for correlation and possible causality. The objective is for the framework to guide a Python algorithm that takes in datasets and tests them for correlation.

Alternative Data Regressor Framework

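Collect Data: Gather the alternative data series, the control variables, and the financial asset prices at a common frequency (for example, weekly box office sales, the Prime and Federal Funds rates, and a tech stock index).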

Clean and Organize Data: Prepare and organize the data for analysis.
Test for Stationarity: Apply stationarity tests.
Adjust Data for Non-Stationarity: If the data is not stationary, adjust it for seasonality, possibly using methods like the moving average or log transformations.
Re-test for Stationarity: After transforming the data, test for stationarity again. If the data is now stationary, proceed with the analysis.
Significance Testing: Conduct appropriate statistical tests to check the significance of the relationships between the variables
Develop Baseline Regression Model: Create a baseline regression model to analyze the relationship
Refine the Model: Continuously adjust the model by experimenting with different forms of the control variables.
Evaluate Models: Assess the various models and select the best one based on criteria like the R-squared or Adjusted R-squared value.
Interpret the Regression Line: Use the chosen model to interpret the relationship between the Alternative data and financial market returns.
Comparative Analysis: Compare the effect on predictive power of using other variables (typically more traditional ones) in place of the alternative data.
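
As a rough illustration of the adjustment step for the planned Python version, the sketch below computes a right-aligned (trailing) moving average and a lagged copy of a series using only the standard library. The function names `trailing_mean` and `lag_series` are hypothetical and are not part of the original analysis.

```python
from typing import List, Optional, Sequence


def trailing_mean(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Right-aligned moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            # Not enough history yet to fill the window
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def lag_series(values: Sequence[Optional[float]], periods: int = 1) -> List[Optional[float]]:
    """Shift a series forward by `periods`, padding the start with None."""
    if periods < 0:
        raise ValueError("periods must be non-negative")
    kept = list(values[: max(len(values) - periods, 0)])
    padding: List[Optional[float]] = [None] * min(periods, len(values))
    return padding + kept
```

Lagging the predictor by one period means each target value is paired with the predictor's value from the previous period, which is what the framework needs when testing whether the alternative data leads the asset price.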

Example of Alternative Data Regressor Framework: Testing the Relation of Sci-Fi Movies Box Office Sales & the Prices of Tech Stocks in the US (Using R)

Collect Data: Gather weekly data on box office sales, interest rates (Prime and Federal Funds Rate), and a tech stock index.

Clean and Organize Data: Prepare and organize the data for analysis ==> remove incomplete entries; remove outliers

Test for Stationarity: Apply stationarity tests ==> Augmented Dickey-Fuller test

Adjust Data for Non-Stationarity: If the data is not stationary, adjust it for seasonality, possibly using methods like the moving average or log transformations ==> Moving Average

# Load packages: dplyr for mutate() and lag(), zoo for rollmean()
library(dplyr)
library(zoo)

# Moving-average window in weeks (placeholder value; the window used in the original analysis is not stated)
window_size <- 4

# Apply a right-aligned moving average to adjust for seasonality and create lagged variables (1-week lag)
project_data_clean <- project_data %>%
  mutate(
    ma_TOP_10 = rollmean(TOP_10, window_size, align = "right", fill = NA),
    ma_tech_movie = rollmean(tech_movie, window_size, align = "right", fill = NA),
    ma_Tech_index = rollmean(Tech_Index, window_size, align = "right", fill = NA),
    ma_Market_index = rollmean(Market_index, window_size, align = "right", fill = NA),
    ma_TOP_10_lag = dplyr::lag(ma_TOP_10, 1),
    ma_tech_movie_lag = dplyr::lag(ma_tech_movie, 1)
  )

Re-test for Stationarity: After transforming the data, test for stationarity again. If the data is now stationary, proceed with the analysis.

Significance Testing: Conduct appropriate statistical tests to check the significance of the relationships between the variables ==> Spearman Correlation Test


# Spearman Correlation Test for Moving Average Adjusted and Lagged Tech Movie and Tech Index
spearman_test_ma_tech_movies <- cor.test(project_data_clean$ma_tech_movie_lag, project_data_clean$ma_Tech_index, method = "spearman")
spearman_test_ma_tech_movies

# Spearman Correlation Test for Moving Average Adjusted and Lagged Total Top 10 Box Office and Market Index
spearman_test_ma_top_10 <- cor.test(project_data_clean$ma_TOP_10_lag, project_data_clean$ma_Market_index, method = "spearman")
spearman_test_ma_top_10

Develop Baseline Regression Model: Create a baseline regression model to analyze the relationship between box office sales (lagged by a week) and stock market performance, including the control variables (interest rates).

Refine the Model: Continuously adjust the model by experimenting with different forms of the control variables.

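# Model 1: Classic (lagged box office sales only)
model1 <- lm(ma_Market_index ~ ma_TOP_10_lag, data = project_data_clean)
summary(model1)

# Model 2: Including economic indicators (control variables)
model2 <- lm(ma_Market_index ~ ma_TOP_10_lag + PRIME + FED, data = project_data_clean)
summary(model2)

# Result: of the three models in this step, Model 2 gave the best fit.

# Model 3: Including a proxy for the market premium (the PRIME - FED spread)
# Note: if the `PRIME - FED` column is exactly PRIME minus FED, it is a linear
# combination of the other two controls, so lm() will report NA for its coefficient.
model3 <- lm(ma_Market_index ~ ma_TOP_10_lag + PRIME + FED + `PRIME - FED`, data = project_data_clean)
summary(model3)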

Evaluate Models: Assess the various models and select the best one based on criteria like the R-squared or Adjusted R-squared value ==> Adjusted R-squared
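
For reference, adjusted R-squared penalizes R-squared for the number of predictors: 1 - (1 - R^2)(n - 1)/(n - k - 1), where n is the number of observations and k is the number of predictors (excluding the intercept). Below is a minimal Python sketch of this standard formula, in the language planned for the future tool. The function name and the example figures are illustrative assumptions, not results from the models above.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a linear model with an intercept.

    n_obs is the number of observations; n_predictors excludes the intercept.
    """
    dof = n_obs - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


if __name__ == "__main__":
    # Hypothetical figures: the same R-squared with one versus three predictors.
    # The model with more predictors is penalized more heavily.
    print(adjusted_r_squared(0.5, 11, 1))
    print(adjusted_r_squared(0.5, 11, 3))
```

This is why adjusted R-squared is the safer criterion when comparing models with different numbers of control variables: plain R-squared never decreases when a regressor is added.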

Interpret the Regression Line: Use the chosen model to interpret the relationship between box office sales and stock market returns.

a. Focus: Apply the model specifically to sci-fi movies to examine their impact on the tech stock index.

Comparative Analysis: Compare the effects of using total box office sales versus sci-fi box office sales in predicting tech stock index changes.

#Model 1: Target - Tech Movies Box Office Sales to Tech Stock Index
model_target <- lm(ma_Tech_index ~ ma_tech_movie_lag + PRIME + FED, data = project_data_clean)
summary(model_target)

# Result: this model gave the best fit of the three; in this sample, lagged sci-fi box office sales were the strongest predictor of the tech stock index.

#Model 2: Proxy - Top 10 Box Office to Tech Stock Index
model_proxy <- lm(ma_Tech_index ~ ma_TOP_10_lag + PRIME + FED, data = project_data_clean)
summary(model_proxy)

#Model 3: Indirect - Tech Movies Box Office to Market
model_indirect <- lm(ma_Market_index ~ ma_tech_movie_lag + PRIME + FED, data = project_data_clean)
summary(model_indirect)

Future Work:

Develop a Python program that takes datasets as input and outputs the possible correlations, following the process outlined above.
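
One possible starting point for such a program is sketched below. It uses only the standard library and covers just the correlation step: it pairs each predictor series, lagged by a chosen number of periods, with each target series and ranks the pairs by Spearman correlation. All function names and the sample data are hypothetical. Stationarity testing, p-values, and regression are left out and would still need to be added.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


def scan_lagged_correlations(
    predictors: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    lag: int = 1,
) -> List[Tuple[str, str, float]]:
    """Correlate each predictor at time t - lag with each target at time t."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    results = []
    for p_name, p_values in predictors.items():
        for t_name, t_values in targets.items():
            n = min(len(p_values), len(t_values))
            x = list(p_values[: n - lag])  # predictor, shifted back by `lag`
            y = list(t_values[lag:n])      # target, aligned to the later period
            results.append((p_name, t_name, spearman(x, y)))
    # Strongest relationships (by absolute correlation) first
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


if __name__ == "__main__":
    # Made-up weekly values, for illustration only
    predictors = {"tech_movie": [3.0, 5.0, 4.0, 8.0, 7.0, 9.0]}
    targets = {"tech_index": [100.0, 101.0, 104.0, 103.0, 108.0, 107.0]}
    for row in scan_lagged_correlations(predictors, targets, lag=1):
        print(row)
```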