Projects since graduating
Project - Marvel DEI
Marvel - DEI - Phase 5 - Study Aug 2024
Main Hypothesis (working theory)
Marvel is the top-grossing film franchise of all time, with over $12B domestically and $30B worldwide. Between 2008 and 2027 (19 years), Marvel's slate comes to 45 movies. That is 30 more than the runner-up franchise, Star Wars, which has 15 films, $5B domestically, and $10B worldwide over the 50 years from 1977 to 2027.
Recently, there has been much talk about DEI, Disney's invisible hand in the Marvel Universe, Marvel's downfall, its recent cash grabs, and the newest film, Deadpool & Wolverine, which was the first Disney-released Marvel film to gross $1 billion since Endgame in Phase 3.
About:
I want to run statistical analyses on all of Marvel's movies and TV shows, before and after Disney bought out Fox, to see whether box office results correlate with specific movies, characters, directors, actors/actresses, time of year, critic scores (such as Rotten Tomatoes), and reviews from IGN and Google. The goal is to project a film's opening-weekend box office given certain parameters.
How we can statistically analyze Marvel Movies: A Complex Challenge
First, we should define "bad".
1. Box office performance (total gross revenue, opening weekend gross, budget, ROI, tickets sold)
2. Critical reception (Rotten Tomatoes, Metacritic)
3. Audience ratings (IMDb, Letterboxd, and analysis of social media platforms to gauge audience opinion along with age, gender, location, and other relevant audience demographics)
4. Movie metadata (release date, runtime, genre, director, main cast, production company, distribution company)
5. Specific metrics (plot complexity, character development, action sequences, humor, awards won, nominations, merch sales)
Of course, each of these criteria will require different data collection and analysis methods.
Data Collected
Once we define "bad," I will gather relevant data, which might include:
Box office figures: total gross, opening weekend, budget, ROI
Critical reviews: scores from Rotten Tomatoes, Metacritic, and others
Audience ratings: IMDb, Letterboxd, and other user-generated rating systems.
Social media sentiment: analyzing tweets, Instagram posts, Facebook comments
Movie metadata: genre, release date, release order, director, cast, budget, runtime, etc.
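As a rough sketch of how one row of this dataset could be stored, the Python snippet below defines a single film record and derives ROI from it. The class name, field names, and the ROI formula (gross minus budget, divided by budget) are my own assumptions for illustration; the formula ignores marketing spend and the theaters' share of ticket sales. The values are placeholders, not real box office figures.

```python
from dataclasses import dataclass


@dataclass
class FilmRecord:
    """One row of the dataset; field names are illustrative assumptions."""
    title: str
    phase: int
    budget_musd: float           # production budget, millions of USD
    opening_weekend_musd: float  # opening weekend gross, millions of USD
    total_gross_musd: float      # total gross, millions of USD
    critic_score: float          # critic score on a 0-100 scale
    audience_score: float        # audience score on a 0-100 scale

    @property
    def roi(self) -> float:
        """Simple ROI: (gross - budget) / budget."""
        return (self.total_gross_musd - self.budget_musd) / self.budget_musd


# Placeholder values for illustration only, not real box office figures.
example = FilmRecord("Example Film", 4, 200.0, 90.0, 500.0, 75.0, 80.0)
print(f"{example.title}: ROI = {example.roi:.2f}")
```

Keeping ROI as a derived property rather than a stored column means it cannot drift out of sync if a budget or gross figure is corrected later.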
Statistical Analysis
Depending on the data I collect, I will employ various statistical methods:
Descriptive stats: Mean, median, mode, standard deviation for box office, ratings, etc.
Correlation analysis: examine relationships between variables (e.g., budget vs. box office, critic score vs. audience scores, etc.)
Hypothesis testing: test specific assumptions about the data (e.g., is there a significant difference in box office performance between Phase 3 and Phase 4?)
Regression analysis: Model the relationship between dependent and independent variables (e.g., predict box office based on budget, genre, and critic score)
Sentiment analysis: Analyze social media data to gauge overall sentiment towards specific movies or franchises.
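The first three methods in the list above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the final analysis: the function names are my own, all numbers are placeholders rather than real figures, and the hypothesis test is a permutation test on group means, which I am using here as a simple stand-in for the t-test I would more likely run in R or SAS.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the share of random relabelings whose absolute mean difference
    is at least as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_iter


# Placeholder numbers for illustration only, not real figures (millions of USD).
budgets = [150, 200, 180, 250, 220, 160]
grosses = [400, 700, 520, 900, 650, 430]
phase_3 = [850, 1100, 780, 1300]
phase_4 = [400, 430, 760, 380]

# Descriptive statistics
print("mean gross:  ", statistics.fmean(grosses))
print("median gross:", statistics.median(grosses))
print("stdev gross: ", round(statistics.stdev(grosses), 1))

# Correlation analysis
print("budget vs. gross r:", round(pearson(budgets, grosses), 3))

# Hypothesis test: do the two phases differ in mean gross?
print("Phase 3 vs. Phase 4 p-value:", permutation_test(phase_3, phase_4))
```

With only a handful of films per phase, any p-value from a test like this should be read cautiously; the small sample is a limit of the data, not of the method.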
Additional Considerations
Subjectivity: Film criticism is inherently subjective. Quantitative data can provide insights, but it won't capture the nuances of artistic merit.
Outliers: Blockbuster movies with exceptionally high or low performances can skew results. Consider using robust statistical methods to handle outliers.
Data quality: Ensure the accuracy and reliability of my data sources.
Contextual factors: Consider factors like competition, economic conditions, and cultural trends that might influence movie performance.
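To make the outlier point above concrete, here is one possible robust approach sketched in Python: compare the mean with the median, and flag outliers using the median absolute deviation (MAD) instead of the standard deviation. The function name, the 3.5 cutoff (a common rule of thumb), and the numbers are my own assumptions; the values are placeholders, not real grosses.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median, so nothing can be flagged
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Placeholder grosses in millions of USD, with one runaway hit.
grosses = [400, 430, 380, 520, 450, 2800]

print("mean:    ", statistics.fmean(grosses))   # pulled upward by the hit
print("median:  ", statistics.median(grosses))  # barely affected by it
print("outliers:", mad_outliers(grosses))
```

Whether a flagged film should be dropped, capped, or simply reported separately is a judgment call; a record-breaking release is real data, not an error.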
Tools and Software
Excel: Basic data analysis and visualization for smaller datasets
Specialized movie databases: Box Office Mojo, Rotten Tomatoes, IMDb
RStudio: Statistical modeling and analysis
Social media APIs: for collecting and analyzing social media data.
SAS: Decision trees, classification, correlation, and other heavy statistical modeling
Power BI: Sentiment analysis, trend analysis, regression, clustering, data transformation
Tableau: Data visualization, time series, calculated fields, table calculations, etc.
Conclusion testing
Core question: Has DEI impacted Marvel's box office performance?
Direct impact: Does including diverse characters or storylines correlate with higher or lower box office revenue?
Indirect impact: Have changes in audience demographics or societal attitudes towards DEI influenced overall box office trends for Marvel films?
Other potential conclusions
Identifying box office success factors: Which factors (budget, genre, director, release date, etc.) correlate most strongly with box office success?
Predictive modeling: Based on available data, can I build a model to predict a movie's box office performance?
Phase analysis: How does box office performance compare across different Marvel Phases, and which phase saw the steepest decline?
Audience segmentation: Are there specific audience segments that drive box office success for particular types of Marvel films?
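For the predictive modeling question above, the simplest starting point would be a one-variable least squares fit, sketched below in Python. The function names and the choice of budget as the single predictor are assumptions for illustration, and the numbers are placeholders, not real figures. A real model would use several predictors (genre, critic score, release date) and would more likely be fit with a statistics package such as R.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Placeholder data in millions of USD: budget vs. opening weekend.
budgets = [150, 200, 180, 250, 220, 160]
openings = [80, 140, 100, 190, 150, 85]

slope, intercept = fit_line(budgets, openings)
print(f"opening = {intercept:.1f} + {slope:.2f} * budget")
print(f"R^2 = {r_squared(budgets, openings, slope, intercept):.3f}")

# Projected opening weekend for a hypothetical $210M budget.
print(f"projection: {intercept + slope * 210:.1f}")
```

A fit like this only describes the films it was trained on; projecting a new release assumes that film resembles the earlier ones in the ways that matter.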
Word to the reader
Please remember: correlation does not equal causation. While I can identify relationships between variables, it's essential to consider other factors that might influence box office performance.