For Fight Fans

I am hoping to use data analytics to answer the following questions:

  • Who are the most and least favored fighters? Which fighters do the odds tend to overvalue or undervalue?
  • To what extent can fight outcomes be explained by the odds?
  • Does fighter reach matter? If so, how much does it matter?
  • Do other fight stats (strikes landed per minute, etc.) matter?
  • How well can we predict fight outcomes by putting relevant factors (fight stats, odds, etc.) into a statistical model?

I have also started publishing short YouTube videos to break down and communicate some of the more technical information found on this webpage. You are a fight fan - you don’t care about fancy stats… What the hell does all of this mean!?


For Nerds

I am hoping to conduct some of the following analyses:

  • Using stan_glm in R, fit a Bayesian logistic regression to predict fight outcomes (the probability that the favorite wins) from two predictors: (i) the best available odds for the favorite and (ii) the difference in reach between the two fighters.
  • The above analysis will be run on the entire dataset, with events dating back to 2013. On a smaller dataset (dating back to 2020), I want to run a similar analysis with a larger number of predictors (including striking accuracy, takedown defence rate, etc.).
  • Further to the above two points, I will use prior knowledge, standard model-selection criteria, cross-validation procedures, and posterior predictive checks to construct a model with good predictive accuracy.
  • In the future, I may consider adding more complicated predictors (e.g. fan hype based on Twitter engagement) and/or constructing more complicated models (e.g. hierarchical structure) to improve predictions and answer additional questions.
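
For anyone who wants to see the first model written down, here is a rough sketch of the two-predictor logistic regression described in the first bullet. The notation is my own shorthand and is an assumption, not a final specification: y_i is 1 if the favorite wins fight i and 0 otherwise, odds_i is the best available odds for the favorite, and reach_i is the difference in reach between the two fighters. The scale used for the odds and the priors on the coefficients are not settled here, so they are left unspecified.

```latex
% Sketch only: symbols are illustrative, priors are intentionally left open.
\begin{align}
  y_i &\sim \mathrm{Bernoulli}(p_i) \\
  \operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i}
      &= \alpha + \beta_1\,\mathrm{odds}_i + \beta_2\,\mathrm{reach}_i
\end{align}
```

Under this sketch, p_i is the modelled probability that the favorite wins, and a value of beta_2 near zero would suggest that reach adds little once the odds are accounted for.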