Actuarially (Matt Malin)

  • Random
  • Archive
  • RSS
  • Ask me anything

R-bloggers

Just a random post to suggest to all to use R-bloggers: http://www.r-bloggers.com as a useful source of R related news and tips, etc, a site aggregating many of the main R related blogs on the web.

    • #actuarially
  • 9 months ago
  • Permalink
  • Share
    Tweet

2012 Olympics Swimming - 100m Butterfly Men Finals prediction

2012 Olympics Swimming - 100m Butterfly Men Finals prediction

Author: Matt Malin

Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.

Code setup

library(XML)
library(ggplot2)

swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200"

swimming_data <- readHTMLTable(
  readLines(swimming_path), 
  which = 3, 
  stringsAsFactors = FALSE)

# due to some potential errors in passing header = TRUE:
names(swimming_data) <- swimming_data[1, ]
swimming_data <- swimming_data[-1, ]

swimming_data[["Result"]] <- as.numeric(swimming_data[["Result"]])
swimming_data[["Year"]]   <- as.numeric(swimming_data[["Year"]])
swimming_data             <- na.omit(swimming_data)

loess_prediction <- function(
  medal_type = "GOLD", 
  prediction_year = 2012) 
{
  medal_type <- toupper(medal_type)
 
 swimming_loess <- loess(
    Result ~ Year, 
    subset(swimming_data, Medal == medal_type),
    control = loess.control(surface = "direct"))
  
  swimming_prediction <- predict(
    swimming_loess, 
    data.frame(Year = prediction_year), 
    se = FALSE)

  return(swimming_prediction)
}

log_lm_prediction <- function(
  medal_type = "GOLD", 
  prediction_year = 2012) 
{
  medal_type <- toupper(medal_type)
  swimming_log_lm <- lm(
    log(Result) ~ Year, 
    subset(swimming_data, Medal == medal_type))
  
  swimming_prediction <- exp(predict(
    swimming_log_lm, 
    data.frame(Year = prediction_year), 
    se = FALSE))

  return(swimming_prediction)
}

swimming_data <- rbind(
  data.frame(
    swimming_data[c("Year", "Medal", "Result")], 
    type = "actual"),
  data.frame(
    Year = rep(2012, 3),
    Medal = c("GOLD", "SILVER", "BRONZE"),
    Result = c(
      loess_prediction("gold"), 
      loess_prediction("silver"),
      loess_prediction("bronze")),
    type = rep("loess_prediction", 3)))

medal_colours <- c(
  GOLD   = rgb(201, 137, 16, maxColorValue = 255),
  SILVER = rgb(168, 168, 168, maxColorValue = 255),
  BRONZE = rgb(150, 90, 56, maxColorValue = 255))
        
swimming_plot <- ggplot(
  swimming_data,
  aes(
    x = Year, 
    y = Result, 
    colour = Medal, 
    group = Medal)) + 
  scale_x_continuous(limits = c(1968, 2012)) +
  geom_point() + 
  stat_smooth(
    aes(fill = Medal), 
    alpha = 0.25, 
    data = subset(swimming_data, type = "actual"), 
    fullrange = FALSE, 
    method = loess)
    
swimming_plot <- swimming_plot + 
  scale_fill_manual(values = medal_colours) + 
  scale_colour_manual(values = medal_colours) + theme_bw()

Predictions

I now use the functions loess_prediction and log_lm_prediction to estimate the times for the medal winning times.

Loess predictions

The gold prediction for 2012 is 49.7 seconds, for silver is 49.5 seconds, and for bronze is 50.2 seconds.

Linear modelling (of log results)

I’ve shown the code here for the calls to the linear modelling approach:

swimming_log_lm_gold   <- log_lm_prediction("gold")
swimming_log_lm_silver <- log_lm_prediction("silver")
swimming_log_lm_bronze <- log_lm_prediction("bronze")

This gives the following times as predictions:

swimming_log_lm_gold
##     1 
## 50.23 
swimming_log_lm_silver
##     1 
## 50.09 
swimming_log_lm_bronze
##     1 
## 50.46 

Loess prediction plot

The following is a plot of actual and predicted times, along with loess error setting as defaults from geom_smooth:

plot of chunk plot

Notes

Note that because of the small difference between the silver and gold medal results at the 2008 olympics, the trend of improvement in silver exceeds that in the gold, so the prediction is that the silver time will be faster than the gold!

Also note that this takes into account no information about performance of athletes involved or changes in rules, such as being unable to use the swimsuits that were present in the last Olympics and largely attributed to improving performance, purely modelling from a few data points as an interesting exercise!

Final Summary

To summarise, the final predicted results using these methods are:

library(pander)
predictions <- data.frame(
  Medal = c("Gold", "Silver", "Bronze"),
  Loess_prediction = c(
    loess_prediction("gold"),
    loess_prediction("silver"),
    loess_prediction("bronze")),
  Log_Linear_prediction = c(
    log_lm_prediction("gold"),
    log_lm_prediction("silver"),
    log_lm_prediction("bronze")))
pandoc.table(predictions)
Medal Loess_prediction Log_Linear_prediction
Gold 49.69 50.23
Silver 49.52 50.09
Bronze 50.20 50.46

Obviously the predictions here are very crudely performed, especially given that it produces a faster time for a silver medal than for gold, but it’ll still be interesting to see what actually happens, and if it’ll be Michael Phelps yet again!

    • #olympics
    • #actuarially
    • #Matt Malin
    • #R
    • #predictions
    • #swimming
    • #100m butterfly
  • 9 months ago
  • Permalink
  • Share
    Tweet

Logo

About

Random actuarial/R related posts, updated very infrequently.
<email at mattmalin.co.uk>

Google

Me, Elsewhere

  • Facebook Profile
  • Google
  • mattmalin on github
  • RSS
  • Random
  • Archive
  • Ask me anything
  • Mobile

Effector Theme by Carlo Franco.

Powered by Tumblr