This project was written as an assignment for my sports analytics course in Spring 2024.

In the hockey community, there are various superstitions about how regular-season rankings affect playoff performance. Some claim that the Presidents’ Trophy brings bad luck along with its glory, while others dismiss superstition and firmly believe that rankings predict playoff success. Do regular-season rankings truly indicate a team’s luck in the playoffs?

During the regular season, teams earn points based on their result in every game: two points for a win, one point for a loss in overtime (or a shootout), and zero points for a loss in regulation. Teams are then ranked by points, with the team with the most points ranked first. The top three teams in each of the four divisions qualify for the playoffs, along with two wild-card teams from each of the two conferences (the two remaining teams in the conference with the best records). Typically, the playoffs end up consisting of the sixteen best-ranked teams in the league, but there have been a few oddities over the years. In the playoffs, each matchup is a best-of-seven series, and the first team to win four games moves on to the next round. We can first attempt to analyze the effect of regular-season rank by creating a linear model to predict playoff rankings.

# Regress playoff rank (PORK) on regular-season rank (RK)
simple_model <- lm(hd$PORK ~ hd$RK)
summary(simple_model)
## 
## Call:
## lm(formula = hd$PORK ~ hd$RK)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.0101  -3.9012   0.0072   4.7177  13.1533 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.08447    1.03719   3.938 0.000124 ***
## hd$RK        1.05445    0.05618  18.768  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.328 on 154 degrees of freedom
## Multiple R-squared:  0.6958, Adjusted R-squared:  0.6938 
## F-statistic: 352.2 on 1 and 154 DF,  p-value: < 2.2e-16

Here, we can see that a linear model predicting playoff ranking from regular-season rank alone has a fairly high R-squared (0.696) for a single predictor: regular-season rank explains about 70% of the variance in playoff ranking. This suggests that regular-season rank could be a predictor of success, but that is expected, since better teams win most of their regular-season games and are therefore ranked higher. The errors are still large (a residual standard error of about 6.3 places, with misses of up to 20 places), and it is widely known that playoff hockey is played very differently from regular-season hockey. Not only do the referees adopt a laissez-faire mentality, but the higher stakes of the playoffs take a toll on players’ mentalities and performance. Let’s look at some regular-season statistics for the top two playoff teams (the two Stanley Cup finalists) in each of the past five years.
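To make the fitted line concrete, the sketch below plugs a few regular-season ranks into the intercept and slope printed in the summary above. The helper name `predict_playoff_rank` is hypothetical; it simply evaluates the fitted equation by hand rather than calling the author's `simple_model` object.

```r
# Coefficients copied from the summary(simple_model) output above
b0 <- 4.08447  # intercept
b1 <- 1.05445  # slope on regular-season rank (RK)

# Fitted playoff rank: PORK_hat = b0 + b1 * RK
predict_playoff_rank <- function(rk) b0 + b1 * rk

predicted <- predict_playoff_rank(c(1, 8, 16))
round(predicted, 2)  # 5.14 12.52 20.96

# With a single predictor, R-squared equals the squared correlation,
# so the implied correlation between RK and PORK is about 0.83
r <- sqrt(0.6958)
round(r, 2)
```

Because the slope is close to 1, each place lower in the regular-season standings corresponds to roughly one place lower in the predicted playoff ranking, and the residual standard error of 6.328 means individual teams often land several places away from these fitted values.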

## [1] "Regular Season Rank: mean = 7.4, standard deviation = 5.89161"
## [1] "Goals For / Game: mean = 3.28, standard deviation = 0.262128"
## [1] "Goals Against / Game: mean = 2.805, standard deviation = 0.200458"

Here, we can see that the average regular-season rank of a top-two team in the past five years has been 7.4, with a large standard deviation of about 5.9. This suggests that despite rankings’ supposed indication of team quality, highly ranked teams do not necessarily perform well in the postseason. Let’s visualize the regular-season ranking of each of the past five Stanley Cup winners.



The plot shows that it is more common for a team ranked lower than first to win the Cup. In fact, only eight teams have won the Presidents’ Trophy (awarded to the team with the most regular-season points) and the Stanley Cup in the same year. Therefore, when simulating playoff games, perhaps rank could be used as a measure of “luck” for a team. To analyze the effect of regular-season ranking on performance in the playoffs, I’ll plot the average playoff ranking for each regular-season ranking.
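The averaging step behind such a plot can be sketched with base R's `aggregate()`. The toy data frame below is invented for illustration; it only reuses the `RK` and `PORK` column names from the `hd` data frame in the model above and is not the author's data.

```r
# Hypothetical toy data: one row per playoff team-season (values invented).
# RK = regular-season rank, PORK = final playoff ranking, as in hd above.
toy <- data.frame(
  RK   = c(1, 1, 2, 2, 3, 3),
  PORK = c(9, 5, 1, 3, 12, 8)
)

# Mean playoff ranking for each regular-season rank, sorted by RK
avg_by_rank <- aggregate(PORK ~ RK, data = toy, FUN = mean)
avg_by_rank  # mean PORK is 7 for RK 1, 2 for RK 2, and 10 for RK 3
```

The resulting two-column data frame can be passed straight to `plot()` or a plotting package, with regular-season rank on the x-axis and average playoff ranking on the y-axis.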



Here, we can see that teams ranked second, fifth, and twelfth perform better in the playoffs than their ranks would suggest, while first- and third-ranked teams perform worse. Since the rankings alone do not fully explain playoff outcomes, perhaps the games could be simulated using each team’s average goals for and against per game, with rank as a factor representing luck. To assess the quality of the simulation, I will use data from the 2022 season from hockey-reference.com. Before simulating games, let’s look at the distribution of goals per game for each playoff team.



This plot looks roughly normal, so I will assume that the number of goals a team scores in a game is normally distributed. (This is a simplification: goal counts in low-scoring sports such as soccer are more commonly modeled with a Poisson distribution.) By the same reasoning, I also assume that the goals each team allows are normally distributed. With these assumptions, a game can be simulated by sampling from normal distributions to determine whether either team scored in each minute of the game, tallying the goals, and repeating the process many times so that, by the law of large numbers, the share of simulated wins approximates each team’s chance of winning. To parameterize each team’s distributions, I computed the sample mean \(\bar{X}_N\) and sample standard deviation \(S_X\) of its goals for and against during the season. The sample mean and sample variance are unbiased estimators of the expected value and variance, which are the two parameters of a normal distribution: \[\bar{X}_N = \frac{1}{N}\sum_{k=1}^{N}X_k \approx \mu_X\] \[S_X^2 = \frac{1}{N-1}\sum_{k=1}^{N}\left(X_k-\bar{X}_N\right)^2 \approx \sigma_X^2\] For example, for the Vancouver Canucks this season (2023-24), this would be calculated in R as follows:

canucks <- get_all_goals(info24, "VAN")
sprintf("Mean = %g, Standard Deviation = %g", mean(canucks$gf), sd(canucks$gf))
## [1] "Mean = 3.47761, Standard Deviation = 1.94903"
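As a quick check that R's built-ins match the estimator formulas, the sketch below computes \(\bar{X}_N\), \(S_X^2\), and \(S_X\) by hand and compares them with `mean()`, `var()`, and `sd()`. The goal totals are invented for illustration, not taken from the Canucks' data.

```r
# Hypothetical goals-for totals over a handful of games (invented values)
goals <- c(3, 5, 2, 4, 1, 6, 3)

n <- length(goals)
x_bar <- sum(goals) / n                  # sample mean, estimates mu
s2 <- sum((goals - x_bar)^2) / (n - 1)   # sample variance (N - 1), estimates sigma^2
s <- sqrt(s2)                            # sample standard deviation

# R's built-ins use the same N - 1 denominator
all.equal(x_bar, mean(goals))
all.equal(s2, var(goals))
all.equal(s, sd(goals))
```

Note that only \(S_X^2\) is unbiased for \(\sigma_X^2\); its square root \(S_X\) is a slightly biased estimator of \(\sigma_X\), although the difference is small for a full season of games.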

Now, for each team we assume that the goals it scores in each game are normally distributed with the estimated parameters, \(X \sim \mathcal{N}\left(\bar{X}_N, S_X^2\right)\), which has density \[ f_X(x) = \frac{1}{S_X\sqrt{2\pi}} \exp\left( -\frac{1}{2}\left(\frac{x-\bar{X}_N}{S_X}\right)^{\!2}\,\right)\]
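The sketch below writes the normal density out term by term, using the Canucks' estimates printed above, and checks it against R's `dnorm()`. It also shows that a probability such as \(\mathbb{P}(X \leq 4)\) is the area under the density, which `pnorm()` returns. The helper name `normal_density` is hypothetical.

```r
# Estimates for Vancouver's goals for, copied from the output above
x_bar <- 3.47761
s_x <- 1.94903

# Normal density written out term by term
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-0.5 * ((x - mu) / sigma)^2)
}

goals <- 0:8
# The hand-written density agrees with R's dnorm()
all.equal(normal_density(goals, x_bar, s_x), dnorm(goals, mean = x_bar, sd = s_x))

# A density value is not a probability; P(X <= 4) is the area under the
# density up to 4, which is what the CDF pnorm() returns
p_cdf <- pnorm(4, mean = x_bar, sd = s_x)
p_area <- integrate(function(t) dnorm(t, mean = x_bar, sd = s_x), -Inf, 4)$value
all.equal(p_cdf, p_area, tolerance = 1e-6)
```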

To run the simulation, we compute these statistics for each team’s goals for and against on a per-minute basis. We then sample from these normal distributions for each of the 60 minutes of a game and declare the team that scored more goals the winner. Repeating this 10,000 times determines the winner of each first-round matchup. In 2022, the first-round matchups were:

##   Team 1     Team 2
## 1    CGY vs.    DAL
## 2    COL vs.    NSH
## 3    FLA vs.    WSH
## 4    NYR vs.    PIT
## 5    EDM vs.    LAK
## 6    MIN vs.    STL
## 7    TBL vs.    TOR
## 8    BOS vs.    CAR
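The minute-by-minute idea can be sketched as follows. This is a hypothetical `simulate_win_prob()` under stated assumptions, not the author's `sim_round()`: each team's goals in a minute are drawn from a normal distribution with mean \(\bar{X}_N/60\) and standard deviation \(S_X/\sqrt{60}\), a team's scoring depends only on its own goals-for estimates, and ties count as half a win because overtime is not modeled.

```r
# Hypothetical sketch of a minute-by-minute game simulation.
# ASSUMPTIONS (not necessarily the author's method):
#   * goals per minute ~ Normal(mean_gpg / 60, sd_gpg / sqrt(60)), so the
#     60-minute total is Normal(mean_gpg, sd_gpg^2);
#   * each team's scoring depends only on its own goals-for estimates;
#   * ties count as half a win (no overtime is modeled).
simulate_win_prob <- function(mean_a, sd_a, mean_b, sd_b,
                              n_sims = 10000, minutes = 60) {
  draw_game_totals <- function(mu, sigma) {
    per_minute <- matrix(
      rnorm(n_sims * minutes, mean = mu / minutes, sd = sigma / sqrt(minutes)),
      nrow = n_sims
    )
    rowSums(per_minute)  # one game total per simulated game
  }
  goals_a <- draw_game_totals(mean_a, sd_a)
  goals_b <- draw_game_totals(mean_b, sd_b)
  # Share of simulated games team A wins (law of large numbers)
  mean((goals_a > goals_b) + 0.5 * (goals_a == goals_b))
}

set.seed(2022)
p <- simulate_win_prob(3.5, 1.9, 2.8, 1.7)
p
```

Under these assumptions the difference in game totals is itself normal, so the simulated share can be sanity-checked against `pnorm((mean_a - mean_b) / sqrt(sd_a^2 + sd_b^2))`. With continuous draws, exact ties essentially never occur, so the tie rule is rarely used.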

To simulate luck, I defined a luck term equal to 16 minus the team’s regular-season rank, divided by 320,000: \[\text{luck} = \frac{16 - \text{rank}}{320000}\] I then added this luck term to the mean of each team’s goal distributions. Running the simulation 10,000 times gives the following first-round winners:

## [1] "CGY"
## [1] "COL"
## [1] "FLA"
## [1] "NYR"
## [1] "EDM"
## [1] "MIN"
## [1] "TOR"
## [1] "CAR"
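To get a feel for the size of the luck term, the sketch below evaluates \((16 - \text{rank})/320000\) for a few ranks. It assumes, as a reading of the text rather than a confirmed detail, that the term is added to a per-minute scoring mean, so its effect over a 60-minute game is 60 times larger.

```r
# Luck term as described in the text: (16 - rank) / 320000
luck <- function(rank) (16 - rank) / 320000

ranks <- c(1, 8, 16, 24, 32)
luck_per_minute <- luck(ranks)

# ASSUMPTION: the luck term is added to a per-minute scoring mean, so over a
# 60-minute game it shifts a team's expected goals by 60 * luck
luck_per_game <- 60 * luck_per_minute

data.frame(rank = ranks, luck_per_minute, luck_per_game)
```

Under that reading, the first-ranked team's expected goals rise by about 0.0028 per game and the 32nd-ranked team's fall by 0.003, compared with the roughly three goals per game that playoff teams score.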

Now we can set up the second-round matchups as the NHL would and find the second-round winners.

# Second-round matchups (Home vs. Visitor)
team1 <- c("EDM", "COL", "CAR", "TBL")
team2 <- c("CGY", "MIN", "PIT", "FLA")
rd2 <- data.frame(Home=team1, Visitor=team2)
# Simulate each matchup and print its winner
sim_round(rk_info, game_info, rd2, 1)
## [1] "CGY"
## [1] "COL"
## [1] "CAR"
## [1] "TBL"
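Each later round needs the winners paired into new matchups. Below is a hypothetical `next_round()` helper that builds a data frame with the same `Home` and `Visitor` columns as `rd2`. It assumes the winners are listed in bracket order, so entries 1 and 2 meet, 3 and 4 meet, and so on; it does not reorder teams by seed for home-ice advantage.

```r
# Hypothetical helper: pair adjacent winners for the next round.
# ASSUMPTION: winners are listed in bracket order (1 vs 2, 3 vs 4, ...).
next_round <- function(winners) {
  stopifnot(length(winners) %% 2 == 0)
  odd <- seq(1, length(winners), by = 2)
  data.frame(Home = winners[odd], Visitor = winners[odd + 1])
}

# Pairs CGY with COL and CAR with TBL
next_round(c("CGY", "COL", "CAR", "TBL"))
```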

We repeat this twice more to find our 2022 champion:

## [1] "Top 2 Teams:"
## [1] "COL"
## [1] "TBL"
## [1] "Stanley Cup Champion:"
## [1] "COL"

The simulation therefore picks the Colorado Avalanche as the 2022 Stanley Cup champion, which is what actually happened, so this is a promising outcome. In fact, the model’s only misses were Minnesota and Toronto in the first round and Calgary and Carolina in the second. However, it could be that the “luck” variable just happened to describe the 2022 playoffs well. How would this play out in 2024? As of April 19, 2024, the first-round matchups are set, so we can simulate them the same way.

## [1] "VAN"
## [1] "CAR"
## [1] "BOS"
## [1] "FLA"
## [1] "NYR"
## [1] "COL"
## [1] "EDM"
## [1] "DAL"

Using these first-round winners, we can apply the same simulation to the later rounds to find the final two teams and the champion.

## [1] "DAL"
## [1] "VAN"
## [1] "BOS"
## [1] "CAR"
## [1] "Top 2 Teams:"
## [1] "DAL"
## [1] "BOS"
## [1] "Stanley Cup Champion:"
## [1] "DAL"

The simulation predicts that the Dallas Stars will win the Stanley Cup in 2024. This is a somewhat promising result, as many sports writers have picked the Stars as their favorite to win this year, and they have one of the highest goal differentials in the league. However, the model has clear limitations. First, many generalizations had to be made to account for luck: perhaps rank is not the only indication, and the players’ playoff experience, such as the average number of Stanley Cups won by a team’s players, could also be a factor. Furthermore, this is a very simple model of a hockey game. Future work could improve upon it by describing the distribution of goals scored more accurately, or by factoring in statistics such as the number of takeaways and giveaways for each team. The model should also be tested on many more seasons than one, but within the limited scope of this project I was only hoping to explore the feasibility of this sort of simulation for playoff prediction. All in all, time will tell how accurate this simulation is for the 2024 playoffs.