There are millions of hockey fans out there. They are very passionate about the game and their team. Trust me, I am one of them.
Do you think it is important to better understand which plays on the ice influence the mood and emotions of a hockey fan? I do! And I believe it is a useful and exciting topic to study!
In this article, I explain my work, in which I collected data (fan ratings) on Canucks players in 2014/2015 NHL regular season games and developed a model. With the model, I can answer the question: Which plays and events on the ice influence fans' ratings of players?
The model is a multinomial logit model with an ordinal dependent variable (the fans' rating on a scale from 1 to 5, where 1 means a very bad performance and 5 an excellent one). The independent variables come from data downloaded from NHL.com and war-on-ice.com. The final model specification included 18 independent variables, and the summary of the output is shown here:
The model interpretation
What do those independent variables and their estimated values mean?
G = goals, has a positive sign, meaning that every goal scored by the player increases his rating.
A = assists, has a positive sign, indicating that every assist recorded by the player increases his rating.
GF = team goals for in the game, has a positive sign, which says that with every goal the Canucks score, the rating of each player goes up.
GA = team goals against in the game, has a negative sign, meaning that with each goal against the Canucks, a player's rating goes down (the magnitudes of the coefficients can't simply be compared to each other!).
SOG = individual shots on goal, has a positive sign, meaning that with every individual shot on goal, the player's rating goes up.
Fenwick60 = Fenwick per 60 minutes, has a positive sign, meaning that a higher Fenwick60 value goes with a higher fan rating.
GVA = giveaways, has a negative sign, indicating that every giveaway lowers the player's individual rating.
TKA = takeaways, has a positive sign, meaning that each takeaway increases the player's rating (the magnitudes of the coefficients can't simply be compared to each other!).
BkS = blocked shots, has a positive sign, meaning that every shot blocked by the player increases his rating.
TOI_ES = time on ice at even strength, has a positive sign, meaning that with every minute spent on the ice, the player's rating increases.
HIT_ES = hits at even strength, has a positive sign, meaning that with every hit, the player's rating increases.
HITt_ES = hits taken at even strength, has a positive sign, meaning that for every hit taken, the player is rewarded in his rating.
Note: This does not have to mean that fans enjoy seeing their own players being hit; it may instead reflect an appreciation of playmaking ability. The ability to hold on to the puck and make the pass just before being hit could be one of the player characteristics that fans like.
PN_ES = penalties at even strength, has a negative sign, meaning that for every penalty the player takes, his rating by fans goes down.
PNd_ES = penalties drawn at even strength, has a positive sign, meaning that fans appreciate players who draw penalties, and every penalty drawn adds to the rating.
Home = home game for the Canucks, has a positive sign, showing that players are generally rated better in home games.
FO = face-off win percentage, has a positive sign, meaning that whoever wins at least one draw can expect a better rating from fans, and the rating goes up with a better face-off win percentage.
O_PTS = opponent's season total of points in the standings, has a positive sign, meaning that player ratings are better when the Canucks play an opponent with more points in the standings.
O_Hits = opponent's season total of hits, has a negative sign, meaning that player ratings are worse when the Canucks play an opponent that usually hits more.
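To make the sign interpretations above more concrete, here is a minimal sketch of how a positive or negative coefficient shifts the probabilities of the five ratings. It assumes the model has a standard ordered-logit form, which this article does not spell out. The coefficients and cut points are made-up illustrative numbers, not the estimates from my model, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def rating_probabilities(x, beta, cutpoints):
    """Probabilities of ratings 1..5, assuming P(rating <= k) = logistic(c_k - x*beta)."""
    eta = sum(beta[name] * x.get(name, 0.0) for name in beta)
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


def expected_rating(probs):
    """Average rating implied by the five probabilities."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical numbers, chosen only to show the direction of the effects.
beta = {"G": 0.9, "GVA": -0.3}
cutpoints = [-2.0, -0.5, 1.0, 2.5]

quiet_game = rating_probabilities({"G": 0, "GVA": 0}, beta, cutpoints)
one_goal = rating_probabilities({"G": 1, "GVA": 0}, beta, cutpoints)
one_giveaway = rating_probabilities({"G": 0, "GVA": 1}, beta, cutpoints)

print(expected_rating(quiet_game))
print(expected_rating(one_goal))      # higher than the quiet game
print(expected_rating(one_giveaway))  # lower than the quiet game
```

A positive coefficient moves probability mass toward the higher ratings, and a negative one moves it toward the lower ratings, which is all the sign of a coefficient tells us here.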
1. Result matters the most
The W (win) variable is not in the final model, but whenever it is used, it is the most significant independent variable, with the biggest marginal effect. The marginal effect is the product of the coefficient value and the mean of the absolute values of the variable. It indicates the importance of each variable in the specific model. In the model, the GF and GA variables are used instead of W, and both are among the top 3 variables by marginal effect.
The biggest marginal effect belongs to the TOI_ES variable, and it can be a little tricky to explain and interpret. Is it because fans simply like players they see more on the ice, or do coaches manage players' ice time according to their play and performance? I can't tell.
Note: It is very important to state that the magnitude of the marginal effects has no general meaning! The magnitude can be interpreted only within the particular model specification. It is possible that substituting a few variables would significantly change the magnitudes as well as the order in the graph.
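The marginal effect described above can be sketched in a few lines. The coefficients and per-game values below are invented for illustration and are not taken from my dataset. Ranking by absolute size is my assumption about how such effects would be ordered, and the function name is hypothetical.

```python
def marginal_effect(coefficient, values):
    """Coefficient times the mean of the absolute values of the variable."""
    mean_abs = sum(abs(v) for v in values) / len(values)
    return coefficient * mean_abs


# Hypothetical coefficients and observed values (one entry per game).
coefficients = {"GF": 0.5, "GA": -0.4, "TKA": 0.2}
observations = {
    "GF": [3, 2, 4, 1],
    "GA": [2, 3, 1, 4],
    "TKA": [0, 1, 0, 1],
}

effects = {
    name: marginal_effect(coefficients[name], observations[name])
    for name in coefficients
}

# Assumption: variables are ordered by the absolute size of the effect.
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)

print(effects)
print(ranking)
```

Because the effect mixes the coefficient with the typical size of the variable, a small coefficient on a variable with large values can outrank a large coefficient on a variable that is usually close to zero.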
2. Opponent strength is valued by fans
The variable with the fourth biggest marginal effect in the model is O_PTS. That means it really matters to fans who the Canucks play against.
For example, if a player had exactly the same stats in two games, including the result of the game, he would get a better rating in the game played against Chicago than in the one against Edmonton (teams picked just for the example).
3. Fans tell goals, primary assists and secondary assists apart
This is probably not a shocking finding, but it is interesting to know how much more significant one variable is than another. To compare their marginal effects, a simple (not that precise) model was used, and the results show that fans value goals more than assists, and primary assists more than secondary ones. The magnitudes and their comparison within this model are interesting in my eyes.
4. Is Corsi or Fenwick more relevant for fans to change their ratings?
The answer is Fenwick! In all models tried, Fenwick was always the more significant variable of the two. That could mean that fans credit a shot blocked by their player positively to all players on the ice. In the world of hockey analytics, Corsi is the one used more often when evaluating a player; fans, though, would use Fenwick.
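For readers less familiar with the two metrics, the sketch below uses their common definitions: Corsi counts all shot attempts (shots on goal, missed shots and blocked shots), while Fenwick leaves the blocked shots out. The numbers are made up. The article does not state whether the Fenwick60 variable in the model is a plain rate or a differential, so the plain per-60 rate shown here is an assumption.

```python
def corsi(shots_on_goal, missed_shots, blocked_shots):
    """All shot attempts: on goal, missed and blocked."""
    return shots_on_goal + missed_shots + blocked_shots


def fenwick(shots_on_goal, missed_shots):
    """Unblocked shot attempts only."""
    return shots_on_goal + missed_shots


def per_60(count, toi_minutes):
    """Scale a count to a rate per 60 minutes of ice time."""
    return count * 60.0 / toi_minutes


# Hypothetical numbers for one player in one game.
sog, missed, blocked, toi = 6, 3, 4, 15.0

corsi_60 = per_60(corsi(sog, missed, blocked), toi)
fenwick_60 = per_60(fenwick(sog, missed), toi)

print(corsi_60)    # 52.0
print(fenwick_60)  # 36.0
```

The only difference between the two numbers is the blocked shots, which is why the choice between Corsi and Fenwick says something about how fans treat a blocked shot.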
Pros and Cons
Pros:
- use of revealed preference data (describes reality)
- unique topic, dataset and model
- new findings (better explain the opinions of hockey fans)
Cons:
- done for skaters and Canucks players only
- some variables are tricky to interpret (e.g. time on ice)
What could this be good for?
- better understanding of what fans really value on the ice
- marketing use for an NHL club (to know which plays and events drive a player's rating)
- a different point of view on evaluating players
- player comparison as a potentially popular topic after an NHL game
- better understanding of fans' emotions during the game
The presentation used for the Vancouver Hockey Analytics Conference can be viewed here: http://people.stat.sfu.ca/~tim/hockey.html (Recording – Session 4).