NHL fans around the world are becoming more and more familiar with NHL player cards that combine interesting stats. See the following example made by Andy & Rono.
As Andy & Rono state in the bottom right corner of the card, making such cards is possible thanks to data provided by multiple sources. When looking at cards like the one above and practicing data analysis with different data sets, I always asked myself:
Which stat is the most impactful on the game? And which one is complicated to interpret?
I decided to explore that question using 2021/22 5v5 regular season NHL player data. I used two data sets (Corey Sznajder's fantastic transitional data and data from Natural Stat Trick) and built a multiple regression model with the following logic:
I did not include any shot attempt or offensive pass data, in order to remove events that feed directly into xG models. After checking for correlations among the independent variables and optimizing variable selection, I came up with two models, one for defenders and one for forwards.
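The correlation check among independent variables mentioned above could look roughly like the following. This is a minimal sketch, not the code used for the article: the column names, the 0.8 threshold and the numbers are all assumptions made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up values for illustration only, not real player data.
predictors = {
    "exits_per60": [10.0, 12.0, 14.0, 16.0, 18.0],
    "successful_exits_per60": [7.0, 8.5, 9.5, 11.5, 12.5],
    "entry_denial_pct": [12.0, 9.0, 15.0, 8.0, 11.0],
}

# Pairs flagged here are candidates for dropping one of the two variables.
print(collinear_pairs(predictors))
```

When two predictors are flagged like this, keeping only one of them is a common way to stop both from competing for the same signal in the regression.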
The model for defenders:
The model for defenders consists of only three independent variables (each with a p-value under 0.05). Team pts% (team strength) overshadows everything else in terms of what drives a defender's xG impact. Zone exit success percentage is also significant, and the share of denied entries against is the last variable that remains significant. What more does this tell us?
a) The strength of a player's team is paramount (highest t-value) for his xG control, ahead of his transitional stats.
b) Still, it is defensive blueline control (both moving the puck forward and denying opponents' entries) that impacts xG control for defenders.
c) Quality (in the form of percentages) is also more significant than quantity (per-60 rates) when it comes to moving the puck across the defensive zone blueline.
d) Lastly, neither zone entry stats (even though their success correlates positively with xGF%) nor offensive zone start percentage proved significant in the model.
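To illustrate how t-values rank variables in a multiple regression, here is a small ordinary least squares sketch in plain Python. It is not the model from this article: the variable names (`team_pts_pct`, `exit_success_pct`, `xgf_pct`) and all numbers are made up, and p-values are left out because they need a t-distribution, which a statistics library would normally supply.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(predictors, y):
    """Fit y = b0 + b1*x1 + ... and return (coefficients, t_values)."""
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]  # intercept first
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    residuals = [yi - sum(b * v for b, v in zip(beta, r))
                 for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in residuals) / (n - k)  # residual variance
    # t-value = coefficient divided by its standard error
    t_values = [b / sqrt(sigma2 * inv[i][i]) for i, b in enumerate(beta)]
    return beta, t_values


# Made-up values for illustration only, not real player data.
team_pts_pct = [60, 60, 40, 40, 60, 60, 40, 40]
exit_success_pct = [75, 65, 75, 65, 75, 65, 75, 65]
xgf_pct = [57, 55, 47, 45, 55, 53, 45, 43]

beta, t_values = ols([team_pts_pct, exit_success_pct], xgf_pct)
names = ["intercept", "team_pts_pct", "exit_success_pct"]
for name, b, t in zip(names, beta, t_values):
    print(f"{name}: coef={b:.3f}, t={t:.2f}")
```

In this toy data the team strength column gets the larger t-value, which is the sense in which a variable "overshadows" the others: its coefficient is large relative to its own uncertainty.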
The model for forwards:
Seven independent variables proved significant (p-value under 0.05) and make up the best-performing model. Let's summarize the findings in the following statements:
a) Unsurprisingly, team strength (team pts%) is again the most significant variable in the model.
b) Carries/60 is the leading transitional stat for forwards in terms of impact on xG control.
c) The share of offensive zone starts, a contextual variable, is another significant variable.
d) The other four variables are significant (p-value under 0.05) in the model, although with lesser impact: successful exit percentage, pressures/60, recoveries/60 and successful retrieval percentage.
The main takeaways are summarized here:
- Team strength impacts a player's xG control much more than any recorded transitional stat. It is very important to look at team-strength-adjusted or relative xG values when comparing players on different teams!
- Defenders impact their xG values by managing their own blueline in both directions. It is also more about the quality of exits and denials than the quantity.
- Forwards impact their xG values mostly through carries/60. For them, it is about the quantity of their carry-in entries. Other transitional stats also impact their xG control, mostly those recorded in the offensive zone.
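As a rough illustration of the "relative xG values" mentioned in the first takeaway, the sketch below uses one common definition: a player's on-ice xGF% minus his team's xGF% while he is off the ice. The function names and all numbers are assumptions for illustration, not values from the data sets used here.

```python
def xgf_pct(xgf, xga):
    """Share of expected goals for, as a percentage of all expected goals."""
    total = xgf + xga
    if total == 0:
        raise ValueError("no expected goals recorded")
    return 100.0 * xgf / total


def relative_xgf_pct(on_xgf, on_xga, team_xgf, team_xga):
    """On-ice xGF% minus the team's xGF% with the player off the ice."""
    off_xgf = team_xgf - on_xgf
    off_xga = team_xga - on_xga
    return xgf_pct(on_xgf, on_xga) - xgf_pct(off_xgf, off_xga)


# Made-up values for illustration only.
# Player A: 60% on-ice xGF% on a strong team.
player_a = relative_xgf_pct(on_xgf=30, on_xga=20, team_xgf=130, team_xga=120)
# Player B: 50% on-ice xGF% on a weak team.
player_b = relative_xgf_pct(on_xgf=30, on_xga=30, team_xgf=90, team_xga=110)

print(f"Player A relative xGF%: {player_a:+.2f}")
print(f"Player B relative xGF%: {player_b:+.2f}")
```

With these toy numbers the raw gap between the two players is ten percentage points, but the relative gap is much smaller, because Player B's team does worse without him than Player A's team does without him.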