Tech Hockey Guide has access to InStat, a service the CCHA has partnered with mainly for coaching and scouting purposes. As media, however, we have been granted access to (read: paid a bunch of money for) an account to help break down Michigan Tech, CCHA, and other NCAA team and individual statistics. This service is going to be extremely useful to our publication in a wide variety of ways, and it has already appeared in a limited capacity in Jonathan Zamaites’s Series Previews. To help you understand what we’ll be talking about in our analytics breakdowns, we decided an advanced statistics primer was necessary. So kick back, relax, and enjoy an introduction to advanced hockey statistics.
Why should I care about any of this?
Hold on, this seems like a pretty philosophical question for an article about analytics. Jokes aside, there are three main reasons I hear people give for not wanting to use hockey analytics, so I’m going to address those here and hopefully give you some reasons to use advanced stats before I explain how some of the most common ones work.
Why should I bother with these extra stats when all that matters is the goals each team scored?
We typically look at statistics to judge past performance and predict future performance. In the end, when looking at a single game, all that matters is whether the puck went in. But if you are trying to predict future performance, how the puck went in matters a lot more. Hockey is a pretty random sport; there’s a reason you hear the term “puck luck” thrown around in post-game interviews constantly.
One lucky bounce can be the difference between a team winning or losing a single game. So if you are going to try to predict future performance, you want a big sample size so the random luck can level out. Goals happen pretty infrequently, though, which makes it hard to get a large enough sample. Luckily, there is something else we track in a hockey game that happens much more frequently: shots. Counting only shots on goal gives you roughly ten times as many data points, which helps account for the randomness.
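To put rough numbers on that sample-size argument, here is a small Python sketch. It assumes, purely for illustration, that goals and shots on goal per game both behave like Poisson counts, and it uses round placeholder averages of 3 goals and 30 shots on goal per team per game (not InStat figures). For a Poisson count the standard deviation is the square root of the mean, so the relative noise is 1/sqrt(mean), which shrinks as the count grows.

```python
import math


def relative_noise(mean_per_game: float, games: int = 1) -> float:
    """Standard deviation divided by the mean for a Poisson count.

    Assumes events happen independently at a constant rate (a simplification).
    Over `games` games the expected total is mean_per_game * games, and for a
    Poisson total the relative noise is 1 / sqrt(expected total).
    """
    expected_total = mean_per_game * games
    return 1.0 / math.sqrt(expected_total)


# Illustrative placeholder averages, not real data.
GOALS_PER_GAME = 3.0
SHOTS_ON_GOAL_PER_GAME = 30.0

for games in (1, 10, 30):
    g = relative_noise(GOALS_PER_GAME, games)
    s = relative_noise(SHOTS_ON_GOAL_PER_GAME, games)
    print(f"{games:>2} games: goals ±{g:.0%}, shots on goal ±{s:.0%}")
```

Under these assumptions, one game's worth of shots on goal is about as noisy, relative to its average, as ten games' worth of goals, which is the "ten times more data points" idea in numbers.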
Shots, especially when you include shot location, are a significantly better predictor of future goals than past goals are. Shot metrics can also help analyze a game that has already happened. We already know the outcome, but looking at shot metrics can help us determine whether that outcome is something we would expect to be repeated or whether a team got lucky.
But hockey is too complicated to boil down to one number; I can analyze a game way better by just watching it and paying attention
Even assuming the stats can’t teach you anything the eye test can’t, you can only watch so many hockey games. It’s hard to get a handle on how good a player or set of players is simply by the eye test. There’s only so much time in the day, and a big benefit of hockey analytics is being able to take a huge amount of information and condense it into something you can quickly reference. Even if your end goal is to assess a player by watching them play, analytics give you a starting point for which players deserve your attention.
It seems complicated and I never liked math anyways
While some aspects of this use a lot of math and get kind of complicated, the most common advanced stats are no more complicated than batting average or ERA in baseball. Corsi, Fenwick, and xG are all conceptually simple, and they are really all you need to pay attention to unless you want this to be your new hobby. Sure, you can get really into it and do Ph.D.-level data analysis, but there are diminishing returns, and the fancy math is only worth your time if it interests you.
What is Corsi?
Ah yes, Corsi, the gateway drug of hockey statistics. Most of you have probably at least heard of this stat, but what does it actually mean? Well, unlike goals and assists, the name does little to explain the stat itself. Corsi simply counts every time each team attempts a shot. Shot on goal? Add one to that team’s Corsi. Missed shot? Yup, add it to the Corsi. A shot that results in a goal? They shot the puck, didn’t they? Plus one to their Corsi. Blocked shot? You must know where this is going by now: add one to the shooting team’s Corsi.
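Since Corsi is just a count of shot attempts, it takes only a few lines to compute. This is a minimal Python sketch assuming a hypothetical play-by-play format where each event records the team that attempted the shot and a made-up outcome label; real feeds, InStat included, use their own event names.

```python
from collections import Counter
from dataclasses import dataclass

# Shot-attempt outcomes that count toward Corsi. These labels are hypothetical.
# "saved_shot" means a shot on goal that the goalie stopped.
CORSI_EVENTS = {"goal", "saved_shot", "missed_shot", "blocked_shot"}


@dataclass
class Event:
    team: str  # team that attempted the shot (for a block, the shooter's team)
    kind: str  # e.g. "goal", "saved_shot", "missed_shot", "blocked_shot", "faceoff"


def corsi(events: list[Event]) -> Counter:
    """Count every shot attempt by each team: goals, saves, misses and blocks."""
    return Counter(e.team for e in events if e.kind in CORSI_EVENTS)


game = [
    Event("MTU", "saved_shot"),
    Event("MTU", "missed_shot"),
    Event("BSU", "goal"),
    Event("MTU", "blocked_shot"),
    Event("BSU", "faceoff"),  # not a shot attempt, so it is ignored
    Event("MTU", "goal"),
]
print(corsi(game))  # Counter({'MTU': 4, 'BSU': 1})
```

Note that a blocked shot is credited to the team that took the shot, not the team that blocked it.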
Sure, I understand that, but why would we even want that information?
Sample size, sample size, sample size. That’s the name of the game when it comes to predictive analysis in hockey. There is so much randomness in hockey, so much luck in any specific bounce, that anything with a small sample size runs the risk of being greatly skewed by puck luck. The more data you have, the less lucky bounces will skew it. So while what we really want to know is how many goals a team will score in the future, it is, on average, more accurate to make those predictions with past shot totals than with past goal totals.
Okay, but what’s with the name? Corsi doesn’t sound anything like a hockey term.
Well, that’s a bit of a complicated story; Bob McKenzie has an excellent article on TSN if you want all the backstory. Long story short, it’s named after former Buffalo Sabres goalie coach Jim Corsi, who started tracking shots, blocked shots, and missed shots to measure his goalies’ workload.
What is Fenwick?
So are you going to tell me this one is named after an old hockey coach too? Because it sounds more like it’s the name of a Lord of the Rings character.
It’s actually named after Matt Fenwick, a blogger who saw the analysis being done with Corsi and believed that blocked shots shouldn’t be counted towards the total. Fenwick can be a bit more accurate at predicting future success, but since it is not counting blocked shots it does have a smaller sample size and therefore takes more games of data before the randomness will level out. Typically Corsi ends up being used more often because it gets similar accuracy in predicting future results while needing fewer games to account for randomness.
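The only difference between the two stats is whether blocked shots count, which this short Python sketch makes explicit. The event labels and the (team, outcome) tuple format are hypothetical, chosen just for illustration.

```python
from collections import Counter

# Hypothetical outcome labels for shot attempts.
CORSI_EVENTS = {"goal", "saved_shot", "missed_shot", "blocked_shot"}
FENWICK_EVENTS = CORSI_EVENTS - {"blocked_shot"}  # Fenwick drops blocked shots


def count_attempts(events, allowed):
    """Count (team, outcome) events per team, keeping only outcomes in `allowed`."""
    return Counter(team for team, kind in events if kind in allowed)


game = [
    ("MTU", "saved_shot"), ("MTU", "blocked_shot"), ("MTU", "blocked_shot"),
    ("MTU", "missed_shot"), ("BSU", "goal"), ("BSU", "blocked_shot"),
]
corsi = count_attempts(game, CORSI_EVENTS)
fenwick = count_attempts(game, FENWICK_EVENTS)
print(corsi["MTU"], fenwick["MTU"])  # 4 2
print(corsi["BSU"], fenwick["BSU"])  # 2 1
```

The smaller Fenwick counts are exactly the sample-size cost described above: the same game produces fewer data points.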
What is xG?
xG stands for expected goals, and it is an attempt to quantify the value of specific shots. While Corsi and Fenwick can be useful stats, they do have one big weakness: every single shot is valued the same. If you clear the puck from your own zone and it goes on net or if you dump in a puck from the neutral zone, those will be counted the same as a shot from the slot five feet in front of the net. Obviously, those different shots have different chances of being a goal.
The objective of xG is to assign each individual shot a value between 0 and 1 based on how likely it is to become a goal. That number can be thought of as the percentage chance that the specific shot becomes a goal. No shot will ever have an xG of 1, because xG estimates that chance from the information available at the moment the shot is released, and no shot is a sure thing. Even an empty net doesn’t guarantee a goal; just ask Patrik Stefan.
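Because each shot’s xG is a probability, adding them up gives the number of goals you would expect from that set of shots on average. That sum is what a team or player xG total is. Here is a minimal Python sketch with made-up per-shot values standing in for the output of some xG model.

```python
# Made-up per-shot xG values; in practice these come from an xG model.
shot_xg = [0.02, 0.05, 0.31, 0.08, 0.12]


def expected_goals(xg_values):
    """Team or player xG is the sum of the per-shot goal probabilities."""
    for p in xg_values:
        # Following the article: a shot's xG is at least 0 and never reaches 1.
        if not 0.0 <= p < 1.0:
            raise ValueError(f"xG must be in [0, 1), got {p}")
    return sum(xg_values)


print(round(expected_goals(shot_xg), 2))  # 0.58
```

Five shots, one of them a good look from the slot, add up to a bit more than half a goal of expected value.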
Despite that clip being a hilarious classic, it’s worth noting that most xG models only account for shots/goals at 5v5 with goalies in both nets.
There are a bunch of things that can be factored into xG calculations. One of the most confusing parts of xG is that there is no single formula for it: anyone can come up with any formula they want and call it xG. That said, there are a few things any xG model worth its salt will factor in. Shot location is by far the most important. Time since the last pass, time since the last shot, shot speed, whether the shot was a deflection, and score effects are also included in almost every model.
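To show what "a formula that turns shot details into a probability" can look like, here is a toy Python sketch using a logistic function, one common and simple choice for this kind of model. Everything about it is an assumption: the features are only a subset of those listed above, the 3-second rebound cutoff is arbitrary, and the coefficients are invented rather than fitted, so the outputs are not real xG values.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    distance_ft: float           # distance from the net
    angle_deg: float             # angle off the center line (0 = straight on)
    is_deflection: bool
    secs_since_last_shot: float  # small values suggest a rebound


# Invented coefficients for illustration only; a real model would fit
# these to a large set of historical shots.
INTERCEPT = -1.0
COEF = {"distance": -0.05, "angle": -0.02, "deflection": 0.4, "rebound": 0.8}


def toy_xg(shot: Shot) -> float:
    """Squash a weighted sum of shot features into a probability in (0, 1)."""
    z = (INTERCEPT
         + COEF["distance"] * shot.distance_ft
         + COEF["angle"] * shot.angle_deg
         + COEF["deflection"] * shot.is_deflection
         + COEF["rebound"] * (shot.secs_since_last_shot <= 3))
    return 1.0 / (1.0 + math.exp(-z))


slot = Shot(distance_ft=10, angle_deg=5, is_deflection=False, secs_since_last_shot=30)
point = Shot(distance_ft=55, angle_deg=30, is_deflection=False, secs_since_last_shot=30)
print(f"slot shot: {toy_xg(slot):.2f}, point shot: {toy_xg(point):.2f}")
```

The logistic function guarantees every output lands strictly between 0 and 1, which matches the idea that no shot is ever a sure thing.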
What are Score Effects?
Score effects describe how the state of the game affects the shots a team takes and how likely those shots are to become goals. The effect is most evident late in the game when one team leads by two or more goals. In general, the trailing team will take more shots and be more likely to score a goal than the leading team. The most common theory is that a team leading late in the game tends to play more defensively. The idea is that the leading team plays in a way that reduces both teams’ chances of scoring. They might end up giving up more shots, but they try to make sure they won’t give up a breakaway and keep the shots to the outside.
That’s the idea, at least; whether it’s actually a good strategy is debatable. The important thing for big-picture statistics is that, in general, trailing teams take more shots than normal, and each of those shots is less likely than normal to become a goal.
There are several options for what to do with this information. The easiest is to simply throw out any past data from the 3rd period with a score differential of 2 or more. You can also weight shots differently based on time remaining and score differential, so a shot taken during a tie game in the first period would be weighted a bit more heavily than one taken while a team trailed by two with a few minutes left in the 3rd.
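Both options can be sketched in a few lines of Python. The filter follows the rule described above directly. The weighting function is purely hypothetical: the 0.15 slope and 0.4 floor are invented numbers meant only to show the shape of the idea (later and more lopsided means less weight), not any published adjustment.

```python
from dataclasses import dataclass

PERIOD_SECONDS = 20 * 60  # regulation periods are 20 minutes


@dataclass
class ShotEvent:
    period: int          # 1, 2 or 3 (overtime is not handled in this sketch)
    seconds_left: float  # time remaining in the period
    score_diff: int      # shooting team's score minus opponent's at the time


def keep_shot(e: ShotEvent) -> bool:
    """Easiest option: drop 3rd-period shots with a score differential of 2+."""
    return not (e.period == 3 and abs(e.score_diff) >= 2)


def shot_weight(e: ShotEvent) -> float:
    """Hypothetical weighting: shrink a shot's weight as the game gets late and lopsided.

    The 0.15 slope and 0.4 floor are invented for illustration only.
    """
    elapsed = (e.period - 1) * PERIOD_SECONDS + (PERIOD_SECONDS - e.seconds_left)
    fraction_played = elapsed / (3 * PERIOD_SECONDS)
    return max(0.4, 1.0 - 0.15 * abs(e.score_diff) * fraction_played)


early_tie = ShotEvent(period=1, seconds_left=900, score_diff=0)
late_trailing = ShotEvent(period=3, seconds_left=180, score_diff=-2)
print(keep_shot(early_tie), keep_shot(late_trailing))  # True False
print(shot_weight(early_tie), shot_weight(late_trailing))
```

The tie-game shot in the first keeps full weight, while the shot from a team down two late in the 3rd counts for noticeably less.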
So it’s just some math that tries to account for garbage time?
Yeah, I probably should have led with that. My bad.
Why measure stats per 60?
First, I should explain what I mean by “per 60” stats. A per-60 stat scales a specific statistic to 60 minutes of an individual’s ice time. The most common is xG/60, which is how many xG a player generates per 60 minutes of time on ice. This is useful for a couple of reasons. One is that it lets you more effectively compare players who have played different numbers of games. For example, in the 2018-19 NHL season, Auston Matthews missed 14 games and finished 18th in the league in goals and 12th in xG. If you only looked at those numbers, you’d miss the fact that he was one of the most efficient scorers that season, ranking 6th in goals/60 and 4th in xG/60.
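The per-60 calculation itself is simple arithmetic: divide the stat by the minutes played and multiply by 60. Here is a minimal Python sketch with two hypothetical players (not real data) showing how a player with a lower total can still be the more efficient one.

```python
def per_60(stat_total: float, toi_minutes: float) -> float:
    """Scale a counting stat to a rate per 60 minutes of time on ice."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return stat_total * 60.0 / toi_minutes


# Hypothetical players: B has fewer total xG but played half the minutes.
player_a = per_60(stat_total=12.0, toi_minutes=1200.0)
player_b = per_60(stat_total=9.0, toi_minutes=600.0)
print(player_a, player_b)  # 0.6 0.9
```

Player A leads in total xG, but Player B generates half again as much per 60 minutes, which is the kind of gap the Matthews example illustrates.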
The other thing these stats are useful for is finding which players are getting too much or too little ice time. The danger with this type of stat is finding outliers and saying “Wow, this 4th line player is putting up incredible numbers in limited minutes; clearly he’d be leading the league in scoring if we just played him 1st line minutes.” What these stats don’t account for is quality of competition or quality of teammates. Sure, that 3rd pair defenseman is really effective playing 15 minutes a night against the opponent’s 3rd and 4th lines, but if you send him out against the 1st line he might get run over. Sometimes you can find a diamond in the rough, but a lot of the time you will just find very good role players. If you want to go down that road, you need to start accounting for quality of competition and teammates, which is beyond the scope of this article.
To wrap things up, if I had to summarize analytics as briefly as possible I would simply show someone this tweet.
“Anytime a number makes me angry thats analytics” (PFT Commenter, @PFTCommenter, October 11, 2022)
Hopefully this helped to explain some of the basics and where they can be useful. This has been a pretty high level look at things conceptually, but we have plans to get into more specifics in future articles. InStat has tons of data to pull from so there are a lot of exciting things we’ll be able to dive into. I’m very excited about it, and hopefully now you are too. If you have any questions either about what I discussed in this article, or about other topics in hockey analytics please let me know and I’ll do my best to answer them.
Here are a few links to some articles on different xG models that go pretty deep into the math if you feel like nerding out for a bit.
This is a two part breakdown of hockey analytics that was published a while back when these stats were just starting to come into relevance in the NHL.