My favorite aspect of sports analytics is player evaluation for drafting, as opposed to in-game strategy, player evaluation for free agency, the business analytics of sports, or anything else related. Being able to draft consistently good players, hitting on stars and passing on busts, differentiates the best and worst General Managers and determines the future of franchises. While I probably wouldn't advise any General Manager to follow my current advice on drafting (I don't know enough about traditional scouting, or what to look for in a draft prospect in any sport really), I do enjoy draft analyses, and I think that if I took the time to learn scouting from a coach's or scout's perspective and incorporated that knowledge into these analytics projects, I could be of some help in a draft room. I am largely an NBA and MLB fan when it comes to analytics, but this article focuses on an NHL draft project: the analyses we used, what worked and what didn't, and how (or whether) the analyses could be improved. After this, I should also start diversifying my sports projects, and probably not do another draft analysis for some time.
We had the opportunity to consult for an NHL team on this project, and our task was: “using current and historical data from the main WHL, OHL, and QMJHL leagues, compared with pre-draft rankings, project any under-valued or over-valued major junior players eligible for the 2016 NHL draft.” We expanded the scope to include the USHL and NCAA as well, essentially covering the top five pre-NHL North American leagues for draft talent. To identify under- or over-valued players, we created our own projections and compared them against Central Scouting's pre-draft rankings of North American skaters, which rank the top 210 North American skating prospects before each draft. Which players did these Central Scouting rankings over- or under-rank? Answering that question involved two tasks: (1) given a player's Central Scouting ranking, estimate where that player will be drafted, as well as the value an average player drafted in that spot typically generates, and (2) for the draft that just occurred in June, estimate each player's NHL value and compare that estimate with the draft-expected value from (1). For value, we look at both GVT (Goals Versus Threshold) and the likelihood that a drafted player makes the NHL (defined as playing more than 10 career NHL games). Goals Versus Threshold is a statistic invented by Tom Awad that measures how many goals a player's performance is worth above replacement level. We use it as a catch-all measure of an NHL player's value, which is a bit of a stretch but has been done before (we treat GVT as roughly analogous to Win Shares in the NBA or WAR in MLB, even though they are not the same).
Given a player's Central Scouting ranking, where do we expect that player to be drafted? To start, a player ranked as the 30th-best North American skater by Central Scouting is not projected to go 30th overall, for the simple fact that North American goalies, European skaters, and European goalies get drafted as well. Since our project focused on finding good-value picks among North American skaters only, our first task was to map players' North American rankings to their expected draft slots. For example, if 40% of players drafted each year were North American skaters, we could simply multiply a player's Central Scouting ranking by 2.5 (that is, 1 / 0.4) to get a decent estimate of his draft spot. Instead, we chose to fit a regression of each player's ranking against his average slot across several mock drafts published before the draft. The result is shown in the graph below. Every player in the top 60 of the North American Central Scouting ranking was projected to be drafted in the mock drafts we looked at, but several players outside the top 60 were not picked in every mock draft, so we used only those 60 players as points for the regression. Fitting a line of best fit between their ranking and average mock draft spot gave us a decent estimate of where players would be drafted.
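A minimal Python sketch of the two mappings described above. `proportional_slot` is the naive shortcut (divide the ranking by North American skaters' share of picks, 40% in the hypothetical example), and `fit_line` is an ordinary least-squares line of best fit, the kind of regression that relates rankings to average mock draft slots. Both function names are illustrative, and since the mock-draft data is not reproduced here, the usage at the bottom fits synthetic points rather than real results.

```python
from statistics import mean

def proportional_slot(cs_rank: int, na_skater_share: float = 0.40) -> float:
    """Naive mapping: if a fixed share of picks are North American skaters,
    the k-th ranked skater goes around pick k / share (k * 2.5 for a 40% share)."""
    return cs_rank / na_skater_share

def fit_line(xs, ys):
    """Ordinary least-squares line ys ~ slope * xs + intercept; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Synthetic points standing in for (top-60 ranking, average mock draft slot) pairs;
# they are NOT the mock-draft data used in the project.
ranks = list(range(1, 61))
synthetic_slots = [1.5 * r + 2.0 for r in ranks]
slope, intercept = fit_line(ranks, synthetic_slots)
```

Restricting the fit to the top 60 matters because a player missing from some mock drafts has no well-defined average slot, so including him would bias the line.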
We interpret the best-fit equation with an example: a player ranked 30th among North American skaters by Central Scouting is expected to be drafted near the 44th pick (1.33273 * 30 + 3.6017 = 43.58). We use this equation moving forward.
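The fitted equation can be wrapped as a small helper. The coefficients are the ones reported above; the function names are hypothetical, and rounding to the nearest whole pick is an assumption that matches the 43.58-to-44 example. Because the line was fit on the top 60 ranked skaters only, applying it far beyond that range is an extrapolation, and the highest-numbered rankings map to slots later than the last pick of a draft.

```python
# Coefficients of the ranking-to-draft-slot line reported in the text.
SLOPE = 1.33273
INTERCEPT = 3.6017

def expected_draft_slot(cs_rank: int) -> float:
    """Continuous expected draft slot for a North American Central Scouting rank."""
    return SLOPE * cs_rank + INTERCEPT

def expected_pick(cs_rank: int) -> int:
    """Nearest whole pick (rounding is an assumption matching the worked example)."""
    return round(expected_draft_slot(cs_rank))

print(round(expected_draft_slot(30), 2), expected_pick(30))  # prints 43.58 44
```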
Next, what value does an average player drafted in a given spot typically generate? This is an easier question to answer: aggregating all players in our dataset (1997–2015) by draft spot, averaging their career NHL GVTs, calculating the percentage at each spot that played more than 10 NHL games, and fitting a best-fit curve to each gives a simple estimate of the average value generated at each draft slot. The two graphs below summarize this:
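A minimal Python sketch of that aggregation, assuming each historical draftee is a `(draft_slot, career_gvt, nhl_games)` tuple (a hypothetical layout). `summarize_by_slot` averages career GVT and computes the share of players with more than 10 NHL games at each slot; `fit_log_curve` fits `value = a * ln(slot) + b` by least squares, the same functional form as the GVT equation used in this write-up (-8.382 * ln(pick) + 42.773). Fitting a logarithmic curve to the make-the-NHL percentage as well is an assumption, since the write-up does not give that curve's form.

```python
import math
from collections import defaultdict
from statistics import mean

def summarize_by_slot(players):
    """Map draft slot -> (average career GVT, share who played more than 10 NHL games).

    `players` is an iterable of (draft_slot, career_gvt, nhl_games) tuples.
    """
    gvts = defaultdict(list)
    made_nhl = defaultdict(list)
    for slot, gvt, games in players:
        gvts[slot].append(gvt)
        made_nhl[slot].append(games > 10)  # the write-up's "made the NHL" threshold
    return {
        slot: (mean(gvts[slot]), sum(made_nhl[slot]) / len(made_nhl[slot]))
        for slot in sorted(gvts)
    }

def fit_log_curve(slots, values):
    """Least-squares fit of value = a * ln(slot) + b; returns (a, b)."""
    xs = [math.log(s) for s in slots]
    x_bar, y_bar = mean(xs), mean(values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_bar - a * x_bar

# Toy records for illustration only, not the 1997-2015 dataset.
toy = [(1, 80.0, 900), (1, 60.0, 700), (2, 10.0, 5), (2, 30.0, 300)]
summary = summarize_by_slot(toy)  # {1: (70.0, 1.0), 2: (20.0, 0.5)}
```

A logarithm suits the GVT curve because value drops steeply over the first few picks and then flattens across the later rounds.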
Top 5 draft picks are very likely to make the NHL, whereas a player drafted at the end of the 1st round has close to a 50% chance of making the NHL and a player drafted at the end of the 7th round has close to a 20% chance. Similarly, the top two picks have generated 70–75 GVT on average over their careers, while players drafted in later rounds mostly cluster between 0 and 10. Both of these graphs follow a fairly predictable pattern, similar to average draft performance by draft spot in other professional leagues.
Next, to assess value, we created our own rankings for all draft prospects using two different approaches: (1) using current and former NHL players who played in these junior leagues between 1997 and 2015, fit a ridge regression of their junior stats to (a) their NHL GVT and (b) an indicator of whether they played more than 10 NHL games, then use the fitted equation to project draft prospects; and (2) find comparable players based on junior statistics using a K-nearest neighbors approach, and use the comparables' NHL performance to project draft prospects. We focus on (2), the K-nearest neighbors approach, as it is the more interesting approach and something we have not discussed before, whereas regressions on amateur stats are done more often and are quite limited.
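Approach (1) can be sketched with a closed-form ridge regression. This is a stdlib-only illustration, not the code used in the project: `ridge_fit` centers the features so the intercept is not penalized, then solves `(X^T X + lam * I) beta = X^T y` by Gaussian elimination. The feature set and the penalty `lam` are placeholders; in practice the penalty would be tuned (for example by cross-validation), and the junior statistics would be standardized first so the penalty treats them evenly. Fitting the same function to a 0/1 made-the-NHL target gives a linear probability model, which is one reading of (b); the write-up does not say whether a logistic variant was used.

```python
from statistics import mean

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept; returns (intercept, coefficients)."""
    p = len(X[0])
    x_bar = [mean(row[j] for row in X) for j in range(p)]
    y_bar = mean(y)
    # Centering removes the intercept from the penalized problem.
    Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X]
    yc = [v - y_bar for v in y]
    # Penalized normal equations: (Xc^T Xc + lam * I) beta = Xc^T yc
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(A, b)
    intercept = y_bar - sum(bj * xj for bj, xj in zip(beta, x_bar))
    return intercept, beta

def ridge_predict(intercept, beta, row):
    """Apply a fitted ridge model to one prospect's feature row."""
    return intercept + sum(bj * xj for bj, xj in zip(beta, row))
```

The penalty shrinks coefficients toward zero, which helps when junior statistics such as goals and assists are strongly correlated with each other.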
The intuition behind a K-nearest neighbors approach is that players with similar junior data should perform similarly in the NHL, so finding the most comparable historical junior players for each current prospect, and looking at those comparables' NHL performance, could serve as a good proxy for the prospect's expected NHL performance. We required a comparable player to have played in the same junior league and at the same position (classified as either forward or defenseman), and then measured closeness in height, weight, age, goals, assists, and plus-minus. Setting K = 10, we found the 10 most comparable players for each draft prospect according to these criteria. As an example, we show the results for Pierre-Luc Dubois, the #1-ranked North American skater by Central Scouting:
To reiterate, we found the 10 most comparable players by those six statistics, with playing in the same league at the same position a requirement for being a comparable. We measured closeness on the six statistics using each player's number of standard deviations from the mean in each category (for example, Pierre-Luc Dubois was 1.59 standard deviations above the mean in goals scored, so he would be most comparable to other players who were about 1.59 standard deviations above the mean in goals in their junior season). The K-nearest neighbors algorithm finds the 10 players whose standardized statistics differ least from the prospect's. Once the comparables are found, a player's projected GVT is simply the average NHL GVT of his 10 comparables, and his estimated chance of making the NHL is the percentage of those comparables who made the NHL themselves.
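A minimal Python sketch of the comparables method. It follows the steps described above (same league and position, standardized values on the six statistics, K = 10, average GVT, share with more than 10 NHL games), but several details are assumptions: means and standard deviations are computed over the historical comparison pool, all six statistics are weighted equally, and "differ least" is implemented as Euclidean distance via `math.dist`. The `Player` class and its field names are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev

# The six statistics used to measure closeness.
STATS = ("height", "weight", "age", "goals", "assists", "plus_minus")

@dataclass
class Player:
    name: str
    league: str        # e.g. "WHL", "OHL", "QMJHL", "USHL", "NCAA"
    position: str      # "F" (forward) or "D" (defenseman)
    stats: dict        # junior-season values keyed by the names in STATS
    nhl_gvt: float = 0.0
    nhl_games: int = 0

def project(prospect, history, k=10):
    """Return (comparables, projected GVT, estimated chance of playing more than 10 NHL games)."""
    # Hard requirement: same junior league and same position.
    pool = [p for p in history
            if p.league == prospect.league and p.position == prospect.position]
    if len(pool) < k:
        raise ValueError(f"only {len(pool)} eligible comparables, need {k}")

    # Standardize each statistic against the pool (assumption; see lead-in).
    # A zero spread falls back to 1.0 so identical values do not divide by zero.
    mu = {s: mean(p.stats[s] for p in pool) for s in STATS}
    sd = {s: pstdev([p.stats[s] for p in pool]) or 1.0 for s in STATS}

    def z(player):
        return [(player.stats[s] - mu[s]) / sd[s] for s in STATS]

    target = z(prospect)
    # K nearest neighbors: smallest Euclidean distance in standardized space.
    comps = sorted(pool, key=lambda p: math.dist(target, z(p)))[:k]

    projected_gvt = mean(p.nhl_gvt for p in comps)
    chance_nhl = sum(p.nhl_games > 10 for p in comps) / k
    return comps, projected_gvt, chance_nhl
```

With real data, `history` would hold the 1997–2015 junior seasons of drafted players and `k` would stay at 10; a smaller `k` is only useful for checking the logic on a handful of records.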
The graph below shows the projected NHL GVTs, from this comparable-players approach, for all Central Scouting prospects expected to be drafted. Note that while the dots on this graph represent prospects in the current draft, the line of best fit shows the historical average GVT of players drafted at each draft slot (the line of best fit from the GVT graph above). By comparing a prospect's projected GVT with the historical GVT at his expected draft slot, we can finally see which players we believe are over- or under-valued relative to their ranking. As a reminder, each player's expected draft slot comes from the ranking-to-draft-slot equation in the very first graph.
We interpret the best-fit equation with another example: the player ranked 30th among North American skaters, who is expected to be drafted near the 44th pick, is estimated to have a career GVT of 11.05 (-8.382 * ln(44) + 42.773).
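The two fitted equations chain together. A minimal sketch, assuming a prospect's surplus is simply his comparables-based projected GVT minus the historical GVT curve evaluated at his expected pick; the coefficients are the ones reported in the text, while the function names and the `projected_gvt` inputs are hypothetical. A positive surplus flags a possibly under-valued player, and a negative one a possibly over-valued player.

```python
import math

def expected_draft_slot(cs_rank: int) -> float:
    """Ranking-to-draft-slot line from the first graph."""
    return 1.33273 * cs_rank + 3.6017

def expected_gvt(pick: int) -> float:
    """Historical average career GVT at a draft slot (natural log)."""
    return -8.382 * math.log(pick) + 42.773

def gvt_surplus(cs_rank: int, projected_gvt: float) -> float:
    """Projected GVT minus the GVT typically produced at the expected pick."""
    pick = round(expected_draft_slot(cs_rank))
    return projected_gvt - expected_gvt(pick)

print(round(expected_gvt(44), 2))  # prints 11.05
```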
While we have highlighted several players who are projected to outperform their expected draft positions, it is interesting that the majority of current prospects are projected to fall below the historical line of best fit in this analysis. This is more likely because the K-nearest neighbors comparables approach is biased toward underestimating players than because this is a weak draft class. Honestly, I have no idea whether this is a strong or weak draft class.
Before closing, there was much about this project that we did not include in the write-up above but wanted to mention. First, we made several adjustments to the data to account for a player's age (a younger player with the same statistics is better than an older one), the league he played in (it is more difficult to play in the NCAA than in the USHL), and the year he played (since scoring rates change from year to year). We probably spent close to 50% of our time on this project on data cleaning, manipulation, and adjustments. As mentioned above, we also used additional regressions to construct draft rankings and to predict the likelihood that a player plays more than 10 NHL games, although we focused here on the comparables analysis for these outputs rather than the regression analysis. Lastly, attached below is one bonus graph showing the percentage of players drafted in each round from each league. It appears that NCAA players either make safe late-round picks, or that the league has more depth, so good NCAA players are still available late in the draft.
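The write-up does not spell out its adjustment formulas, so the following is only one plausible way to express them, with every factor a hypothetical input: scoring is converted to points per game, rescaled from the player's league-season scoring environment to a common baseline (the year adjustment), multiplied by a league-strength factor (the league adjustment), and given a linear bonus per year younger than a reference age (the age adjustment). No real factor values are implied.

```python
from dataclasses import dataclass

@dataclass
class Season:
    points: float
    games: int
    age: float
    league: str
    year: int

def adjusted_ppg(season, league_year_ppg, baseline_ppg, league_factor,
                 age_bonus_per_year=0.0, reference_age=18.0):
    """Points per game adjusted for scoring era, league strength, and age.

    league_year_ppg: {(league, year): average points per game in that league-season}
    league_factor:   {league: strength multiplier}, hypothetical values
    """
    ppg = season.points / season.games
    era = baseline_ppg / league_year_ppg[(season.league, season.year)]  # year adjustment
    strength = league_factor[season.league]                            # league adjustment
    youth = 1.0 + age_bonus_per_year * (reference_age - season.age)    # age adjustment
    return ppg * era * strength * youth
```

Keeping the three adjustments as separate multiplicative factors makes it easy to inspect or tune each one on its own.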