You’re Next, Anthony Martial
At some point late in the EPL season, I could tell when Harry Kane had scored more than once in a game without having to watch him play. That’s because a couplathree people would tweet a shot¹ at me with a link to the piece I did for FiveThirtyEight at the beginning of this season. It noted that, of all the players in the EPL, Harry Kane had the largest overperformance of his actual goal total over his expected goal total in the previous season.
I was originally a little miffed by the use of the word ‘Luckiest’ in the title (when you’re freelancing for any publication that’s large enough to, say, have an HR department, you generally don’t get to write your own titles), but really, in this context, it’s probably the most accurate way to think about it. Luck is that part of skill which isn’t reproducible. And if Kane, or any striker not named Messi, can’t outperform his expected goals total season after season, then the excess is luck.
So how lucky was Kane? The original post I did at FiveThirtyEight had Kane scoring 19 goals² on an expected total of 11.33 for the 2014-15 season. That’s an absolute difference of 7.67, or a percentage overperformance of 67.7%.
After I boldly predicted that Kane would succumb to the vagaries of mean reversion, what happened? He succumbed to the vagaries of mean reversion. Some. He scored 20 on an expected total of 16.49: an absolute difference of 3.51, or a percentage overperformance of 21.3%. In the FiveThirtyEight piece, I pegged Kane to score about 15 goals on 153 shots.³ So, yes, Kane was better than both what I predicted and what his expected goals total would denote. And yes, Kane also experienced some regression.⁴ The projection totals were ballparked in a pretty unrigorous manner, but part of my guesstimation included Kane actually sitting out a game or five to rest. He didn’t. He started every league game and played 3,576 minutes,⁵ enough to make him 11th in total minutes (with three of the players ahead of him being keepers: Schmeichel, Gomes and Fabianski).
It only annoys me because if Kane had rested a few games, my estimations would have been pretty close to nailed on (again, see the third footnote). On the other hand, if Kane had rested a few games, Tottenham mightn’t have been in the race for the title as late in the season as they were. On the other other hand, if Tottenham had slightly better-rested players at season’s end, they might have done better than picking up two of the last 12 available points and slipping behind Arsenal into third.
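The two overperformance figures quoted above are simple arithmetic: the absolute gap is goals minus ExpG, and the percentage is that gap divided by ExpG. A quick check in Python, using only the non-penalty totals given in this post (the helper name is just for illustration):

```python
def overperformance(goals, xg):
    """Return (goals - xg, percentage by which goals exceeded xg)."""
    diff = goals - xg
    return diff, 100.0 * diff / xg


# Kane's non-penalty goals and ExpG totals as quoted in the text.
for season, goals, xg in [("2014-15", 19, 11.33), ("2015-16", 20, 16.49)]:
    diff, pct = overperformance(goals, xg)
    print(f"{season}: +{diff:.2f} goals, {pct:.1f}% over ExpG")
# 2014-15: +7.67 goals, 67.7% over ExpG
# 2015-16: +3.51 goals, 21.3% over ExpG
```

Same player, similar absolute goal totals, but the gap as a share of ExpG shrank by roughly two thirds.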
Finally, just a few notes. The model used here is a little more conservative than the one I used in the FiveThirtyEight piece. Why not just reuse that one? Because I didn’t save it.⁶ So the results this time are probably understating the shot probabilities, and with them the ExpG totals. Not by much (we’re talking maybe a couple of tenths of a goal). But I stuck with it for a couple of reasons. First, Kane is getting some benefit of the doubt. This model slightly lowballs Kane’s total for last year relative to the one I used in the FiveThirtyEight piece (that one had Kane at 11.33 ExpG for 2014-15; this one has him right at 11.00). So it’s probably also lowballing his total for this season relative to what that model would give. In other words, if I had been able to faithfully recreate here the exact variables and interactions I used for that piece,⁷ Kane’s total would likely be a tiny bit higher than 16.49. The other reason I stuck with it: eh, close enough. And it would have felt disingenuous to hunt for an outcome that made the gap between actual and expected as small as possible just for the sake of, well, something.
Also, Kane took a lot of shots: 153 in total, more than anyone else in the Prem this season. In fact, the gap from first to second (37 shots) is the same as the gap from second to fifteenth. There’s no rule against taking shots. In fact, that’s probably one of the main requirements in a striker’s job description. But on a per-shot basis, the gap between his goals and expected goals is .023, which is 19th best in the EPL among players with 50 or more shots.⁸ That puts him behind (in order) Anthony Martial, Georginio Wijnaldum, Riyad Mahrez, Roberto Firmino, André Ayew, Sergio Agüero, Dimitri Payet, Nathan Redmond, Michail Antonio, Shane Long, Jordan Ayew, Jermain Defoe, Dele Alli, Graziano Pellè, Gylfi Sigurdsson, Jamie Vardy, Pedro and Odion Ighalo, who managed all of two goals on about 40 shots after the turn of the new year.
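The per-shot figure is the same goals-minus-ExpG gap divided by shot count, and the ranking only considers players with at least 50 shots. A minimal Python sketch; the function names and the (name, goals, xg, shots) tuple layout are hypothetical, and since only Kane’s totals appear in this post, only his line is computed:

```python
def per_shot_gap(goals, xg, shots):
    """(goals - expected goals) / shots, all non-penalty."""
    return (goals - xg) / shots


def rank_by_per_shot_gap(players, min_shots=50):
    """players: iterable of (name, goals, xg, shots) tuples (hypothetical layout).
    Drops anyone under min_shots, then sorts best per-shot gap first."""
    eligible = [p for p in players if p[3] >= min_shots]
    return sorted(eligible, key=lambda p: per_shot_gap(p[1], p[2], p[3]), reverse=True)


# Kane, 2015-16, non-penalty (from the text): 20 goals on 16.49 ExpG from 153 shots.
print(round(per_shot_gap(20, 16.49, 153), 3))  # 0.023
```

Dividing by shots is what drags Kane down the list: a 3.51-goal surplus spread over 153 shots is a small per-shot edge.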
So Kane had a very good year. He won the Golden Boot. He also regressed toward the mean, in that the gap between his actual and expected goals was much smaller than last season’s ridiculously unsustainable one. But he didn’t regress as far as I pegged him to.
Oh, and his team won nothing, played in no finals, essentially capitulated in the one contest they might have actually had the best chance of winning, and finished behind their local rivals.⁹
1 Curiously, nobody did this on days when Kane didn’t score. I really don’t mind the shots (save for maybe the one or two I’ll keep on a Nixonian list). File under: Banter. Which, for the most part, is fun. I’ve seen what actual hateful abuse on Twitter looks like, and I’ve never received anything close to that.
2 We don’t count penalties. That’s not a royal ‘we’. I don’t think anyone out there doing ExpG numbers adds penalties. Plus, penalties are stupid. For what it’s worth, penalties in the EPL went in at a rate of 81.3% this year, which is abnormally high.
3 So the end of that piece probably wasn’t the clearest. The 153 number refers to the number of shots I estimated for Kane. Which, holy crap, I nailed exactly (if you look at something like WhoScored, you’ll see Kane had 159 shots, but once you subtract the five penalties he took and the own goal he scored, you get 153). I’d like to pat myself on the back, but while getting in the neighborhood is a decent job of estimating, getting it exactly right is a bit flukey. Also, in terms of goals and expected goals, I had almost been even closer. In the pre-edit draft of the FiveThirtyEight piece I wrote: “Put the over/under at around 17.5 goals, only it will be on an ExpG something closer to 15, give or take. He is a striker, so over-performing by a goal or two would be totally normal.” Eh, when working with an editor, you pick and choose your battles. Clearly, I chose poorly. It’s not the first time.
4 For grins I ran a model with player effects. I first subset the data to players with 150 or more shots over the last three seasons (cumulative) and modeled on those. Somewhat surprisingly, only two players ended up with nonzero coefficients: Santi Cazorla and Sergio Agüero. Basically, Cazorla was so bad at shooting he earned a negative coefficient and Agüero so good he earned a positive one, but nobody else showed any significant effect. According to this limited data set, a striker is a striker is a striker (is a midfielder who shoots enough to make the cutoff). I was more than a little surprised by that. I wouldn’t take it as gospel truth about the game, but in this data set of players who shoot a lot, we don’t get any better prediction by distinguishing one player from another.
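The post only describes its method as penalized regression (footnote 6), so the exact model is unknown. Assuming an L1 (lasso-style) penalty, which is the kind that sets weak coefficients to exactly zero, the “only two players got coefficients” outcome can be sketched in pure Python: per-player offsets fit on top of a fixed shot-quality logit by proximal gradient descent. The function names, the fixed `base_logit` input, and the toy data are all hypothetical; this is an illustration of the technique, not the model behind the numbers above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x, t):
    """Proximal step for t * |x|: shrink x toward zero, snapping anything within t to exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_player_effects(shots, n_players, lam, lr=1.0, iters=500):
    """L1-penalized logistic regression for per-player shooting offsets (hypothetical helper).

    shots: list of (base_logit, player, goal) tuples, goal in {0, 1}. base_logit is the
    log-odds from a shot-quality model that is held fixed here (an assumption).
    Minimizes mean log-loss + lam * sum(|beta_j|) by proximal gradient descent (ISTA);
    the intercept b0 is left unpenalized.
    """
    b0 = 0.0
    beta = [0.0] * n_players
    n = len(shots)
    for _ in range(iters):
        g0 = 0.0
        g = [0.0] * n_players
        for base, j, y in shots:
            r = sigmoid(base + b0 + beta[j]) - y  # gradient of log-loss w.r.t. the logit
            g0 += r
            g[j] += r
        # Plain gradient step for the intercept; gradient step plus soft-threshold for players.
        b0 -= lr * g0 / n
        beta = [soft_threshold(beta[j] - lr * g[j] / n, lr * lam) for j in range(n_players)]
    return b0, beta


# Toy data, not real shots: five players, 100 shots each, every shot rated 50/50 by the
# shot-quality model. Player 0 scores 70, player 1 scores 30, players 2-4 score exactly 50.
made = [70, 30, 50, 50, 50]
shots = [(0.0, j, 1 if k < m else 0) for j, m in enumerate(made) for k in range(100)]
b0, beta = fit_player_effects(shots, n_players=5, lam=0.01)
print([round(b, 3) for b in beta])  # [0.619, -0.619, 0.0, 0.0, 0.0]
```

The three average finishers land on exactly zero and only the clear outliers keep a coefficient, shrunk toward zero by the penalty. Raise `lam` far enough (0.1 here) and every player effect is zeroed out, which is the “a striker is a striker” result.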
5 That is a pretty large number. In fact, if you divide it by 38, you get more than 90. Yes, Kane played more than 38 90-minute games despite the fact that the season is 38 90-minute games long. That’s because games last more than 90 minutes. Take the game where Tottenham hosted Southampton. There were 4-plus minutes of first-half stoppage time and 5-plus minutes of second-half stoppage time. That’s a 99-plus-minute game. Sure, those nine minutes are time added on, but it’s not like players’ bodies switch off during substitutions and goal celebrations and injuries. Hell, you might expend more energy celebrating a goal than in actually scoring it. Think of it this way: if a player comes off in the 70th minute, nobody subtracts the stoppage time up to that point from his minutes played. Players are out there for more than 90 minutes. Anyway, if I were to make a prediction about Kane for next season, it’s that he’ll suffer an injury. That assumes he plays significant minutes at the Euros and carries a similar workload next season. At some point, rest is beneficial. Sure, he’s young and can probably handle it, but, if nothing else, just being out on the pitch increases the chances of a freak accident. Or does that only happen to Arsenal players?
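Dividing it out with the figures above (the helper is purely illustrative):

```python
def minutes_per_match(total_minutes, matches):
    return total_minutes / matches


# Kane's 2015-16 league minutes over the 38-match league season (figures from the text).
print(f"{minutes_per_match(3576, 38):.1f}")  # 94.1
```

So he averaged a bit over 94 minutes per league match, every one of which he started.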
6 And I almost never save it. There’s a longer post here where I go over how I model this in way more detail. But generally, penalized regression is so computationally cheap that I don’t bother keeping one model as the be-all. Also, yes, I realized I wrote myself into a corner with that post title. And I think I went over with this post, but I kinda had to do this if for no other reason than I’m pretty sure I’ll get people sending me that FiveThirtyEight link for seasons to come. This post will ultimately save me time.
7 I might have almost exactly the same model, but because we’re sampling the data to build the model, the random rows pulled out to build it will change from build to build. So the coefficients will almost certainly change slightly as well. This goes back to one of my central complaints about ExpG generally: there is a massive amount of leftover residual deviance no matter how ‘good’ the model is (put differently, R-squared is usually really low; this one is about .17; I’ve broken .22, but of course I forgot to note which interactions I had put in before that R session crashed).
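A shot model is a logistic regression, so there is no ordinary R-squared, and the post doesn’t say which version the .17 refers to. One common choice is McFadden’s pseudo R-squared: 1 minus the ratio of residual deviance to null deviance, where the null model gives every shot the same base conversion rate. A minimal Python sketch, assuming that definition (function names hypothetical):

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of outcomes y (0/1) under predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1.0 - pi) for yi, pi in zip(y, p))


def mcfadden_r2(y, p):
    """1 - residual deviance / null deviance (deviance = -2 * log-likelihood, so the -2s cancel).
    The null model predicts the overall conversion rate for every shot; y needs both 0s and 1s."""
    base = sum(y) / len(y)
    ll_null = log_likelihood(y, [base] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null


# Toy example: four shots, one goal, and a model that rates the goal at 0.9 and the misses at 0.1.
print(round(mcfadden_r2([1, 0, 0, 0], [0.9, 0.1, 0.1, 0.1]), 2))
```

Real shot data sits far from that toy: most shots are low-probability chances that mostly miss, so the model can’t cut deeply into the null deviance, which is the residual-deviance complaint above.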
8 I get .070 for the same calculation from 2014-15. That would put him third best this season, in pretty much a dead heat with Mahrez.
9 See, banter. It’s fun.