It’s Better To Be Lucky Than Good But It’s Even Better To Be Lucky And Good
Leicester opposition all shot conversion rate from season midpoint onwards = 3.4%. Insanely low, can't last forever.
— James Yorke (@jair1970) April 10, 2016
Of all the ridiculous stats about Leicester’s season I’ve seen so far this was maybe the ridiculousist. Made even sillier by the fact that Leicester proceeded to throw another shut-out via their 0-2 win over Sunderland, lowering that number even further.
It’s now 3.19%.
That’s not sustainable. Excluding pens, long-term averages for shot conversion are around 9.5%. Watching Leicester concede at such a bafflingly low rate in defiance of the relentless predictions of imminent reversion has kinda felt like seeing a unicorn get sodomized by a Yeti. Just in terms of likelihood. And maybe in terms of entertainment, if you’re also a Leicester fan.
But how likely is that 3.19% really?
For comparison I pulled 100 random 14-game streaks (Leicester has now played 14 games in the second half of the season). There is no deep theory here. This isn’t modeling and we’re not running any simulations. This is actual shot data from actual EPL games. And it’s not even sophisticated data. We just need shot and goal totals².
I started with the 2010-11 season (no special reason) and ran through the current season to date. Then things got random. I built a quick function to randomly select a season, then randomly select a team (the 20-team EPL pool is season-dependent). Finally the function randomly selects a start game (nothing over 25, so that each game sequence ends in the same season in which it starts)³. Then we just divide the goals conceded by the total shots allowed over our 14-game range. Confused? All we’re doing is pulling out 14-game sequences from the last few years of EPL games so we have something to compare Leicester to.
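For anyone who wants to try this at home, here is a minimal sketch of that sampling step. It is not the function I actually used: the data layout, the names `make_fake_games` and `random_streak_rate`, and the made-up shot numbers are all assumptions for illustration, and it assumes every season in the data has a full 38 games.

```python
import random

STREAK_LEN = 14
MAX_START = 25  # 25 + 14 - 1 = 38, so a streak never runs past the end of a season


def make_fake_games(rng, seasons=("2010-11", "2011-12"), teams=("A", "B", "C")):
    """Build made-up (shots_against, goals_against) pairs, 38 games per team.

    Purely illustrative numbers; swap in real match data to reproduce anything.
    """
    return {
        season: {
            team: [(rng.randint(5, 20), rng.randint(0, 3)) for _ in range(38)]
            for team in teams
        }
        for season in seasons
    }


def random_streak_rate(games, rng):
    """Pick a random season, team and start game; return the conversion-against rate."""
    season = rng.choice(sorted(games))
    team = rng.choice(sorted(games[season]))  # the team pool depends on the season
    start = rng.randint(1, MAX_START)         # 1-indexed game number
    window = games[season][team][start - 1 : start - 1 + STREAK_LEN]
    shots = sum(s for s, _ in window)
    goals = sum(g for _, g in window)
    return season, team, start, goals / shots


rng = random.Random(2016)
games = make_fake_games(rng)
sample = [random_streak_rate(games, rng) for _ in range(100)]
rates = [s[3] for s in sample]
print(f"lowest: {min(rates):.4f}, highest: {max(rates):.4f}")
```

Because each draw is independent, two streaks can overlap, which is exactly the laziness the overlap footnote owns up to.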
And, once we do that, how often do we find other streaks in the low point-zero-three-somethings? We don’t⁴. There is literally only one other team that even breaks below a 4% threshold⁵.
It turns out that a shot-against conversion run like the one Leicester is on right now has a probability of .0032⁶.
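One plausible way to get a number like that from a mean and a standard deviation is a normal approximation: treat the 14-game rates as roughly bell-shaped and ask how much of the curve sits at or below Leicester’s 3.19%. That choice of method is an assumption on my part for this sketch; the inputs are the sample mean (.0926) and standard deviation (.0222) quoted in the footnotes.

```python
import math


def normal_cdf(x, mean, sd):
    """Lower-tail probability P(X <= x) for a normal distribution."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2)))


MEAN = 0.0926       # sample mean of the random 14-game rates
SD = 0.0222         # sample standard deviation
LEICESTER = 0.0319  # Leicester's current 14-game conversion-against rate

z = (LEICESTER - MEAN) / SD
p = normal_cdf(LEICESTER, MEAN, SD)
print(f"z = {z:.2f}, lower-tail probability = {p:.4f}")
```

With the rounded inputs above this lands at roughly .003, in the same neighbourhood as the .0032 quoted; small differences are what you would expect from rounding the mean and standard deviation.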
So yeah, Leicester have been exceedingly lucky. They haven’t only been lucky. They are genuinely a much better side this season than they were a year ago, but here are the goal totals for the Foxes’ games since the loss to Arsenal: 1, 2, 1, 1, 1, 1, 2. They have managed to pick up 19 points while scoring only 9 goals over that span⁷. So their good fortune has been wed to serendipitous timing. If they had conceded goals at anything close to normal rates (or just normal rates plus or minus one standard deviation), then the league would still be very much in doubt.
Update: So I went ahead and built a function to extract every rolling 14-game shots-against conversion ratio. It turns out Leicester just emerged from the lowest such stretch in the sample seasons (2010-11 through the current 15-16 season). I might post the csv with the output, but for now I’ve added the poorly-labeled chart up top. The histogram is of the 14-game sequences, not entire team seasons. So while that arrow indicates 15-16 Leicester, it’s not the entirety of the season. Those tiny bars on the far left are basically made up of equal parts Leicester this season and a couple of bits of Mourinho at Chelsea.
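The rolling version is even simpler than the random one. A minimal sketch, assuming one team-season is stored as a list of (shots against, goals against) pairs in game order; the name `rolling_rates` and that layout are illustrative, not the actual function behind the chart.

```python
def rolling_rates(team_games, length=14):
    """Return (start_game, rate) for every full window of `length` games.

    team_games: list of (shots_against, goals_against) pairs in game order.
    start_game is 1-indexed; rate is goals conceded divided by shots allowed.
    """
    out = []
    for i in range(len(team_games) - length + 1):
        window = team_games[i : i + length]
        shots = sum(s for s, _ in window)
        goals = sum(g for _, g in window)
        out.append((i + 1, goals / shots))
    return out


# Made-up example: 16 games, 10 shots allowed and 1 goal conceded in each.
example = [(10, 1)] * 16
for start, rate in rolling_rates(example):
    print(f"games {start}-{start + 13}: {rate:.3f}")
```

A full 38-game season yields 25 windows per team, so run over every team and season this gives the complete set of sequences behind the histogram rather than a random 100 of them.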
1 Also, scroll through this entire discussion to better understand the answer every time for the next, oh, 25 years someone holds up Leicester as an example and asks “Why can’t we do it?”
2 There are a few repositories that make this kind of data freely available. Maybe start looking here if you’re interested.
3 Just a note on the data. We do have some overlap. Some of it is negligible. For example there are five seasons of Sunderland in the data set. For the 11-12 season there are consecutive-game sets starting with game 1 and with game 13. That’s not that big of a problem because the overlap is two games. But for the 15-16 season we have game sets that start with game 9 and game 12; there’s a lot more overlap there. This happens roughly 5 times in the data. Intuitively that might seem problematic, but really it makes no difference. I was just lazy and didn’t want to put in additional qualifier statements to make sure we didn’t see significant overlap. Restricting the sample to streaks where no game is in more than one streak doesn’t really affect the outcome. I also looked at a couple of 50-game data sets where I removed any sets that overlapped. The results were almost identical.
4 Our mean is .0926 and our standard deviation is .0222.
5 While the choice isn’t binary—it’s not ‘is Leicester lucky or good?’ (it’s a little of Column A and a little of Column B)—I will propose a third possibility anyway: Football is broken and nobody knows anything. I’m going to offer that up because across a couple of hundred random sets I only ran across one other that was even below .04 (or 4%) for a 14-game stretch. And that was West Ham, this season, starting with game 15. That’s 3.9% for that span. So maybe Slaven Bilic and Claudio Ranieri have figured something out that nobody else knows about and we can only hope they tell us when the season is over.
6 Just to be clear, this isn’t just converting the 3.19% to a decimal and rounding. That would be .032. This is .0032; it’s mean and variance derived and it’s entirely coincidental that it looks so close to the conversion-against rate.
7 For comparison Arsenal have scored 14 goals since beating Leicester… and picked up 8 points. If you’re thinking to yourself, “Yeah, but Arsenal ship all sorts of dodgy goals,” A) They’ve only conceded 2 more than Leicester all season and, more importantly, B) You’re missing the point.