The Game of Two Halves
what I've been doing, what's coming next, what's being revised, and a 2009 column that prefigured my first study
The first official Expecting Goals study on Substitute Effects came out last week and it is available to subscribers at the link.
I have also made two podcast appearances discussing this study and my plans for Expecting Goals, plus a third where the study barely came up.
On my podcast, the Double Pivot, I talked to Mike Goodman about what I found about substitutes and what it might mean.
Scott Willis of the Cannon Stats podcast invited me to discuss this study, what we can still learn from soccer event data, and any Arsenal tie-ins from the investigation of substitute effects.
I also joined my friends at the Wheeler Dealer Radio podcast, but as it turned out I didn’t really talk about Expecting Goals, just Tottenham transfers, my feelings about Tim Sherwood, and my crackpottiest theories about ITK.
Double Pivot newsletter subscribers can see the newsletter here.
Standing on the Shoulders of Giants
As I was putting together my first Expecting Goals Study, I wanted to see if there was a substitute effect on overall goal-scoring. If teams make substitutions between minutes 60 and 80, and substitutes outscore starters, is there a clear increase in goals around that time? What I found was weirder. While there is an increase in goal-scoring after 75 minutes or so, the two larger effects are:
A large uptick in goals in second-half stoppage time and
A large and persistent increase in goal-scoring from the moment the second half kicks off
This raises a bunch of questions about the game of football. For the present inquiry into the substitute effect, it suggests that some portion of what is seen as a statistical sub effect is a function not of substitutes having more energy to run at defenders but of substitutes simply playing at times in the game when more goals are scored. Questions around how to account for time effects will be a major aspect of the February Study blog.
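To make the time-effect point concrete, here is a minimal sketch of one way to separate "when the minutes were played" from "who played them". Every number in it is invented for illustration and none comes from the study. The bucket boundaries and the names `LEAGUE_RATE`, `LEAGUE_MINUTES`, `SUB_MINUTES`, `time_adjusted_baseline` and `naive_baseline` are all assumptions. The idea is to compare the goals substitutes would be expected to score at one flat all-match rate with the goals expected at league-average rates for the specific stretches in which they were on the pitch.

```python
# All numbers below are invented for illustration; they are not from the study.

# League-wide goals per player-minute in each stretch of the match (hypothetical).
LEAGUE_RATE = {"first_half": 0.0010, "46-75": 0.0012, "76-90": 0.0013, "stoppage": 0.0016}

# Player-minutes per match in each stretch: 22 players times the stretch length,
# assuming about five minutes of second-half stoppage time and no red cards.
LEAGUE_MINUTES = {"first_half": 990, "46-75": 660, "76-90": 330, "stoppage": 110}

# Minutes played by substitutes in each stretch (hypothetical, skewed late).
SUB_MINUTES = {"first_half": 2, "46-75": 30, "76-90": 50, "stoppage": 18}


def time_adjusted_baseline(league_rate, minutes_by_bucket):
    """Goals expected at league-average rates, given when the minutes were played."""
    return sum(league_rate[bucket] * mins for bucket, mins in minutes_by_bucket.items())


def naive_baseline(league_rate, league_minutes, minutes_by_bucket):
    """Goals expected if one flat all-match rate applied to every minute played."""
    total_goals = sum(league_rate[b] * league_minutes[b] for b in league_rate)
    flat_rate = total_goals / sum(league_minutes.values())
    return flat_rate * sum(minutes_by_bucket.values())


adjusted = time_adjusted_baseline(LEAGUE_RATE, SUB_MINUTES)
naive = naive_baseline(LEAGUE_RATE, LEAGUE_MINUTES, SUB_MINUTES)

print(f"flat-rate baseline:     {naive:.4f} goals")
print(f"time-adjusted baseline: {adjusted:.4f} goals")
print(f"share of the gap explained by timing alone: {adjusted / naive - 1:.1%}")
```

With late-skewed minutes and rising rates, the time-adjusted baseline comes out above the flat-rate one. Only scoring beyond the time-adjusted baseline would count as evidence of a genuine substitute effect under this framing.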
But it is a much bigger question about the game of football as well. 44 percent of goals are scored in the first half and the other 56 percent in the second half. That is a big difference. Why is the game played that way? What are the incentives leading to it, the logic of the choices made by players and coaches, and to what degree is it a matter of suboptimal strategic planning? Beyond that, it’s something that should clearly be better known as a fact about the sport than it apparently is.
Expecting Goals reader Tiotal Football, who runs the Absolute Unit newsletter and who is also one of the crucial analytics Keepers of the Lore, pointed out to me that there was a study on this done back in the 2000s by Daniel Finkelstein of the old Fink Tank column in The Times of London. And indeed there was.
Daniel Finkelstein, Football a game of two halves? If only…
One fascinating thing about this study is that it confirms the same finding down to the hundredths place.
To do that, you need to look at how the scoring rate changes as the match progresses. The team started by looking at the goals scored in the first half of games and goals scored in the second half. Taking all the matches for the four seasons from 2005-06, 56 per cent of goals were scored in the second half, with a sort of weird persistence in every English division and in all the big leagues in Europe. It seems as if, as the game continues, more goals are scored.
This article was published in 2009, and my study covered seasons going back to 2010-11. It’s an independent confirmation of the same finding, using data from the seasons before those in my study.
The consistency certainly suggests that this is something inherent to the way modern football is played. Over the seasons in my database, the second-half percentage has ranged only from a low of 54 percent in the 2021-22 season to a high of 58 percent in the 2018-19 season, before settling back to roughly 56 percent this season. There do not appear to be any notable or persistent changes over time that I can identify.
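For anyone who wants to run the same check on their own event data, a minimal sketch of the per-season calculation follows. The record layout (a season label plus a period flag of 1 or 2) and the function name `second_half_share` are assumptions, and the sample records are made up. The sketch uses a period flag rather than the clock minute so that a goal in first-half stoppage time is not mistaken for a second-half goal.

```python
from collections import defaultdict


def second_half_share(goals):
    """Return {season: share of goals scored in the second half}.

    goals: iterable of (season, period) pairs, where period is 1 or 2.
    """
    counts = defaultdict(lambda: [0, 0])  # season -> [first-half goals, second-half goals]
    for season, period in goals:
        counts[season][period - 1] += 1
    return {season: c[1] / (c[0] + c[1]) for season, c in counts.items()}


# Made-up sample: 100 goals in one hypothetical season, 44 before the break.
sample = [("season A", 1)] * 44 + [("season A", 2)] * 56
shares = second_half_share(sample)

for season, share in shares.items():
    print(f"{season}: {share:.0%} of goals in the second half")
```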
But one big question it raises is how uniform this effect is. Are there teams or types of teams that have different kinds of splits? How are these splits affected by game state and other in-game factors? One thing I found is that the home/away split in goals is smaller in the final minutes than at any other time. Home teams have only a 53-47 goal-scoring advantage in stoppage time, notably lower than their usual 56-44 split.[1]
Finkelstein finds something analogous, that weaker teams tend to have their goals more concentrated in the second half than stronger teams.
A team with a good attack playing against a team with a weak defence will see a relatively smaller increase in goal rate as the game goes on compared with a team with a weak attack playing against a team with a good defence.
Let’s take Manchester United at home to Burnley. At the start of the match we expect almost 2.9 goals for United in the rest of the game and 0.42 for Burnley.
At the start of the second half we expect 1.53 for United in the rest of the match (more than half what they started at, but only just) and 0.28 from Burnley (well over half what we expected at the start of the match).
So for United, the effective half-time is at 48 minutes and for Burnley the effective half-time is 58 minutes.
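The "effective half-time" figures can be roughly reproduced from the four quoted numbers alone, under one simplifying assumption that is not taken from the Fink Tank model: that a team's scoring rate rises in a straight line over a 90-minute match with no stoppage time. The sketch below, with the hypothetical function name `effective_half_time`, picks the slope that matches the share of expected goals still to come at half-time, then solves for the minute at which exactly half of the expected goals remain.

```python
import math


def effective_half_time(at_kickoff, at_half_time, length=90.0):
    """Minute by which half of a team's expected goals have been scored.

    Assumes (for illustration only) a scoring rate proportional to
    1 + slope * t on [0, length], with slope chosen so that the fraction
    of expected goals remaining at the midpoint is at_half_time / at_kickoff.
    """
    h = length / 2
    f = at_half_time / at_kickoff  # fraction of goals still expected at the break
    if not 0.25 <= f < 0.75:
        raise ValueError("a non-negative linear rate cannot produce this split")
    # Remaining fraction at h: ((length - h) + slope*(length^2 - h^2)/2)
    #                          / (length + slope*length^2/2) = f; solve for slope.
    slope = ((length - h) - f * length) / (f * length**2 / 2 - (length**2 - h**2) / 2)
    if abs(slope) < 1e-12:
        return h  # flat rate: the effective half-time is the actual midpoint
    half_total = (length + slope * length**2 / 2) / 2
    # Solve t + slope*t^2/2 = half_total for t (the root inside [0, length]).
    return (-1 + math.sqrt(1 + 2 * slope * half_total)) / slope


united_ht = effective_half_time(2.9, 1.53)
burnley_ht = effective_half_time(0.42, 0.28)

print(f"United:  {united_ht:.1f} minutes")
print(f"Burnley: {burnley_ht:.1f} minutes")
```

Under that straight-line assumption, both results land within about a minute of the quoted 48 and 58. That suggests the quoted figures are consistent with a steadily rising scoring rate, though the actual model may well use a different shape.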
Finkelstein finds that goal-scoring by better teams is more evenly distributed over the match, while worse teams tend to leave it late. I am particularly interested in how this might affect player statistics. Does it mean something more if a player scores early in a game rather than late? What about a favored team? Are all statistics created equal, or are some statistics a better indicator of player or team quality depending on when in a match they’re accumulated?
As a final note, I keep writing “Finkelstein finds” and similar things, but the writer himself is quite open that his work is in some ways popularizing model-building by a few academics he cites regularly in the articles.
I’ll explain. Dr Henry Stott, Dr Ian Graham and Dr Mark Latham have been tweaking the Fink Tank Predictor, so that the model tracks probability as it changes during the game.
Revisions to the First Study
As I was putting together materials to talk with Scott Willis on his Cannon Stats podcast, I found that one of my data tables had been based on an older, buggy version of player minutes played. (In particular, this table accidentally included bench minutes in overall starter minutes.) This oversight fortunately only affected two of my charts, and none of my overall findings. Because I had far too many overall bench minutes by starters in this one table, I said incorrectly that starters play 24 times more minutes than subs when the real ratio is about 14-to-1. And I also showed an even larger gap between the substitute rate of scoring and the starter rate of scoring than is correct.
I have now updated the two charts in the online version of the newsletter. You can see the corrected chart below.
It remains the case that the aggregate of substitute scoring compared to the aggregate of starter scoring is not an apples-to-apples comparison, and properly accounting for the substitute effect requires much more textured study. But the first run of the blog overstated the extent of this silliness and it has now been fixed.
[1] The fact that the home advantage is about 56/44 is enough to make me start thinking kabbalistically about soccer numbers.