The people who are revolutionizing data in sport

By Alex Morgan on April 29, 2015

Data science is at the forefront of innovation in sport, as was evident at the Sports Analytics 2015 Conference in London this week. From Qatar to the United States, France, Belgium, and the United Kingdom, data analytics is increasingly prevalent and integral to the business of football, whether from a marketing, scouting, training or tactical angle.

The Sports Analytics conference combined two streams of marketing and training under a unifying theme: data in sport is the next big innovation in the industry. Gathered at the conference were the people who will take that theory to its fruition.

What most people perhaps don’t realize about these top-level clubs is that the data is already there. Long-term Manchester United trainer Tony Strudwick, who kicked off the conference, said United collect 30,000 data points a week from training. Strudwick’s job at United is all about “managing risk”: when to take players off, when to train them and how to do so. For instance, he said that Sir Alex Ferguson always used to build the game for the last ten minutes, and United had a strategic approach to this finale. As such, it was in part Strudwick’s job to manage the strategic substitutions United made to that end (though he now mainly deals with the youth setup and senior training).

When United played Bayern Munich in the UCL quarterfinal in 2010, Wayne Rooney carried an injury into the second leg, which United entered trailing on aggregate. United quickly took a three-goal lead at home. Early in the second half, however, with just one substitution left, Rooney took another knock to the same ankle, while Rafael (known for his vulnerability to sending-offs) was on a yellow card. Strudwick and the management team decided to take Rooney off, but Rafael then got a red card and Bayern Munich came back from behind.

Presumably, in the new age of football, this is where the data would have come in. The problem is, while United have the data, it’s very hard to interpret well. According to Strudwick, what United strive to do is “use the pitch performance data to make decisions.” That’s when the data matters.

And it has to reach a point where it happens fast, within seconds. “Speed is mission critical,” he said.

Optimizing the data collected is also at the core of Ian Graham’s job at Liverpool. In his conference talk on “Using in-depth analytics strategies to improve your [club’s] game,” he gave an overview of Liverpool’s practices — or as he put it, the “Liverpool FC experience.”

Graham’s responsibilities include data collection and storage, visualization and reporting, and statistical analysis and recommendations (pre-match, post-match, financial and commercial) for Liverpool on both the performance and business sides. Liverpool use Opta statistics and have over 127,000 games from across the world in their database. For roughly 100,000 games they have basic game info such as goals scored, assists, bookings, and substitutions. For 40,000 more they have passing, touch, and other data, and for 270 of Liverpool’s recent games they have over sixty-seven million raw data points.

They first store the data in MySQL and then manipulate it with a combination of Excel and (to data scientists’ delight) the R statistical language. Then comes the innovative part: using the data. Ian gave one example of how this is done. In a match some time ago (which he didn’t name), a Liverpool goalkeeper conceded a “ridiculous” volley from around 30-31 yards out at a tight angle. Ian wanted to figure out whether the goal was the goalkeeper’s fault. The goalkeeper was about seven yards off his line, so many blamed him.

They went about this by taking all similar data points from within a three-yard radius, starting with all the shots from that range. Only 0.6% went in and the majority were off target; even those on target had only a 6% chance of going in. Things were looking bad for the goalkeeper. However, he may have been right to be off his line. In 70% of all similar situations tracked (equal to 27,000 passes in Liverpool’s database), players passed the ball, which made passing by far the more likely choice; players shot in that scenario only around 2-3% of the time. After a series of plays, there was a 0.06% chance of a goal if the player passed, and if the pass was successful that figure increased to 9%. Of course, video analysts are also needed to help identify the exact situation: was Liverpool’s defense in the right position and doing the right thing?
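The "similar shots within a radius" step can be sketched in a few lines of Python. This is only an illustration of the idea, not Liverpool's actual code: the field names (`x`, `y`, `goal`), the coordinates in yards, and the sample shots are all made up for the example.

```python
from math import hypot


def conversion_rate(shots, target, radius=3.0):
    """Share of shots taken within `radius` yards of `target` that were goals.

    `shots` is a list of dicts with hypothetical keys "x", "y" (pitch
    coordinates in yards) and "goal" (True if the shot was scored).
    Returns None when no shot falls inside the radius.
    """
    tx, ty = target
    nearby = [s for s in shots if hypot(s["x"] - tx, s["y"] - ty) <= radius]
    if not nearby:
        return None
    goals = sum(1 for s in nearby if s["goal"])
    return goals / len(nearby)


# Made-up sample data, purely to show the mechanics.
sample_shots = [
    {"x": 30.0, "y": 10.0, "goal": False},
    {"x": 31.0, "y": 11.0, "goal": True},
    {"x": 29.0, "y": 9.5, "goal": False},
    {"x": 32.0, "y": 10.5, "goal": False},
    {"x": 12.0, "y": 0.0, "goal": True},  # far outside the radius, ignored
]

rate = conversion_rate(sample_shots, target=(30.5, 10.0))
print(f"conversion rate near the target spot: {rate:.1%}")
```

With a real event database the same filter would run over thousands of shots, which is what makes a figure as small as 0.6% measurable at all.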

But based on the data, there was only a 0.02% chance of a player shooting and scoring from that scenario, which would imply the goalkeeper was right to play a little off his line. And even this was an “over-simplified” example of his work. In the end, Ian estimated the goal was only about “70%” the goalkeeper’s fault.
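The talk did not spell out how the 0.02% figure was derived. One plausible reading, and it is only an assumption, is that it multiplies the chance that a player shoots at all (around 2-3%) by the chance that such a shot goes in (0.6%). A quick check of that arithmetic:

```python
# Figures quoted in the article; combining them this way is an assumption.
p_score_given_shot = 0.006   # 0.6% of shots from that range went in
shot_rates = (0.02, 0.03)    # players shot only around 2-3% of the time

# P(shoot and score) = P(shoot) * P(score | shoot)
joint = [rate * p_score_given_shot for rate in shot_rates]

for rate, p in zip(shot_rates, joint):
    print(f"shoots {rate:.0%} of the time -> shoots and scores {p:.3%}")
```

Under that reading the joint chance lands between roughly 0.012% and 0.018%, which is consistent with the rounded 0.02% quoted.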

He also gave another example of his work. Liverpool wanted to know the true advantage of possession and plotted average possession against goal difference in a simple chart. Athletic Bilbao, for example, had a high possession figure but a negative goal difference in 2012/2013, and the same applied to Wigan in the Premier League that season. Overall, they found that the R-squared value between possession and performance (i.e., goal difference) was a modest 42% in La Liga, despite the extreme results of Barcelona.

So he wanted to know how strongly “dangerous possession,” meaning how much possession a team had in the final third, was correlated with goals, and used this metric instead of total possession. This, it turned out, had an R-squared of 0.60, and teams like Atletico Madrid shot up the X-axis (dangerous possession), while Wigan and Bilbao slid back down it.
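For readers unfamiliar with R-squared, here is a minimal sketch of how it can be computed for a chart like this one. For a straight-line fit with a single predictor, R-squared equals the squared Pearson correlation. The article says Liverpool work in R; this sketch uses Python only for illustration, and the team figures below are invented, not the La Liga data from the talk.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear fit of ys on xs (squared Pearson correlation)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when a variable is constant")
    return sxy ** 2 / (sxx * syy)


# Invented numbers: average possession (%) and season goal difference.
possession = [45, 50, 55, 60, 65]
goal_difference = [-8, 4, -3, 10, 6]

r2 = r_squared(possession, goal_difference)
print(f"share of goal-difference variance explained by possession: {r2:.0%}")
```

Running the same function with a "dangerous possession" column in place of total possession is all it would take to compare the two metrics.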

Ultimately, Ian said his analysis wasn’t at all the deciding factor at Liverpool, only a “hook” with which to influence decisions and back up claims, a broad theme at the conference. Ian said he encourages the use of data, but one has to be careful with it. You can’t yet formulate a way to play football based simply on the data.

The same applies in the marketing sector. Daniel Ayers, digital and content director at Valencia, said that data only indicates what to do and what you are doing right. For instance, Facebook fans and likes give only a general, long-term picture, while WhatsApp shares give a more immediate, detailed one. Also, the more Facebook fans a club has in Indonesia, the more global appeal it has (Indonesia is the top country on Facebook for Barcelona, Manchester United, Bayern Munich, and Real Madrid). There is lots of data to drive marketing innovation at the club; it’s just a matter of optimizing it.

As part of his talk, he also outlined the necessity of a club identity. Valencia’s campaign, #JuntsTornem, meaning “we’re on our way back but not there yet,” has stayed with them throughout the season. They have to take risks, too, in promoting a campaign before matches. It may work, as when they beat Atletico Madrid with the slogan on their shirts and, on another occasion, ended Real Madrid’s win streak, or fail if the team loses. Many marketing executives highlighted the importance of team results to their campaigns.

On the pitch, sports science data has also blossomed recently. Neil Black of British Athletics said that the running watch has revolutionized the running industry. Football Every Day sat down with Bruno Demichelis, founder of the famous MilanLab at AC Milan and currently an assistant manager at Chelsea. He’s knowledgeable and an old hand in sports science. When I asked him how he got into the field (through his own interest and a medical background, after a successful karate career in which he was two-time European champion), he spent the rest of our talk explaining how he founded the MilanLab and what its objectives are.

Known for MilanLab’s “mind room,” Bruno stressed the importance of physiology, which he said is often overlooked in the modern game: how much load a player carries, and then how he responds to it. Then they analyze how a player performs during a game, the ultimate measure of a player. Bruno said that the Premier League has already approved some on-the-pitch health and performance data tracking during games.

Most of all, he said that the MilanLab aims to be proactive in injury prevention, not reactive. Instead of treating an injury after the fact, test the player to find his weak points, then prevent the injury from happening in the first place. At Milan, this was a resounding success, and the MilanLab is known both for its incredible reduction of injuries (97%, via The Guardian) and for extending players’ careers.

In Football Every Day’s own panel session with Gareth Thomas from the Football History Boys and Jamie Hosie of the Rugby Blog, we outlined how blogs are becoming an increasingly relevant club marketing channel. Yes, bloggers can now thank us when they are invited to club events! On both sides, marketing and performance, digital technologies are becoming increasingly key. Preaching these new technologies and practices, which will undoubtedly revolutionize football marketing, performance, and sports science, was the entire point of the Sports Analytics conference.

Update: a previous version of this article contained two inaccuracies regarding the score of Manchester United’s UCL tie with Bayern Munich, as pointed out by @unitedstats99 on Twitter.

About Alex Morgan

Alex Morgan, founder of Football Every Day, lives and breathes football from the West Coast of the United States in California. Aside from founding Football Every Day in January of 2013, Alex has also launched his own journalism career and hopes to help others do the same with FBED. He covers the San Jose Earthquakes as a beat reporter for QuakesTalk.com and his work has also been featured in the BBC's Match of the Day Magazine.