Computer Vision in Football Data: What the New Tracking Data Means for the Edge Timeline

Every significant improvement in football data infrastructure has followed the same pattern. A new data type becomes available to a small number of well-resourced organisations. Those organisations develop analytical frameworks for the new data. The data gradually becomes commercially available to a wider audience. The betting market incorporates the insights from the new data. The edge that existed during the restricted availability window closes.

Optical tracking data - player and ball position at 25 frames per second across an entire match - is currently somewhere in the middle of this cycle for most major European leagues. The technology is deployed. The data exists. The analytical frameworks for some of what it enables are developed. The commercial availability is partial and expensive. The market incorporation is incomplete and uneven.

Understanding precisely where we are in this cycle - which capabilities are available, to whom, at what cost, and how quickly they're being incorporated into market pricing - is what determines the edge timeline. This article maps that timeline as specifically as the publicly available information allows.

What Optical Tracking Data Actually Captures

The distinction between event data and tracking data is the foundation for understanding why tracking represents a qualitative jump in analytical capability rather than just more of the same data.

Event data - the data that has underpinned xG models, PPDA calculations, and most of the analytical frameworks described throughout this series - records what happens when a player makes contact with the ball. A pass, a shot, a tackle, a carry. The player's position at the moment of the event, the destination of the ball, the outcome. Event data describes a series of discrete on-ball actions.

What event data doesn't capture is everything that happens between those actions. The off-ball movement of all twenty-two players. The defensive shape as it reorganises after a possession change. The positioning of the striker before the ball arrives and how that positioning influenced where the pass was directed. The distance a midfielder covers to close down a passing lane rather than the ball carrier. The space created by a run that drew a defender and opened the channel for the actual chance. All of this is invisible in event data and present in tracking data.

At 25 frames per second across a 90-minute match, optical tracking produces roughly 135,000 positional readings per player per match - approximately three million total positional data points per match for all players combined. This is a dataset of entirely different character from the hundreds of event data points a match produces. The analytical capabilities this enables are correspondingly different.
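The volume figures above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: it assumes exactly 90 minutes of play and 22 players, and ignores stoppage time, substitutes and the ball itself.

```python
# Back-of-envelope check of the tracking data volume quoted above.
FRAMES_PER_SECOND = 25
MATCH_MINUTES = 90        # assumption: no stoppage time
PLAYERS_ON_PITCH = 22     # assumption: ball and substitutes ignored

readings_per_player = FRAMES_PER_SECOND * MATCH_MINUTES * 60
readings_all_players = readings_per_player * PLAYERS_ON_PITCH

print(readings_per_player)   # 135000
print(readings_all_players)  # 2970000, i.e. roughly three million
```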

The Specific New Metrics Tracking Data Enables

Some of these metrics are already commercially available in limited form. Others are still proprietary. All are analytically meaningful for betting purposes in specific ways.

Off-Ball Running Quality and Quantity

Event data records that a striker received a pass in a specific position. Tracking data records the full sequence of how the striker arrived at that position - the timing of their run, the distance covered, the direction changes, the separation from the nearest defender at the moment of receiving. The quality of the off-ball movement that produced the receiving position is now measurable rather than invisible.
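As a rough illustration of what "measurable" means here, two of the simplest run descriptors, distance covered during the run and separation from the nearest defender at the moment of receiving, can be computed directly from positions. This is a minimal sketch, not any provider's actual metric: the function names, the coordinates in metres and the down-sampled run path are all hypothetical.

```python
import math

def run_distance(path):
    """Total distance (m) covered along a sequence of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def separation_at_reception(receiver_xy, defenders_xy):
    """Distance (m) from the receiver to the nearest defender."""
    return min(math.dist(receiver_xy, d) for d in defenders_xy)

# Hypothetical, heavily down-sampled run path for one striker (metres).
run_path = [(70.0, 40.0), (73.0, 44.0), (79.0, 44.0), (88.0, 40.0)]
# Hypothetical defender positions in the frame where the pass is received.
defenders = [(91.0, 44.0), (85.0, 30.0), (95.0, 38.0)]

print(round(run_distance(run_path), 1))                   # 20.8
print(separation_at_reception(run_path[-1], defenders))   # 5.0
```

A real run-quality metric would also need timing and direction changes, which require the full frame-by-frame path rather than four sample points.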

For betting purposes, this produces a metric that's more predictive of future goal contribution than either shots or xG alone: a striker whose off-ball runs consistently create high-quality separation from defenders is generating expected threat that doesn't appear in any event-data metric until the run converts into a shot. Their future goal probability is underestimated by event-data models in proportion to how much of their value is in their movement quality rather than their ball-contact statistics.

The prop market application is specific: strikers with high off-ball running quality whose shot and xG records understate their movement contribution have prop markets priced below their genuine expected contribution. The identification requires tracking data access. It's not achievable from event data alone, which is why the mispricing persists.

Pressure Maps and True Pressing Contribution

PPDA - described at length in the PPDA article - is an event data approximation of pressing intensity. It measures defensive actions per opponent pass as a proxy for how effectively a team disrupts the opponent's build-up. It's useful but limited: it measures outcomes of pressing (defensive actions) rather than the pressing behaviour itself (the positioning and movement that creates the threat of interception or forced error).

Tracking data enables pressure maps: visualisations and metrics that capture the actual pressing intensity experienced by a player on the ball, defined by the number of opponents within a specific radius and their closing speed and direction. A player receiving the ball with three opponents within four metres closing at two metres per second is under qualitatively different pressure from a player receiving with one opponent at five metres moving laterally. The resulting decision quality is predictably different. Event data can't distinguish these situations because it records only what the receiving player does, not the pressure context in which they do it.
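The two situations in that example can be separated with a very small calculation on a single frame. The sketch below assumes each opponent is given as a (position, velocity) pair in metres and metres per second, and uses a hypothetical six-metre radius so that both cases fall inside it; real pressure models are considerably more elaborate.

```python
import math

def closing_speed(opp_pos, opp_vel, carrier_pos):
    """Speed (m/s) at which an opponent approaches the carrier; negative if moving away."""
    dx, dy = carrier_pos[0] - opp_pos[0], carrier_pos[1] - opp_pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return 0.0
    # Project the opponent's velocity onto the direction towards the carrier.
    return (opp_vel[0] * dx + opp_vel[1] * dy) / dist

def pressure_snapshot(carrier_pos, opponents, radius=6.0):
    """Return (opponents within radius, mean closing speed of those opponents)."""
    near = [(p, v) for p, v in opponents if math.dist(p, carrier_pos) <= radius]
    if not near:
        return 0, 0.0
    mean_closing = sum(closing_speed(p, v, carrier_pos) for p, v in near) / len(near)
    return len(near), mean_closing

carrier = (50.0, 30.0)
# Three opponents three metres away, each closing at 2 m/s (hypothetical frame).
heavy = [((53.0, 30.0), (-2.0, 0.0)), ((50.0, 33.0), (0.0, -2.0)), ((47.0, 30.0), (2.0, 0.0))]
# One opponent at five metres moving laterally (hypothetical frame).
light = [((55.0, 30.0), (0.0, 1.5))]

print(pressure_snapshot(carrier, heavy))  # (3, 2.0)
print(pressure_snapshot(carrier, light))  # (1, 0.0)
```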

Pressure maps produce a genuinely new metric: pressure success rate - the proportion of pressing actions where the press successfully forces a lower-quality decision or direct ball loss. This separates pressing volume (the PPDA dimension) from pressing quality (the dimension PPDA approximates poorly). A team with average PPDA but high pressure success rate is pressing less frequently but more effectively than their PPDA suggests. Their defensive disruption value is higher than event-data models capture.
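The volume versus quality distinction is easy to see with toy numbers. In this sketch each pressure is a record with two hypothetical flags, `ball_lost` and `decision_degraded`; how a provider actually decides that a decision was "lower quality" is not public, so that flag is purely an assumption for illustration.

```python
def pressure_success_rate(pressures):
    """Share of pressures that forced a ball loss or a degraded decision."""
    if not pressures:
        return 0.0
    successes = sum(1 for p in pressures if p["ball_lost"] or p["decision_degraded"])
    return successes / len(pressures)

def make_pressures(total, successful):
    """Build a hypothetical pressure log with `successful` wins out of `total`."""
    return [{"ball_lost": i < successful, "decision_degraded": False} for i in range(total)]

high_volume = make_pressures(total=10, successful=3)  # presses often, wins 3
low_volume = make_pressures(total=6, successful=3)    # presses less, still wins 3

print(pressure_success_rate(high_volume))  # 0.3
print(pressure_success_rate(low_volume))   # 0.5
```

Both teams win the ball three times, but the second does it from fewer presses, which is the dimension a volume measure like PPDA does not isolate.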

Space Creation and Utilisation

The off-ball action that matters most in tactical football is often the run that creates space by drawing a defender, rather than the run that receives the ball. A striker's deep run that pulls the last defender and creates the channel for the attacking midfielder's goal-scoring run doesn't appear in the striker's statistics at all - the xG goes to the midfielder, the assist to whoever played the final pass, and the run that made both possible is invisible in event data.

Tracking data enables space creation metrics: quantification of how much high-value space a player's movement creates for teammates, independent of whether they receive the ball themselves. This is the most analytically significant tracking-data capability for identifying undervalued contributors, because the players whose movement creates space are exactly the players whose event-data statistics most dramatically understate their contribution.
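One crude way to put a number on space creation is to ask how much of a zone a teammate is the nearest player to, before and after a decoy run drags a defender away. The sketch below uses that nearest-player rule on a one-metre grid. It is a deliberately simplified stand-in for the pitch-control style models that tracking providers use, and the zone, positions and function name are all hypothetical.

```python
import math

def owned_area(player, opponents, x_range, y_range, cell=1.0):
    """Area (m^2) of grid cells in the zone that are closer to `player` than to any opponent."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    area = 0.0
    for i in range(nx):
        for j in range(ny):
            centre = (x_range[0] + (i + 0.5) * cell, y_range[0] + (j + 0.5) * cell)
            d_player = math.dist(centre, player)
            if all(d_player < math.dist(centre, o) for o in opponents):
                area += cell * cell
    return area

zone_x, zone_y = (75.0, 105.0), (14.0, 54.0)      # hypothetical zone around the box
midfielder = (85.0, 34.0)
defenders_before = [(92.0, 30.0), (92.0, 38.0)]
defenders_after = [(98.0, 20.0), (92.0, 38.0)]    # the striker's run drags one centre-back away

before = owned_area(midfielder, defenders_before, zone_x, zone_y)
after = owned_area(midfielder, defenders_after, zone_x, zone_y)
space_created = after - before
print(space_created > 0)  # True: the midfielder now has more of the zone to himself
```

The credit for `space_created` would go to the striker who made the run, which is exactly the contribution event data never records.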

The betting application overlaps with the pressed-from-front striker article and the xT versus xG article - both were pointing at the same underlying phenomenon of value-creating actions invisible to event data. Tracking data is the technology that makes those invisible contributions measurable rather than inferential.

Defensive Shape and Line Metrics

Defensive line height - how high or deep a team defends - is a key tactical variable with specific implications for expected goals against distribution. Event data provides rough proxies for defensive line height from where defensive actions and headed clearances occur. Tracking data provides exact defensive line positions at every frame of the match.

The specific metrics this enables: average defensive line height by match phase, defensive line compactness (the horizontal width of the defensive unit), and - most valuably - defensive line consistency (how much the line height varies relative to the ball position). A team whose defensive line is consistently compact and consistently positioned is executing their defensive system more reliably than a team with equivalent average line height but high variance. The consistency metric predicts clean sheet probability better than the average alone.
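A minimal sketch of those three metrics, assuming each frame gives the ball's x-coordinate and the back four's positions in metres from their own goal line. Consistency is approximated here as the standard deviation of the gap between ball and line; that definition, like the frame layout, is an assumption for illustration rather than an established standard.

```python
from statistics import mean, pstdev

def line_metrics(frames):
    """Average line height, average width, and variability of the ball-to-line gap."""
    heights = [mean(x for x, _ in f["defenders"]) for f in frames]
    widths = [max(y for _, y in f["defenders"]) - min(y for _, y in f["defenders"])
              for f in frames]
    gaps = [f["ball_x"] - h for f, h in zip(frames, heights)]
    return {"avg_height": mean(heights), "avg_width": mean(widths), "gap_std": pstdev(gaps)}

def make_frame(ball_x, line_x):
    """Hypothetical frame: a flat back four at depth `line_x`."""
    return {"ball_x": ball_x, "defenders": [(line_x, y) for y in (10.0, 25.0, 40.0, 55.0)]}

# Two hypothetical teams with the same average line height.
steady = line_metrics([make_frame(50.0, 30.0), make_frame(60.0, 40.0), make_frame(70.0, 50.0)])
erratic = line_metrics([make_frame(50.0, 30.0), make_frame(60.0, 30.0), make_frame(70.0, 60.0)])

print(steady["avg_height"], erratic["avg_height"])  # 40.0 40.0
print(steady["gap_std"], round(erratic["gap_std"], 2))  # 0.0 8.16
```

The averages are identical; only the variability term separates the team that holds its line relative to the ball from the one that doesn't.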

Sprint and Acceleration Load

Total distance covered per match is the widely available headline proxy for physical output. Tracking data enables distance covered broken down by speed zone - walking, jogging, running, high-intensity running, sprinting - and the number and quality of acceleration and deceleration events. These sprint and acceleration load metrics are what sports scientists use for injury risk monitoring and fatigue assessment.
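The speed-zone breakdown is mechanically simple once positions are available at a known frame rate. In the sketch below the zone boundaries (in metres per second) are illustrative assumptions only; providers and sports science departments each use their own thresholds.

```python
import math

# Upper speed bound (m/s) for each zone; thresholds are hypothetical.
ZONES = [("walking", 2.0), ("jogging", 4.0), ("running", 5.5),
         ("high_intensity", 7.0), ("sprinting", float("inf"))]

def distance_by_zone(positions, fps=25):
    """Split distance covered (m) into speed zones from frame-by-frame (x, y) positions."""
    totals = {name: 0.0 for name, _ in ZONES}
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        speed = step * fps  # metres per frame -> metres per second
        for name, upper in ZONES:
            if speed < upper:
                totals[name] += step
                break
    return totals

# Hypothetical two seconds of movement: one second at 1 m/s, one second at 8 m/s.
xs = [0.0]
for _ in range(25):
    xs.append(xs[-1] + 0.04)
for _ in range(25):
    xs.append(xs[-1] + 0.32)
positions = [(x, 0.0) for x in xs]

totals = distance_by_zone(positions)
print({zone: round(metres, 2) for zone, metres in totals.items()})
```

Summing the high-intensity and sprinting zones across a run of fixtures gives the kind of load figure a minutes-played count cannot.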

For betting purposes, the specific application is in mid-season fatigue assessment for specific players and squads. A midfielder who has accumulated above-average high-intensity running load across a congested fixture period is at elevated injury risk and below-peak physical output even if they appear fully fit and available. The event-data model knows they played ninety minutes in each of the last six matches. The tracking data knows how physically demanding those ninety minutes were. The fatigue-adjusted performance expectation the tracking data enables is more predictive than the event-data model's minutes-based fatigue approximation.

Commercial Availability: What You Can Access and What You Can't

The commercial availability landscape for tracking data in 2024 is fragmented in ways that are worth mapping precisely, because the accessibility determines whether the analytical capabilities described above are individual-bettor-accessible or operator-only.

At the commercially available end, the tracking-data-derived metrics that have been incorporated into the public or semi-public analytics ecosystem include: pressure data in the form StatsBomb makes available through FBref for competitions they cover, some off-ball running metrics that appear in commercial data packages from Opta (Stats Perform) and are occasionally surfaced through third-party platforms, and aggregated defensive line metrics that analytical platforms have derived from tracking data and publish at team level.

The most accessible tracking-adjacent metrics from a free source are StatsBomb's pressure data on FBref, which includes pressure attempts, pressure success rate, and pressured pass completion rates for the competitions in their dataset. This is a genuine tracking-data-derived metric available from a free source, though it's available for specific competitions only and with a data lag rather than in real time.

At the proprietary end - data that exists, is used by sophisticated operators and clubs, but is not commercially available to retail consumers - sit the full tracking datasets themselves. Player-level positional data at frame-level resolution, run quality metrics, space creation metrics, individual acceleration load, and defensive shape metrics at the granularity that produces the most analytically significant betting insights. These are available through commercial contracts with Tracab, ChyronHego, and similar tracking technology providers at costs that are viable for clubs and major operators but not for individual bettors.

The middle tier - analytically processed tracking data at team or league level without individual frame-level access - is available through several analytics platforms at subscription costs in the range of hundreds to low thousands of pounds per year. These platforms include The Analyst, StatsBomb's commercial offerings, and various sports analytics data providers who have processed tracking data into derived metrics and sell access to the output. For a bettor who generates meaningful betting volume and for whom the analytical edge from these metrics produces returns above the subscription cost, this middle tier represents the accessible frontier of tracking data.

The Edge Timeline

The edge timeline question - when does tracking data become fully priced into the market - has a structural answer and a market-specific answer.

The structural answer: the market incorporates new analytical capabilities when the information derived from those capabilities flows through to market pricing through one of two pathways. Either operators themselves acquire and incorporate the data into their pricing models - closing the edge from the supply side - or sufficient sophisticated bettor activity informed by the new data moves lines - closing the edge from the demand side.

Both pathways are operating but at different rates for different types of tracking-derived insights.

Operator model incorporation of tracking data is happening but is not complete or uniform. The most sophisticated operators - those who have built meaningful quantitative infrastructure - have begun incorporating tracking-derived metrics into their pricing models, primarily at the team level and primarily for the Premier League and top European leagues where tracking data has been available longest. Their models use pressure maps, line metrics, and some off-ball running aggregates. The incorporation is partial because full tracking data at individual player level and match-level granularity requires computational infrastructure that not all operators have built.

The mid-market operators - those whose niche competition pricing was described as soft in the compiler bias article - have minimal tracking data incorporation. Their pricing for Championship and below, for Scandinavian leagues, and for lower-tier European competition is almost entirely event-data-based. Tracking data for these competitions is available but less comprehensively used, creating a persistent gap between what tracking data reveals and what the market prices.

The demand-side pathway - sophisticated bettors using tracking data to identify mispricings and betting them into correction - is operating but slowly. The cost and technical accessibility barriers limit how many individual bettors are working with genuine tracking data. The bettors who are doing so represent a small enough community that their market impact isn't closing the pricing gap at the pace that sharp money closes simpler pricing errors.

The combined estimate for the edge timeline: for Premier League match result and total goals markets at sophisticated operators, the tracking data edge is narrowing but not yet closed - perhaps sixty to seventy percent of the analytically identifiable tracking-data signal has been incorporated into pricing. For lower-tier competitions and niche markets, and for prop markets across all competitions, the incorporation is substantially lower - perhaps twenty to thirty percent. The gap represents the current edge, and the timeline for that gap to close is years rather than months for the lower-incorporated segments.

What This Means for Individual Bettors Right Now

The honest assessment of tracking data access for individual bettors in 2024 is that the full granular dataset is out of reach but the tracking-derived metric tier is partially accessible, and the partially accessible tier is sufficient to produce meaningful betting edge in specific markets.

The specific accessible tracking-adjacent capabilities that individual bettors can incorporate into their analysis today without prohibitive cost: StatsBomb's pressure data through FBref for covered competitions, the PPDA framework the series has described as the accessible proxy for pressing quality, commercial analytics platform subscriptions at the middle-tier cost point for bettors whose volume justifies the investment, and the off-ball running observation methodology described in the progressive carries article as a visual approximation of what tracking data measures precisely.

The edge from these partially accessible capabilities is concentrated in the market types that are least incorporating tracking-derived insights: prop markets for individual players whose off-ball contribution is invisible to event data, total goals markets for pressing-intensive competitions where pressure success rate data modifies the PPDA-based assessment, and clean sheet markets for teams whose defensive line consistency - partially inferrable from visual observation - is better than their event-data defensive metrics suggest.

The investment logic for considering middle-tier tracking data access: for a bettor who bets Premier League and Championship prop markets at meaningful volume, the annual subscription cost for a commercial analytics platform carrying processed tracking metrics is likely recovered through improved analytical accuracy in a modest number of bets per season where the tracking insight produces the decisive edge. The calculation is specific to each bettor's volume and market focus and worth doing explicitly rather than assuming the data is either unaffordable or obviously worth the cost.
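Doing that calculation explicitly takes a few lines. Every number in the sketch below is hypothetical: the subscription cost, the average stake, and above all the edge uplift, which is the extra expected return per bet attributable to the tracking metrics and the hardest input to estimate honestly.

```python
import math

def break_even_bets(subscription_cost, avg_stake, edge_uplift_pct):
    """Bets needed for the extra expected profit to cover the subscription."""
    extra_profit_per_bet = avg_stake * edge_uplift_pct / 100
    return math.ceil(subscription_cost / extra_profit_per_bet)

# Hypothetical inputs: 1,000 pounds a year, 100 pound stakes, 5 points of extra edge.
print(break_even_bets(subscription_cost=1000, avg_stake=100, edge_uplift_pct=5))  # 200
```

Only bets where the tracking insight actually changes the decision should be counted, and the result is an expectation rather than a guarantee, so the real break-even volume will be higher than a naive count of all bets placed suggests.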

The Looming Integration Point

There's a medium-term development that's worth flagging even though it hasn't fully arrived: the integration of tracking data directly into the commercial data feeds that most operators already subscribe to.

Sportradar, Stats Perform, and Genius Sports - the primary data infrastructure providers described in the propagating errors article - are all developing or deploying tracking data capabilities alongside their existing event data feeds. When tracking-derived metrics are bundled into the standard commercial data package that most operators already receive, the analytical gap between operators who've invested in tracking data and those who haven't will close rapidly. The systematic operator-level mispricing from tracking data absence will compress toward zero at the pace of commercial data package adoption.

This integration point is probably two to four years away for widespread adoption at the operator level across all major European leagues. Before it arrives, the bettors and the small number of sophisticated operators who are working with tracking data have a window of genuine edge against the majority of the market that isn't. After it arrives, the tracking data edge closes in the same way the xG edge closed - it becomes standard, it gets incorporated everywhere, and the game moves on to whatever comes next.

What comes next is probably full integration of language model capabilities with tracking data - tactical context understanding combined with positional data at scale. That's the next frontier, and it's genuinely early stage. But that's a different article.

FAQ

Q1: Which specific European leagues have comprehensive tracking data coverage, and are there meaningful competitions where tracking data doesn't yet exist?
The Premier League, Bundesliga, La Liga, Serie A, and Ligue 1 have comprehensive tracking data coverage through the major providers. The Scottish Premiership, Dutch Eredivisie, Portuguese Primeira Liga, and Belgian Pro League have coverage through specific providers but with less complete historical depth. The Championship has tracking data deployed at most stadiums but coverage completeness varies by venue and provider contract. Below the Championship - League One, League Two, and equivalent tiers in other European countries - coverage becomes increasingly patchy and in some competitions is absent entirely. This coverage distribution maps closely to the competitions identified as softest in the compiler bias article: the competitions with the least tracking data coverage are precisely those where the operator pricing is weakest and the individual bettor edge is most accessible. The absence of tracking data infrastructure in these competitions is another layer of the same analytical gap that creates their general pricing softness.

Q2: Is there a realistic pathway for individual bettors to access raw tracking data rather than the processed metrics that commercial platforms provide, and what would that analysis look like?
The raw tracking data pathway for individuals is extremely limited under current commercial structures. The clubs and leagues that own the data typically license it only to commercial entities under contracts that prohibit resale or individual licensing. The practical pathway that exists is through the open data initiatives that a small number of providers have launched. StatsBomb's open data programme releases full tracking data for specific competitions - most notably the Women's Super League and specific international tournaments - in publicly accessible format. This data is genuinely usable for developing analytical frameworks on tracking data at no cost, though the specific competitions covered aren't the primary betting targets. The more realistic pathway for most individual bettors is the commercial analytics platform subscription tier, which provides the output of someone else's tracking data analysis rather than the raw data itself. The analytical insight is effectively the same - you're getting the metrics that the raw data enables - without the computational infrastructure required to process raw tracking data yourself.

Q3: How does the edge timeline for tracking data compare to the edge timeline for xG when xG was first becoming available, and what does that historical comparison suggest about the current window?
The comparison is instructive. Expected goals models became commercially meaningful around 2012 to 2014, when a small number of analysts and early-adopter operators were using xG-based models while the majority of the market was pricing from raw shot counts and results. The window between xG being meaningfully analytical and xG being fully incorporated into mainstream market pricing was roughly five to seven years - by 2019 to 2020, xG was broadly incorporated into the pricing models of most major operators and the specific edge from being an xG-first analyst had largely closed. Tracking data is probably five to seven years behind xG on this same adoption curve - the analytical community has had meaningful tracking data for two to three years and the market incorporation is at the early stages of the adoption curve that xG went through. If the historical comparison holds, the tracking data edge window is roughly four to six years from now before it closes in the same way the xG edge closed. The comparison also suggests what the edge will look like as it closes: it will close unevenly, with the most liquid and best-covered markets closing first and the niche markets and prop markets closing last. The bettors who extracted the most value from the xG edge were those who moved from Premier League result markets - where the edge closed first - to niche competitions and prop markets where the edge closed later. The same migration strategy probably applies to the tracking data edge.
 