Using AI to Audit Your Pre-Match Analysis Process: The Consistency Check

Betting Forum

The stress-testing article described a specific technique - writing out your analysis for a single fixture and then prompting an LLM to argue against it. That technique challenges the conclusion. This one challenges the process that produced it.

They're different problems. A stress test can find a flaw in a specific piece of reasoning while leaving the underlying analytical process completely intact. If your process is inconsistent - if you apply certain variables when they support your existing lean and skip them when they don't, if you weight form heavily when it's recent and dismiss it when it's inconvenient, if you mention referee tendencies in some write-ups and never in others - then stress-testing individual analyses fixes symptoms without touching the cause. The next analysis will have a different specific flaw produced by the same underlying inconsistency.

The consistency audit is a different exercise. You feed a batch of your recent pre-match write-ups to an LLM - ten is a workable starting number - and ask it to examine your process across all of them simultaneously. Not whether your conclusions were right. Not whether your reasoning in any individual case was sound. Whether you are applying a consistent analytical framework across different fixtures or whether your process is actually a set of loosely connected impressions that vary based on factors that have nothing to do with analytical rigour.

Most serious bettors, if they're honest, already suspect the answer to that question. This is how you find out specifically.

Why Process Inconsistency Is Hard to See From the Inside


When you write a pre-match analysis, you're inside the reasoning. You're not comparing it to the ten analyses that preceded it - you're thinking about the fixture in front of you. Variables that feel relevant get included. Variables that feel less relevant get skipped or mentioned briefly. The whole thing feels coherent because you're generating it, not evaluating it from outside.

The inconsistency only becomes visible when you put multiple analyses next to each other and look across them with a consistent evaluation framework. Which is something humans are genuinely bad at. Reading ten of your own analyses in sequence, you'll naturally read each one charitably, focus on the internal coherence of each piece, and notice the inconsistencies least in the places where they're most systematic - because systematic inconsistencies feel like analytical judgement rather than errors. If you consistently weight xG heavily in matches involving top-half teams and dismiss it in relegation battles, that pattern feels like sensible context-sensitivity. It might be. It might also be motivated reasoning that correlates with which outcome you're rooting for.

An LLM given all ten analyses simultaneously, asked the right questions about process consistency, doesn't have any of that. It doesn't have a preferred outcome for any of the fixtures. It doesn't read each analysis charitably because it already understands your framework. It applies the same evaluation questions to every analysis in the batch and surfaces the differences.

What Your Write-Ups Need to Look Like


The audit works best when your pre-match write-ups are genuine analytical documents rather than brief notes. A two-sentence pre-match note doesn't contain enough process information to audit - you could write two sentences about any fixture from any process and they'd look similar. The write-up needs to show your reasoning, not just your conclusion.

If your current write-ups are brief, the audit is itself a reason to start writing more fully - the exercise won't produce useful output until you do. The minimum useful pre-match document for this purpose contains: the market you're assessing and the current price, your assessment of each team's relevant form and how you're weighting it, the specific variables you considered and what each contributed to your assessment, an explicit statement of your probability estimate and how it diverges from the market price, and a note on anything you chose not to include and why.
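If you keep your write-ups as text files, the presence of those five elements can be checked mechanically before each audit. A minimal sketch in Python - the section labels here are an assumption, not a standard; use whatever headings you like, as long as you use them consistently:

```python
# Hypothetical section labels for the five elements of a usable write-up.
# These are an assumption - pick your own labels, but apply them consistently.
REQUIRED_SECTIONS = [
    "Market and price",
    "Form and weighting",
    "Variables considered",
    "Probability estimate",
    "Excluded and why",
]

def missing_sections(write_up: str) -> list[str]:
    """Return the required section labels absent from a write-up."""
    text = write_up.lower()
    return [s for s in REQUIRED_SECTIONS if s.lower() not in text]

# A brief note fails the check on three of five elements.
note = "Market and price: over 2.5 goals at 1.95\nProbability estimate: 55%"
print(missing_sections(note))
```

Running this over a folder of write-ups before the audit tells you immediately whether the batch is complete enough to be worth feeding to the model.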

That last element - what you chose not to include - is the one most people omit and the most analytically revealing. A systematic pattern in what you exclude is sometimes more informative than a systematic pattern in what you include.

If you've been writing that level of detail consistently across ten or more recent analyses, you're ready to run the audit. If you haven't been writing at that level, start now and run the audit in two to three months when you have a usable batch.

The Structural Inventory Prompt


The audit runs in two stages. The first stage is descriptive - mapping what's actually in your analyses without any evaluative judgement yet. The second stage is evaluative - identifying inconsistencies and their probable implications.

Running them separately matters. A combined prompt that simultaneously maps and evaluates produces shallower treatment of both. The inventory prompt first:

"The following are ten pre-match analyses I've written for football betting. I want you to produce a structural inventory across all ten - not evaluating quality or consistency yet, just mapping what's present. For each analysis, identify: which variables are mentioned, what markets are being assessed, whether a specific probability estimate is stated or implied, whether the reasoning explicitly links variables to the probability estimate or leaves the connection implicit, and whether there are any variables mentioned but then not incorporated into the final assessment. Once you've inventoried each analysis individually, produce a summary table showing which variables appear across all ten, how many analyses each variable appears in, and whether each variable - when mentioned - is typically connected explicitly to the probability assessment or mentioned in passing without clear incorporation. Do not identify inconsistencies yet. Just describe what's there."

The "do not identify inconsistencies yet" instruction is important for the same reason the pattern and cause separation mattered in the betting history leakage analysis. Asking the model to simultaneously map and evaluate produces a narrative that's already shaped by the evaluation rather than a neutral inventory that the evaluation then examines. The inventory should be descriptive before it's interpretive.

The summary table this prompt produces is often surprising before you've run any evaluative prompt at all. Seeing that a specific variable appears in eight of ten analyses but is explicitly connected to your probability estimate in only three of them - that observation doesn't require an evaluative prompt to be uncomfortable. The inventory reveals it.
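If you ask the model to return the per-analysis inventory in a structured form, you can rebuild that summary table locally and keep it between audits. A sketch, assuming a hypothetical format of (variable, explicitly_incorporated) pairs per analysis:

```python
from collections import Counter

# Hypothetical structured inventory: one list per analysis, each entry
# a (variable, explicitly_incorporated) pair. Three analyses shown.
inventory = [
    [("xG", True), ("referee card rate", False)],
    [("xG", True), ("rotation", False)],
    [("xG", False), ("referee card rate", False)],
]

mentions = Counter(var for analysis in inventory for var, _ in analysis)
incorporated = Counter(var for analysis in inventory for var, inc in analysis if inc)

for var, count in mentions.items():
    print(f"{var}: mentioned in {count}, explicitly incorporated in {incorporated[var]}")
```

The gap between the two counts per variable is exactly the uncomfortable observation the inventory stage is meant to surface.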

The Consistency Evaluation Prompt


Once you have the inventory, the evaluation prompt runs on it alongside the original analyses:

"Using the structural inventory you've produced alongside the original ten analyses, I want you to identify process inconsistencies across three categories. First, variable application inconsistency: variables that appear in some analyses and not others without any apparent relationship to the fixture type or market - identify each variable with this pattern and describe when it appears and when it doesn't. Second, incorporation inconsistency: variables that are mentioned across multiple analyses but connected to the probability estimate inconsistently - sometimes explicit, sometimes implicit, sometimes apparently not incorporated at all. Third, weighting inconsistency: variables that appear in multiple analyses but seem to receive significantly different weight in contexts that don't obviously justify the difference - for instance, a variable weighted heavily in one analysis and dismissed briefly in another where similar circumstances apply. For each inconsistency you identify, describe it specifically and note the sample size - how many analyses the pattern is visible across. Do not speculate about why the inconsistency exists. Just describe what the pattern looks like."

The three-category structure separates types of inconsistency that have different implications. Variable application inconsistency - using a variable in some analyses and not others - might mean you're applying it selectively based on whether it supports your existing lean. Incorporation inconsistency - mentioning a variable but not connecting it to your estimate - might mean you're including it for show rather than for analysis. Weighting inconsistency - the same variable meaning different things in different contexts - is the most complex category and the most likely to produce findings you'll initially want to defend.

The "do not speculate about why" instruction keeps the evaluation descriptive rather than explanatory at this stage. You want a clean picture of what the inconsistencies are before you start generating reasons for them, because the reasons you generate will tend to be charitable and the charitable explanations are not always the accurate ones.

The Cause Analysis - Separate and Later


After you have the inconsistency findings, you run a third prompt asking for possible explanations. The separation from the evaluation prompt is the same principle that runs through the referee database and manager database articles - identify patterns before theorising about causes.

"The following are the process inconsistencies identified in my pre-match analysis audit. For each inconsistency, generate two or three possible explanations - including at least one explanation that reflects a systematic analytical error rather than a contextually appropriate adjustment. For each explanation, describe what additional evidence in the analyses would support or contradict it. Do not assume any explanation is correct. I want competing hypotheses I can test against the actual analyses rather than a single explanation that fits the pattern."

The "including at least one explanation that reflects a systematic analytical error" instruction is the most uncomfortable and most important line in this prompt. Without it, the model will generate explanations weighted toward the more charitable interpretations - you're appropriately adjusting your framework for different contexts, you're correctly weighting variables differently based on fixture characteristics, and so on. Some of those explanations will be right. But the analytical error explanations are the ones worth stress-testing most thoroughly, precisely because they're the ones you're least inclined to consider yourself.

The specific error the model should be prompted to consider for each inconsistency type: for variable application inconsistency, the explanation is motivated inclusion - using a variable when it supports your lean and skipping it when it doesn't. For incorporation inconsistency, the explanation is analytical decoration - including variables that make your write-up look thorough without actually incorporating them into your assessment. For weighting inconsistency, the explanation is outcome bias - weighting the same variable differently based on which direction it pushes rather than based on genuine contextual differences.

These are uncomfortable explanations to consider. They're also the most common sources of systematic analytical error in betting write-ups, and the prompt needs to surface them rather than letting the more flattering explanations crowd them out.

The Variables You Mention But Never Incorporate


The incorporation inconsistency category deserves specific attention because it's the most common finding and the least intuitive problem.

Most pre-match analytical write-ups, if you examine them honestly, contain variables mentioned in passing that have no discernible effect on the final probability estimate. You mention that the referee has a high card rate, then price the match as though the referee had no effect. You note that the visiting team has an unusually strong record in midweek fixtures, then your handicap assessment doesn't reflect it. You describe a manager's tendency to rotate heavily in this competition, then your team quality assessment uses the first-choice lineup as though rotation weren't relevant.

The variables are present in the write-up. They're not present in the reasoning. The write-up looks more thorough than the reasoning actually is.

This pattern is worth a dedicated follow-up prompt once the evaluation has identified which variables fall into this category for you specifically:

"The inventory identified the following variables as appearing in my analyses but not being explicitly incorporated into my probability estimates: [list the variables]. For each variable, I want you to do two things. First, construct a simple one-sentence rule for how this variable should be incorporated into a probability estimate if it's worth mentioning at all - what direction does it push and under what conditions. Second, identify whether the analyses that mention this variable include any implicit indication of how I actually used it in my reasoning, or whether it genuinely appears to be mentioned without functional incorporation. For any variable where you cannot construct a reasonable incorporation rule - where you cannot identify what effect mentioning this variable should have on the probability estimate - flag it as a candidate for removal from my analytical framework rather than better incorporation."

That last instruction - flagging variables that can't be connected to a probability estimate as candidates for removal - is the most useful single output of the whole audit for many bettors. Most analytical frameworks contain variables that were added because they sounded analytically sophisticated, not because they have a clear and consistent relationship to expected outcomes. The audit surfaces them. The prompt above makes the removal case explicit.
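The removal flag can also be computed mechanically once you have per-variable mention and incorporation counts from the inventory stage. A sketch, with illustrative counts:

```python
def removal_candidates(mentions: dict[str, int],
                       incorporated: dict[str, int]) -> list[str]:
    """Variables mentioned across analyses but never explicitly connected
    to a probability estimate - candidates for removal, not better use."""
    return [var for var, count in mentions.items()
            if count > 0 and incorporated.get(var, 0) == 0]

# Illustrative counts: referee card rate is mentioned in five write-ups
# but never incorporated into an estimate.
counts = {"xG": 8, "referee card rate": 5, "midweek record": 3}
used = {"xG": 6, "midweek record": 1}
print(removal_candidates(counts, used))
```

A mechanical flag like this is only a starting point - a variable incorporated once in ten analyses is nearly as suspect as one incorporated never, and the prompt's incorporation-rule test is what actually decides removal.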

What to Do With the Findings


The audit produces three types of finding. Each requires a different response.

Genuine context-sensitivity that looked like inconsistency is the best-case finding. Some variable application and weighting differences are appropriate - xG means different things in a match between promotion contenders than in a match between a mid-table side and a bottom-of-table side with a depleted defence. When the cause analysis produces this explanation and it holds up against the specific cases, the finding is a clarification of your framework rather than an error in it. You document the contextual rule explicitly rather than assuming it's understood.

Motivated inclusion is the hardest finding to act on because the solution requires a behavioural change rather than a process tweak. You can't solve motivated inclusion by adding a step to your analytical process - you solve it by requiring that every variable in your framework be applied to every relevant fixture regardless of whether it supports your existing lean, and then checking that requirement against your next batch of write-ups in three months. The audit identifies the problem. The discipline of consistent application is the fix.

Analytical decoration - variables mentioned but never incorporated - has the cleanest solution. Either build the explicit incorporation rule and follow it, or remove the variable from your framework. The middle position of mentioning something without using it is the worst of both options. It makes your write-up look more thorough without making it more accurate, and it introduces a consistency requirement you're not actually meeting.

The Follow-Up Audit Cadence


A single audit is a diagnostic. A repeated audit is the mechanism that actually changes your process.

Run the audit quarterly, on a rolling batch of your ten most recent analyses. The first audit tells you what your inconsistencies are. The second audit, three months later, tells you whether you've addressed them or whether the same patterns are still present under different surface appearances. Systematic inconsistencies are persistent - they don't disappear because you're aware of them without specific structural changes to your process.
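Selecting the rolling batch is trivial to automate if each write-up carries its fixture date. A sketch, assuming (date, filename) pairs - the filenames are hypothetical:

```python
from datetime import date

def latest_batch(write_ups: list[tuple[date, str]], n: int = 10) -> list[str]:
    """Return the n most recent write-ups (by fixture date) for the rolling audit."""
    ordered = sorted(write_ups, key=lambda item: item[0], reverse=True)
    return [name for _, name in ordered[:n]]

history = [
    (date(2024, 1, 5), "arsenal_spurs.txt"),
    (date(2024, 3, 2), "leeds_hull.txt"),
    (date(2024, 2, 9), "derby_forest.txt"),
]
print(latest_batch(history, n=2))
```

Keeping the selection automatic also removes one small temptation: hand-picking which ten analyses the audit gets to see.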

The follow-up audit prompt has one addition to the standard evaluation prompt:

"The previous audit of my analyses identified the following specific inconsistencies: [list findings from prior audit]. In this audit of my ten most recent analyses, identify whether those previously flagged inconsistencies are still present, partially addressed, or resolved. For any inconsistency that appears resolved, confirm that the resolution looks like genuine process change rather than a temporary adjustment in the audited sample. For any inconsistency that is still present, note whether it appears in a different form than the original finding or whether it has persisted unchanged."

The "confirm that the resolution looks like genuine process change rather than a temporary adjustment" instruction is the one that prevents the audit from being gamed - consciously or unconsciously - by being slightly more careful in the weeks before you know you're running the audit. Ten analyses over three months is a large enough sample that temporarily adjusted behaviour won't produce clean findings. But naming the possibility directly in the prompt is worth doing.

Anyway. The consistency audit is the most uncomfortable exercise in this series because it doesn't find flaws in your reasoning about football. It finds flaws in your reasoning about your reasoning. That's a different and harder thing to look at honestly. The stress-test article challenges a conclusion you drew about a specific match. This one challenges the process you use to draw conclusions in general. The second challenge is more useful and less comfortable than the first, which is probably why almost nobody has suggested doing it before now.

FAQ


What if I don't have ten pre-match write-ups - I keep brief notes rather than full analyses?


Start writing more fully from today and run the audit in two to three months. Brief notes don't contain enough process information to audit usefully, and the exercise of writing more fully is itself beneficial before you ever run the audit - it forces you to articulate the reasoning that currently stays implicit, and implicit reasoning is where process inconsistency is most comfortable hiding. If you want something immediate, run the inventory prompt on whatever you do have. Even brief notes might reveal which variables you mention consistently and which you skip. It won't produce the full evaluation finding, but it'll tell you whether fuller write-ups are worth developing for this purpose.

The model identified a weighting inconsistency I think is actually appropriate context-sensitivity. How do I distinguish the two?


Ask the model to help you test it. Take the specific cases where the variable was weighted differently, describe the contextual differences between those fixtures, and ask: "Given these contextual differences, construct the strongest case that this weighting difference is analytically justified. Then construct the strongest case that it reflects motivated reasoning rather than context-sensitivity. Which case is more consistent with the specific language used in each analysis?" The language test is useful - genuine context-sensitivity tends to be articulated as such in the analysis itself. Motivated reasoning tends to produce weighting differences that aren't explicitly justified anywhere in the write-up. If you weighted a variable differently but didn't explain why in the analysis at the time, the motivated reasoning explanation is at least as likely as the context-sensitivity one. Not definitive, but worth sitting with honestly before dismissing the finding.

Should I share my analyses with the model all at once or in sequence?


All at once, in a single prompt, is strongly preferable for the inventory and evaluation prompts. The model needs to hold all ten analyses in context simultaneously to identify cross-analysis patterns. If you feed them sequentially, each analysis gets evaluated against the prior ones rather than against all ten simultaneously, which produces uneven pattern detection - early analyses get less cross-referencing than later ones, and patterns that span the full batch are harder to identify. Most current LLMs have context windows large enough to hold ten full pre-match analyses comfortably. If yours doesn't, consolidate the analyses into a more compressed format before running the prompt rather than breaking them into sequential batches.
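Whether ten write-ups fit a given window is easy to estimate before you paste anything. A rough sketch - the four-characters-per-token figure is a common rule of thumb for English text, not an exact tokeniser, and the default window size here is an assumption you should replace with your model's actual limit:

```python
def fits_context(write_ups: list[str], max_tokens: int = 100_000,
                 chars_per_token: float = 4.0) -> bool:
    """Rough estimate of whether a batch fits a model's context window.
    chars_per_token ~4 is a rule of thumb for English, not a tokeniser."""
    estimated_tokens = sum(len(w) for w in write_ups) / chars_per_token
    return estimated_tokens <= max_tokens

# Ten write-ups of roughly 1,000 tokens each - comfortably inside most windows.
print(fits_context(["x" * 4_000] * 10))
```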
 