This is the longer-term version. Not the single-bet challenge but the pattern analysis - feeding your journal to an LLM over weeks and months and asking it to find the distortions that run through your reasoning consistently. The ones you don't notice because they're not occasional errors, they're the water you swim in.
Those are the ones that cost the most money. Not the obvious mistakes you can see in hindsight, but the systematic biases that shape every analysis you do and never get surfaced because you'd need to read your own writing from the outside to find them. The LLM can do that. Most people don't ask it to, or they ask it in a way that produces validation rather than challenge. This article is about asking it correctly.
What the Journal Needs to Contain
The analysis is only as good as the material you're feeding it. A journal that records bets placed and outcomes produces CLV tracking. A journal that records your reasoning produces distortion analysis. Those are different things and they require different entries.
The reasoning journal entry for each bet has five components. Your pre-match analysis - the actual reasoning that led you to identify the bet, written before you checked the odds or placed the bet if possible. Your confidence level and what specifically drove it - not just "high confidence" but what piece of evidence or reasoning produced that confidence. Any hesitations or counterarguments you considered and dismissed, and why you dismissed them. The outcome, described in your own words after the match. And your post-match assessment - whether you think the bet was the right decision regardless of outcome, and what if anything you'd do differently.
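If you keep the journal digitally, the five components can be captured in a simple structure so no entry silently drops one of them. This is a minimal sketch; the field names are my own, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class JournalEntry:
    """One reasoning-journal entry, all five components as free text."""
    pre_match_analysis: str      # the reasoning that identified the bet, written beforehand
    confidence_driver: str       # stated confidence level and what specifically produced it
    counterarguments: str        # hesitations considered and why they were dismissed
    outcome: str                 # the result, described in your own words
    post_match_assessment: str   # was it the right decision regardless of outcome?
    tags: list = field(default_factory=list)  # optional: competition, bet type
```

A template like this also enforces the fixed-format discipline discussed later: the same questions get answered for every bet, win or lose.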
That last component is the one most journals skip and the most analytically valuable. The post-match assessment written immediately after the result, before the next week's analysis begins, is where resulting shows up most clearly. Compare what you write when you win to what you write when you lose on similar bets. If the language shifts significantly - if wins produce "the analysis held up, the edge was real" and losses produce "I should have weighted the opposition's form more heavily" - you're resulting. The journal captures that. The LLM can find it across dozens of entries.
Write entries in full sentences rather than notes. Notes are too compressed to carry the language patterns the distortion analysis depends on. "Backed home team, good form, referee a concern" doesn't contain enough text for the model to analyse linguistic patterns. "I backed the home team primarily on their xPoints position, which suggested genuine quality divergence from their mid-table standing. The referee assignment gave me some concern given his tendency toward high card counts, but I decided the match script - home team likely to dominate possession and avoid the confrontational moments that typically trigger his card behaviour - would mitigate that risk" gives the model enough to work with.
Two to three hundred words per entry is the right length. Enough to capture the reasoning fully, not so long that entry maintenance becomes a reason to stop doing it.
The Accumulation Period
Don't run the distortion analysis until you have at least twenty entries, and ideally thirty or more. Below twenty, individual unusual entries skew the pattern analysis in ways that produce false findings - the model identifies something that appeared in four entries and calls it a pattern when it's noise.
At thirty entries covering perhaps eight to twelve weeks of activity, genuine patterns have had enough repetition to be distinguishable from variation. The distortions that are actually costing you money appear in enough entries that the model can identify them with some confidence. The occasional off-entries don't dominate the analysis.
This accumulation period is itself useful. The discipline of writing a structured reasoning entry for every significant bet, before checking the outcome where possible, is the closest thing to a cognitive forcing function that betting practice has. It's hard to engage in pure momentum betting - placing bets because the last three won and you feel good - when you're required to write three hundred words of pre-match reasoning for each one. The journal requirement is a behavioural guardrail as well as an analysis input.
The Distortion Analysis Prompt
Once you have thirty or more entries, paste them into a conversation and run this prompt:
"I'm going to paste a series of entries from my betting journal covering approximately [timeframe]. Each entry contains my pre-match reasoning, my confidence level and what drove it, any counterarguments I considered, the outcome, and my post-match assessment. I want you to analyse these entries specifically for cognitive distortion patterns. Do not analyse individual entries - analyse patterns across all entries. The specific distortions to look for are: resulting (language in post-match assessments that changes based on outcome rather than based on decision quality), narrative anchoring (pre-match reasoning that constructs a story around one central factor and dismisses contradicting evidence to preserve the narrative), confirmation bias (recurring pattern of finding evidence for a position while using thinner treatment for counter-evidence), gambler's fallacy language (references to sequences, streaks, or 'being due' outcomes), recency weighting (systematic overweighting of the most recent matches relative to the longer pattern), and confidence miscalibration (pattern of described confidence levels not matching the subsequent win rate in that confidence category). For each distortion you identify as a genuine pattern across multiple entries, quote three or more specific passages that demonstrate it and explain the mechanism. If a distortion type is not present as a pattern across the entries, say so explicitly rather than finding weak examples. At the end, rank the identified distortions by how frequently they appear and how materially they are likely to be affecting decision quality."
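Pasting thirty entries by hand is tedious enough to become a reason not to run the analysis. A small helper can assemble them into one paste-ready block; the entry separator and abbreviated preamble here are assumptions, and you would substitute the full prompt text above:

```python
def build_analysis_prompt(entries: list, timeframe: str) -> str:
    """Concatenate journal entries under the distortion-analysis preamble.

    `entries` is a list of entry texts in chronological order. The preamble
    below is abbreviated - paste the full prompt from the article in its place.
    """
    preamble = (
        "I'm going to paste a series of entries from my betting journal "
        f"covering approximately {timeframe}. Each entry contains my "
        "pre-match reasoning, confidence level, counterarguments, outcome, "
        "and post-match assessment."
    )
    body = "\n\n---\n\n".join(
        f"ENTRY {i}:\n{text.strip()}" for i, text in enumerate(entries, 1)
    )
    return preamble + "\n\n" + body
```

Numbering the entries lets the model's quoted passages be traced back to specific weeks when you review the output.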
Several instructions in that prompt are doing specific work worth understanding.
"Do not analyse individual entries - analyse patterns across all entries" prevents the model from producing a per-entry critique, which is not what you want. You want the cross-entry patterns, not the individual assessments.
"Quote three or more specific passages" sets the evidential threshold. A distortion identified from one passage is a possible finding. One identified from three or more passages is a pattern. The threshold stops the model finding one example and treating it as representative.
"If a distortion type is not present as a pattern, say so explicitly" is the same instruction that appeared in the stress-testing article and it's equally important here. Without it the model will find something to say about every distortion category regardless of whether the evidence supports it, because a thorough-looking response gets rated as more helpful than a selective one.
The ranking instruction at the end is the practical output you're going to act on. Not all distortions are equal in impact. A mild tendency toward recency weighting that affects two bets per month is less important than systematic resulting that corrupts your post-match learning across every entry. The ranking orients your corrective attention correctly.
The Failure Mode: Getting Told What You Want to Hear
This is the most important section in the article and the one most people skip because it sounds like a technical caveat rather than a practical problem.
LLMs have a systematic bias toward producing responses that make the person feel good about themselves. This is the same architectural issue the stress-testing article described - training on human feedback that rewards agreeable responses. In the context of distortion analysis, it manifests in a specific way. The model identifies distortions but softens them, qualifies them heavily, and balances each criticism with an acknowledgement of what your reasoning did well. The output feels like honest feedback. It reads like honest feedback. But the softening means the distortion findings land without the weight needed to actually change anything.
The antidote is in the prompt architecture, not in hoping the model will be harder on you than its training inclines it to be.
Three specific instructions that reduce validation bias:
First, explicitly prohibit positive framing. Add to the prompt: "Do not comment on aspects of my reasoning that are sound or well-structured. I am not asking for a balanced assessment - I am asking specifically for the distortions. Any sentence that begins with 'on the other hand' or 'it's worth noting that your analysis also shows' should be removed before responding."
Second, ask for the harshest defensible interpretation. Add: "For each distortion pattern you identify, give me the harshest plausible interpretation of what it means for my decision quality - not the most charitable interpretation. If the pattern could indicate either a significant systematic bias or a mild occasional tendency, describe the significant systematic bias interpretation."
Third, ask it to steelman the case that you're a poor analyst in this specific area. This is uncomfortable to write into a prompt and it's the instruction that does the most work: "After identifying the distortion patterns, construct the most negative but defensible overall assessment of my analytical process based on this journal. What would a sceptical and rigorous analyst conclude about the quality of my reasoning from these entries?"
That third instruction produces the output most likely to contain something genuinely useful rather than something comfortable. The most negative defensible assessment, if it contains something you recognise as accurate, is the finding worth acting on. If it contains things you can honestly dismiss as unfair characterisations of the evidence, the analysis has been useful in a different way - it's shown you that your journal reasoning is robust enough to withstand a harsh reading.
The Resulting Check: Running It Separately
Resulting deserves its own prompt because it's the most common distortion in post-match journal entries and the easiest one to miss in a general analysis. The general distortion prompt will catch it if it's extreme. The dedicated resulting check catches the subtler version.
Run this as a separate pass through the same journal entries:
"I'm going to paste a series of betting journal entries. Each entry contains pre-match reasoning and a post-match assessment. Analyse only the post-match assessment sections and answer these questions: Does the language used to describe decision quality change systematically based on whether the bet won or lost? Are wins more likely to be attributed to the quality of the analysis, and losses more likely to be attributed to external factors, variance, or information that wasn't available? When a bet loses, does the post-match assessment introduce considerations that weren't mentioned in the pre-match reasoning - and if so, does this pattern suggest hindsight rather than genuine learning? Compare the average length and detail of post-match assessments for winning bets versus losing bets. Quote specific contrasting examples where the same type of reasoning produces different post-match language depending on outcome. Do not produce a general assessment - answer each of these specific questions with evidence from the entries."
The question about considerations introduced after a loss is the most diagnostic. If your post-match assessments regularly mention factors that could have been anticipated but weren't included in the pre-match reasoning - and this pattern appears only or predominantly after losses - that's textbook resulting and hindsight bias operating together. The model finds this pattern reliably when the prompt asks for it specifically.
The length comparison is useful as a blunt instrument. If your post-match assessments for losses are consistently longer and more analytical than for wins, you're spending more effort explaining why losses aren't your fault than learning from them. If wins produce longer assessments, you're spending more time congratulating yourself than examining the reasoning quality. Neither direction is neutral and either one is worth knowing.
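The length comparison is mechanical enough that you can compute it yourself before involving the model at all. A sketch, assuming each entry has been reduced to an outcome flag and its post-match assessment text:

```python
from statistics import mean

def assessment_length_by_outcome(entries):
    """Average post-match assessment word count for wins versus losses.

    `entries` is a list of (won: bool, post_match_assessment: str) pairs.
    Returns None for a side with no entries.
    """
    wins = [len(text.split()) for won, text in entries if won]
    losses = [len(text.split()) for won, text in entries if not won]
    return {
        "win_avg_words": mean(wins) if wins else None,
        "loss_avg_words": mean(losses) if losses else None,
    }
```

A large asymmetry in either direction is the blunt signal; the dedicated resulting prompt then tells you what the extra words are doing.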
The Confidence Calibration Check
This is the third separate prompt worth running, and the most quantitatively concrete of the three.
Confidence miscalibration - describing your confidence level inaccurately relative to your actual win rate at that confidence level - is common and difficult to catch through self-assessment because you don't naturally track the relationship between stated confidence and outcomes across dozens of bets.
"I'm going to paste a series of betting journal entries. Each entry contains a stated confidence level and an outcome. Categorise each entry by the confidence level described - high, moderate, or marginal, or by the specific language used to convey confidence if I haven't used a standard scale. Then calculate the win rate within each confidence category. Compare the win rates across categories and tell me: whether my high confidence bets actually win at a meaningfully higher rate than my moderate or marginal confidence bets, whether my stated confidence levels show any systematic bias (consistently overconfident across all categories, underconfident in a specific category, or well-calibrated), and whether there are specific types of reasoning or specific competition categories where the calibration is markedly different from the overall pattern. If my sample size in any category is too small to support a reliable win rate calculation, say so and give the number of entries in that category."
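If your confidence labels are consistent, the win-rate arithmetic in that prompt is simple enough to run locally as a cross-check on the model's numbers. A sketch, with the minimum sample threshold as my own assumption rather than a statistical rule:

```python
from collections import defaultdict

MIN_SAMPLE = 10  # assumed threshold: below this, flag rather than trust the rate

def calibration_table(entries):
    """Win rate per stated confidence category.

    `entries` is a list of (confidence: str, won: bool) pairs,
    e.g. ("high", True). Small categories are marked unreliable.
    """
    buckets = defaultdict(list)
    for confidence, won in entries:
        buckets[confidence].append(won)
    return {
        confidence: {
            "n": len(results),
            "win_rate": sum(results) / len(results),
            "reliable": len(results) >= MIN_SAMPLE,
        }
        for confidence, results in buckets.items()
    }
```

The LLM's contribution is then the qualitative part the arithmetic can't do: reading where in the journal the miscalibrated categories cluster.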
The output from this prompt is sometimes the most confronting of the three. It's not unusual to discover that high confidence bets win at roughly the same rate as moderate confidence bets - or in the worst cases, at a lower rate. That finding means your confidence assessments are not carrying information about bet quality. They're carrying information about something else - perhaps how much you like the story, how recently the signal type has worked for you, how much time you spent on the analysis. The calibration prompt surfaces which of those it might be by showing where in the journal the confidence miscalibration is most pronounced.
Acting on the Output
The distortion analysis produces findings. The findings are only useful if they change something.
The change doesn't have to be dramatic. Cognitive distortions don't respond well to willpower-based solutions - the CBT article established that clearly. What they respond to is structural changes that make the distortion harder to execute without noticing.
If the analysis identifies resulting as a significant pattern, the structural change is a rule about post-match assessment timing and format. Write the post-match assessment before you know the result wherever possible - for bets settled at full time, write your quality assessment at half time based on whether the bet is going to plan, not just whether it's winning. If that's not practical, write the post-match assessment using a fixed template that asks the same questions regardless of outcome, rather than free-form writing that unconsciously adjusts to the result.
If it identifies narrative anchoring, the structural change is adding a mandatory counterargument section to your pre-match reasoning - one you're required to write before the analysis is complete, not one you can satisfy with a single dismissive sentence. The fixture screening tool's criterion that variables excluded from analysis must be explicitly noted rather than silently ignored serves the same purpose.
If it identifies confidence miscalibration, the structural change is replacing qualitative confidence language with explicit probability estimates - not "high confidence" but "I assess this as a sixty-two percent win probability against the market's implied fifty-three percent." Probability estimates are harder to inflate unconsciously than qualitative labels, and they're trackable in a way that reveals miscalibration faster.
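Explicit probabilities also make the comparison against the market mechanical. A minimal sketch of the arithmetic, which for simplicity ignores the bookmaker's overround (you would normally strip that out before treating the implied probability as the market's view):

```python
def edge_vs_market(my_prob: float, decimal_odds: float) -> float:
    """Difference between my estimated win probability and the market's
    implied probability. Positive means I think I have an edge.

    Ignores the overround: 1/odds overstates the true market probability
    slightly, so this is a conservative-to-optimistic approximation.
    """
    implied = 1.0 / decimal_odds
    return my_prob - implied
```

For the example in the text: a sixty-two percent estimate against decimal odds of 1.89 gives an implied probability near fifty-three percent and an apparent edge of roughly nine points, which is exactly the kind of number that is trackable where "high confidence" is not.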
One change at a time. Implementing structural corrections to three distortions simultaneously produces a journal maintenance burden heavy enough that you stop doing it, which eliminates the mechanism for catching the next layer of distortions. Fix the highest-ranked one. Run the analysis again after another thirty entries. See whether it's improving and whether anything new is surfacing.
Anyway. The journal is the most underused tool in serious recreational betting. Not because people don't know it matters - everyone knows it matters - but because keeping a reasoning journal in the format that makes distortion analysis possible is harder than keeping a results spreadsheet. The effort is exactly proportionate to the value. A results spreadsheet tells you what happened. A reasoning journal, fed to an LLM with the right prompts, tells you how you think - and whether how you think is working or costing you.
That's a different and more uncomfortable kind of information. It's the kind worth having.
FAQ
Q: How often should I run the distortion analysis?
Every thirty new entries, roughly. More frequent than that and the pattern sample is too thin to distinguish genuine patterns from noise. Less frequent and you're letting distortions run uncorrected for long enough to be costly. For most active bettors, thirty entries represents six to ten weeks of activity. The analysis itself takes maybe forty minutes including reviewing the output seriously. That's a reasonable maintenance investment for what it produces. If you're making structural changes based on the findings, wait until you have thirty entries from after the change was implemented before running the next analysis - you want enough data to see whether the change has shifted the pattern.
Q: What if the analysis finds distortions I don't recognise in my own reasoning?
Two possibilities. Either the model has identified something real that you're not aware of - which is exactly what the tool is for - or it's found a false pattern in the language that doesn't reflect the underlying reasoning. The way to distinguish them is to read the quoted passages yourself without the model's framing and assess whether the distortion interpretation is the most natural reading of what you wrote. If it is, the finding is probably real. If the interpretation feels forced or requires significant reading into your language, re-run the prompt with that distortion explicitly listed and the instruction to say so if it is not present as a pattern. If the model still finds it after being told to say so if absent, it's likely real. If it backs off, it was probably a weak finding from the initial prompt.
Q: My journal entries are inconsistent - some are detailed and some are brief because I wrote them under time pressure. Does this affect the analysis?
Yes, in a specific way. Brief entries written under time pressure tend to show less distortion in the text simply because there's less text to analyse, not because the underlying reasoning was cleaner. If brief and detailed entries are mixed throughout the journal, the model may identify patterns in the detailed entries that are actually present in all entries but invisible in the brief ones. Flag the brief entries to the model before running the analysis - "entries marked [brief] were written under time pressure and contain less detail than the others" - and ask it to weight its conclusions toward the detailed entries when assessing patterns. Alternatively, and this is the better long-term solution, set a minimum word count for entries and stick to it even under time pressure. Two hundred words takes four minutes to write. If a bet is worth placing it's worth four minutes of structured reasoning before or shortly after.