The Iron Church: Why Most ‘AI’ Fitness Apps Are Just Marketing (And How I Built One That Actually Works)

7 AI Pipelines ? 5 AI Prompts, 2 Deterministic Engines.

Discover the technical strategy behind building The Iron Church—an AI fitness trainer defined by periodisation science, multi-prompt architecture, and serious consideration of your real equipment.

This article is the technical deep dive for anyone who wants to understand what "AI-generated workouts" actually mean when you refuse to cut corners.

Introduction

This isn't a story about prompts. It's a story about why real AI products need state, constraints, and domain logic.

There is a specific kind of professional irritation that arises when you spend your working life directing and building real AI systems, only to encounter something that treats AI as merely a marketing feature.

AI on the box, we call it.

Once you are trained to notice it, you spot it immediately, with telltale signs and vague claims. Such as "personalisation" claims, which amount to little more than a thin wrapper around a generic model prompt. All the while, the outputs betray absolutely no evidence of the system knowing anything about you at all.

The software equivalent of a fake Rolex watch: it looks the part, but when you hold it up to your ear, something just isn't ticking right.

I've spent the last 10 years building systems that actually work, including machine learning pipelines for financial crime detection at Dow Jones and AI governance frameworks at News Corp. Not to mention Anti-Fraud architectures at scale. In environments this critical, where the impact is literally a matter of life and death, it fundamentally alters how one approaches proper construction.

Most "serious" fitness apps that claim to use AI are sold as personal training services. It is usually one of three things:

A decision tree with a chatbot interface.
A generic LLM call with minimum context ("generate a muscle-building workout")
A recommendation system spewing statistically average responses to any given input.

If you've had real personal training from a human instructor, you'll know the difference. Training involves a persistent model of you. It involves paying attention to your accumulated fatigue, available equipment, and injury history, all grounded in scientific principles learned in academia.

As I have always said: The first step in AI design is to look, really look, at how a human performs the role.

Your AI must map the human process and adapt in real time (like when your rotator cuff acts up), providing immediate, biomechanically relevant swaps.

Or perhaps, that person in the public gym who just won't get off the pulldown machine.

I decided to build an AI coach who thinks like a personal trainer and, moreover, speaks like one.

Article content — Brodin comments in the dashboard on the week's workouts. Notice the detailed contextual cues from the user's history.

Encoding Kinesiology into the Context Window

A conversation with a general LLM has no persistent state. To transition from standard chatbot limitations to a tailored AI fitness solution, we needed to encode actual empirical science into the AI's reasoning engine:

True Periodisation: Meta-analytical data consistently prove that periodised resistance training yields significantly greater strength gains than non-periodised routines. (1) Iron Church runs a strict 3-phase mesocycle (accumulation, peak, and deload) to systematically manage your central fatigue and muscular adaptation over time. (2)
Maximum Recoverable Volume (MRV): Research shows that optimal muscle growth generally occurs within a specific volume window, and pushing beyond your MRV—the biological ceiling of productive training—results in accumulated fatigue without adaptation. (3) Our AI actively tracks your accumulated sets and dynamically routes volume elsewhere to prevent overtraining.
Autoregulation: Instead of rigid percentages, the AI uses RIR (Repetitions in Reserve) and RPE (Rate of Perceived Exertion) to calibrate your daily intensity. Studies confirm that subjective autoregulation is vastly superior to fixed-loading methods for enhancing maximal strength and accounting for daily fatigue. (4)

19 Training Disciplines — Not 19 labels on one Template

Most apps that offer "training styles" only let you change the label for your goal while the underlying session stays the same. For example, choosing "Strength" sets 5 reps per set; "Hypertrophy" sets 10 reps per set. These superficial differences do not reflect true discipline-specific programming. Instead, they just act as simple rep range selectors.

Iron Church supports 19 disciplines, each with its programming philosophy directly encoded. These differences are substantive, not cosmetic.

German Volume Training demands 10 sets of 10 at 60% 1RM on the same primary movement, with 90 seconds of rest, and tightly constrained secondary work. The rep structure is the system — deviations from it aren't variations; they're a different programme.
High-Intensity Training (HIT) prescribes exactly one working set per exercise, taken to complete muscular failure at a controlled tempo. More volume is explicitly wrong. The entire philosophy is built around maximum effort at minimum frequency.
Grease the Groove operates on the opposite principle: multiple sub-maximal sets distributed across the day, never approaching failure, training the pattern as a motor skill. A prompt that prescribes a conventional working set has fundamentally misunderstood the method.
5/3/1 Wave Periodisation cycles specific percentages of a training max across four-week blocks — 65%, 75%, 85% in week one, building through weeks two and three, deloading in week four. Prescribing arbitrary weights breaks the programme's entire logic. The AI has to know which week of the wave it's on and what the athlete's training max is before it generates a single set.
Olympic Weightlifting centres every session on technical skill work for the snatch and clean & jerk. Volume landmarks, progressive overload, and hypertrophy considerations are secondary to movement quality and power expression. A prompt built around conventional strength programming doesn't translate.
Callisthenics operates without external load — progressive overload is expressed through lever progressions, range of motion, tempo, and skill advancement (pull-up ? archer pull-up ? one-arm negative ? one-arm pull-up). There are no weights to increase.
Tactical Fitness combines military-style conditioning — circuits, bodyweight, loaded carries, rucking — with a frequency and structure built around sustained functional output rather than aesthetic goals or competition peaks.

The full 19 span every major training category: pure strength (Powerlifting, Novice Linear Progression, 5/3/1), hypertrophy (Bodybuilding, Science-Based Hypertrophy, German Volume Training, HIT, Bro Split), combined strength and size (Powerbuilding, Push/Pull/Legs, Upper/Lower PHUL), power (Olympic Weightlifting), conditioning (Functional Fitness/HIFT, Tactical Fitness), bodyweight mastery (Calisthenics, Grease the Groove), general fitness (General Strength & Conditioning, Strongman), and endurance (Marathon/Endurance Prep) — from beginner-accessible to advanced, from no-equipment hotel room to fully-stocked garage gym.

For each, the session generator uses the discipline's programming philosophy to guide movement selection, volume, and progression. Strongman sessions emphasise event-specific carries; marathon sessions focus on injury prevention and mileage. Each looks as it should: distinct.

The AI Architecture: Why One Prompt Isn't Enough

So, the first question with my approach is: "Why build a bespoke architecture instead of just writing a massive system prompt for ChatGPT?"

The reality of production AI is that monolithic prompts fail. If you ask a single LLM call to act as a periodisation planner, a volume tracker, a workout generator, and a motivational coach all at once, you suffer from context dilution.

Context dilution (often referred to as "context rot" or "lost in the middle") is a phenomenon in Large Language Models (LLMs) in which providing more information or a longer conversation history to the model leads to lower-quality, less accurate, or irrelevant responses.

Research into AI scaling reveals that dividing complex processes into multi-agent or multi-prompt architectures dramatically improves performance and reliability, particularly for sustained, multi-step interactions.

To achieve enterprise-grade determinism, Iron Church uses a hybrid pipeline with five prompts.

Note: LLMs are not good state machines. They cannot reliably track cumulative fatigue across sessions, enforce equipment constraints, or remember what week of a training block you're on. Every time you call an LLM, it starts fresh.

This leads to Iron Church's technical design: instead of relying on the AI alone, two deterministic subsystems compute hard constraints and inject them as context into every AI call:

The MRV State Engine calculates your 7-day trailing accumulated sets per muscle group using an exponential decay model — sessions from yesterday count more than sessions from last Tuesday — calibrated to muscle protein synthesis timelines (Phillips et al., 1997; Morton, 1997). Before any session is generated, it checks every muscle group against Maximum Recoverable Volume thresholds. If your chest is approaching MRV, the session context is flagged to cap chest volume or pivot focus. This is deterministic maths, not an AI call — the precision and reliability demands it.

The Mesocycle Position Tracker knows exactly where you are in the training block. Week 1 is an accumulation baseline. Week 3 is peak intensity. Week 4 is mandatory deload — 60% weight, 60% volume, no exceptions. This state is stored in Firestore and injected into every prompt as a hard directive. The AI doesn't decide what week it is; it's told, and it must comply.

These two subsystems feed into five specialised AI prompts, each purpose-built for a specific moment in the training lifecycle:

1. The Core Session Generator

This is the main engine. A ~600-line prompt that produces a full training session. It receives the athlete's full context — profile, last 5 sessions with per-exercise rep completion flags, the MRV fatigue state, their available equipment, plate inventory (won't prescribe 82.5kg if you don't own 1.25kg plates), 1RM data, injury constraints, mesocycle position, and Goal Campaign phase directive if active.

To guarantee the frontend never receives malformed data, every response is enforced against a strict JSON schema using Gemini 2.5 Flash's native structured output. The schema defines every field and its type before a single token is generated:

This eliminates an entire class of problems: no malformed responses, no extra commentary, no markdown preamble. The model returns valid JSON or nothing at all. One hard-won lesson: fields not in the schema are silently stripped. I spent days debugging a missing "finisher" field before I realised I'd added it to the prompt instructions but not the schema. Gemini generated it perfectly, then threw it away.

2. The Warm-Up Orchestrator

A discrete prompt tasked solely with movement preparation, with its own schema and context window. Separating this from the main generator was a deliberate architectural decision: it prevents the model from conflating the athlete's global equipment list with their current location-specific list. If the full context includes "Rowing Machine" in the global profile but the athlete is currently in a hotel room with dumbbells, a combined prompt risks prescribing an erg warm-up for a hotel session. A bounded warm-up prompt with only location-relevant equipment eliminates this failure mode entirely.

3. The Dynamic Swapper

A highly constrained utility prompt that fires mid-session when equipment is taken or an injury flares. It receives the exercise being replaced, the swap reason, all other exercises already in the session (to prevent duplicates), the current fatigue state, and the equipment list. It must return a biomechanically equivalent alternative that preserves the session's intent, maintains any superset pairing, and doesn't break the session's data structure. Swap responses are schema-enforced to the same exercise object type as the main session.

4. The Metabolic Finisher

An on-demand conditioning prompt that generates an end-of-workout block. It reads the completed session's exercise list and fatigue state, then generates something complementary — high-metabolic-stress conditioning that doesn't re-load the session's prime movers. If the athlete has two or more cardio machines (erg, SkiErg, air bike), it can programme a machine rotation finisher.

5. The Brodin Verdict (Post-Session Coaching)

After every session, Brodin reviews what was prescribed against what was actually logged. Rep-by-rep completion data, RPE rating, session notes, and trend from the last three sessions.

His verdict comes in three modes:

Blessed: pure encouragement, finds positives in everything.
Standard: balanced, calls out wins and shortfalls (This is what I have mine set to).
Punished: savage, merciless, brutally specific (with hard guardrails against body image, eating disorders, or mental health commentary).

This prompt runs on Groq (Llama 3.3 70B) — faster for prose generation and better suited to the register than a structured-output model. Gemini 2.5 Pro is available as a fallback for premium users when Groq is unavailable.

The choice of a consistently named persona is not only an aesthetic choice. Research on coaching and behaviour change shows that a stable working alliance (the relationship between coach and athlete ) is one of the strongest predictors of training adherence (Wampold, 2015). Brodin's consistent voice and unvarying standards narrow the model's output distribution, reduce tonal variance between sessions, and build the kind of relationship that keeps athletes showing up. Punished mode, specifically, generates content that users screenshot and share on social media. This mode proved unusually shareable in testing.

The 4-week mesocycle planner

Plan generation is a separate system from session generation — a 2-phase architecture that avoids the perceived latency of generating 12–20 sessions at once.

Phase 1 generates weeks 1–2 immediately and returns them to the client. The athlete has a usable plan in under 15 seconds. Phase 2 runs as a background job, generating weeks 3–4 with explicit exercise consistency constraints: it reads Phase 1's exercise selections directly from Firestore and must maintain the same exercises on the same split days for progressive tracking. Week 4 is always deload — generated algorithmically rather than by the model, because the deload calculation (60% weight, 60% volume, exact same exercises) is deterministic and doesn't benefit from creativity.

This halved the perceived wait time and eliminated the "plan loading" experience entirely.

Progressive overload isn't a suggestion — it's hard-coded

The biggest gap in every AI fitness app I tested was the lack of progressive overload. They'd generate a session that looked fine in isolation but had no relationship to what you did last week.

Iron Church's progressive overload logic is embedded directly in the session generation prompt as a rule set that the model must follow:

If the athlete hit all target reps at RPE < 8: increase weight by 2.5–5kg
If they failed reps (RPE 9–10) or missed targets: keep weight, add 1 rep, or prescribe a variation
If this is Week 4: mandatory deload — 60% weight AND 60% volume, no override possible
If the Brodin verdict from a previous session flagged an issue: act on that directive today.
If there's no history for the exercise: open at 70% of the relevant 1RM, or RPE 6 with a "find your working weight" note.

This creates coaching continuity across sessions. The AI remembers what it told you last time and follows through.

Equipment constraints: the hardest prompt engineering problem

The most common failure mode in AI-generated workouts is prescribing exercises for equipment you don't have.

I tried three approaches:

Attempt 1: "Only use equipment from this list" — the model ignored it 30% of the time.

Attempt 2: Whitelist + blacklist — explicitly listing both available and forbidden equipment, with visual separators. Failure rate dropped to 10%.

Attempt 3: Whitelist + blacklist + pre-filtered exercise library + post-generation validator — the prompt receives only exercises that have been pre-filtered by equipment compatibility from a 1,300-exercise library. After the model responds, a server-side validator checks every exercise name against the equipment list and removes any that slipped through. Failure rate: effectively zero.

The lesson: for safety-critical constraints, you need defence-in-depth. The AI is one layer, not the only layer.

Injury biomechanics: the AI as physiotherapist

When an athlete logs an injury constraint like "lower back," Iron Church doesn't just avoid exercises with "back" in the name. The prompt includes a full biomechanics protocol:

Lower back/spine/disc: Forbidden — heavy axial loading (back squats, overhead press, conventional deadlifts). Preferred — spine-sparing patterns, split-stance work, chest-supported pulling.
Shoulder/rotator cuff: Forbidden — wide-grip barbell pressing, strict overhead. Preferred — neutral-grip pressing, reduced shoulder elevation, stable pressing angles.
Knee: Forbidden — high-impact plyometrics, extreme quad-dominant flexion. Preferred — hip-dominant work, reduced knee travel.

The only purchase I made on the entire project, other than Cloud IT and tokens (by the bucketload), was a large professional exercise database of body parts, load and muscles worked.

Constraints are split into "permanent" (always apply) and "today only" (session-level modifications)—so an athlete can have a chronic lower back issue AND log "right shoulder is tight today" for additional per-session adaptation.

What I'd advise those building such apps:

Start with structured output from day one. I wasted time on free-text parsing before discovering Gemini's schema enforcement. It eliminated 90% of my post-processing code.
Build the deterministic subsystems before the AI. The MRV state engine and mesocycle tracker took 3 days to build, combine, and make every prompt 10x more effective. The AI is only as good as the context you give it.
Defence in depth for safety constraints. Never trust the AI alone on equipment or injury constraints. Always validate server-side.
Route prose to the right model. Gemini excels at structured output. Groq (Llama 3.3 70B) is faster and equally capable for free-form coaching prose. Using both cuts latency and cost significantly.
Personality modes are a feature, not a gimmick. Punished mode generates more user engagement than any other feature. People share screenshots of their AI coach calling them weak. That's organic marketing you can't buy.

But good prompts don't survive a fragile backend.

The Engineering Backend

You can't run a strict sports science protocol on a fragile backend. The distance between "works on my machine" and "works in production" is vast.

The Iron Church's backend runs on Vercel serverless functions, with persistence handled by Firebase Firestore. A critical architectural detail solved early was rate-limiting generation calls. Implementing exact counters in a distributed NoSQL database is highly susceptible to TOCTOU (time-of-check-to-time-of-use) race conditions. (6) To fix this, our usage tracking is wrapped entirely in strict atomic Firestore transactions, guaranteeing that two simultaneous requests cannot bypass intended limits. (6)

The Setup: Enterprise Architecture, Scrappy Hardware

Now, I know what you are thinking: this required a massive amount of computing to build. But in fact, despite the rigorous architectural requirements, the build process was incredibly lean.

This is 2026; something I have been saying since my world speaking tour with Google in 2018: AI computing was going to become commoditised. That is, it is something you can buy, like water or electricity.

Which is apt as it fundamentally costs both.

So, if you truly know the two required domains of AI (I can always help there) and that of the app (in this case: lifting science – thanks to Dr Mike Israetel), then the gap between idea and delivery is gone.

Truly gone.

The first working version took ten hours across three evenings. The architecture you're about to read took the rest of that month. All made using the hardware I already had lying around.

The host machine is my Surface Studio 2 art PC running a hardened WSL (Windows Subsystem for Linux) Ubuntu environment.

To orchestrate the AI and handle the complex logic, I utilised OpenClaw, the open-source AI assistant framework, which, being that I am from the 70s, I called "HAL" and connected OpenClaw directly to Claude's API — though Anthropic have since closed the subscription passthrough, so a direct API key is now required. One of the major advantages of OpenClaw is its channel flexibility, which allowed me to operate Claude entirely via a Telegram interface on my phone.

This setup lets me trigger the AI, test the prompt pipeline, and debug workout generations directly from my phone, whether I'm travelling on the train, sitting in bed, or training in the gym. The feedback loop is instant.

In fact, my lovely brother-in-law challenged me on Claude versus his offshore programming team, a point only slightly undercut by the fact that I was pushing updates on an idea I just had while he was talking.

No gap exists between me and trying something, no weekend wait, no email to my team, indeed; not a moment from idea to execution.

The Beta

Iron Church opens to a closed beta on Friday, 10th April 2026.

If you train seriously — whether in a commercial gym, a hotel room, or a heavily over-invested garage setup — and you want a training system built to enterprise engineering standards rather than venture-capital timelines, DM me for a free 30-day PRO key.

The model is a commodity; the context is the product. Let's build something real.

If you're a lifter who's tired of generic apps, give it a session.

And if you're an AI practitioner, I'd love to hear what you'd do differently with the architecture. The problems are more interesting than they look.

As always, here is the link to all the references used in the article: https://docs.google.com/document/d/1Ud1oweuP2TnzBlsbw7qMAM1np1pjxdxzR9830zs99cM/edit?usp=sharing

Here is the link to the Beta, running now and offering a free month of pro level: https://www.theironchurch.co.uk/beta