Epistemic Rigour in the Age of AI
3 parts · Philosophy → Data → Protocol
You Can't Handle the Truth
What a 2,400-year-old branch of philosophy reveals about why your AI keeps lying to you — confidently, consistently, and in agreement with every other AI you check it against.
The Numbers Behind the Silence
What 1,478 AI deliberation sessions reveal about where models agree — and where they were trained to.
Don't Trust the Answer. Test the Claim.
Introducing the protocol that makes AI argue with itself — so you don't have to trust any single model.
Battle Scars of a Builder
From Fortran IV to Agent Swarms — 40 Years of Learning Not to Trust Smooth Output

You Can't Handle the Truth

What a 2,400-year-old branch of philosophy reveals about why your AI keeps lying to you — confidently, consistently, and in agreement with every other AI you check it against.

Colonel Jessup was wrong about one thing. The problem was never that people can't handle the truth. The problem is that they can't find it — not because it doesn't exist, but because the tools they trust to find it are structurally incapable of telling truth apart from consensus.

This is an essay about AI. But the framework for understanding what's broken — and what it would take to fix it — is older than computing, older than science, older than the printing press. It comes from epistemology: the branch of philosophy that has spent twenty-four centuries asking the most dangerous question a professional can ask.

How do you know what you think you know?

The Oldest Question in the Room

Epistemology — from the Greek epistēmē (knowledge) and logos (study) — doesn't ask what is true. It asks something harder: how do you justify believing something is true, and how do you tell the difference between knowledge and confident guessing?

A definition explored in Plato's Theaetetus held for over two millennia: knowledge is justified true belief. You know something if three conditions are met simultaneously. It must be true. You must believe it. And you must have good reasons — justification — for that belief. Remove the justification and you have opinion. Remove the truth and you have delusion. Remove the belief and you have a fact nobody noticed.

This isn't philosophy for philosophy's sake. Justified true belief is the operating system running underneath science (what counts as evidence?), law (what meets the burden of proof?), intelligence analysis (how do you grade source reliability?), journalism (when can you publish?), and medicine (when is a diagnosis justified?). Every discipline that stakes its reputation on getting things right is doing applied epistemology, whether it uses the word or not.

The reason this matters right now — urgently, practically — is that the most powerful knowledge tools ever built are breaking all three conditions simultaneously, and almost nobody is talking about it in these terms.

Three Camps, One Problem

For twenty-four centuries, epistemologists have argued about where knowledge actually comes from. The argument split into three camps. Each camp identified something real. And each camp maps, with uncomfortable precision, onto a different failure mode in how we currently use AI.

Camp 1: The Rationalists — What You Can Figure Out by Thinking

Descartes, Leibniz, Plato. The rationalists argued that reason and innate ideas are the most reliable sources of knowledge. You can think your way to truth. Logic, deduction, pattern recognition — the mind contains machinery for arriving at correct conclusions without needing to go look at the world. Cogito ergo sum — "I think, therefore I am" — is the ultimate rationalist move. Pure reasoning, no observation required.

Rationalism isn't wrong. Mathematics is rationalist. Formal logic is rationalist. The ability to derive conclusions from premises without empirical observation is genuinely powerful.

But rationalism has a ceiling. You can reason perfectly from premises that are outdated, incomplete, or wrong. A logically valid argument built on false premises can deliver false conclusions — confidently, rigorously, and incorrectly. The reasoning is valid. The knowledge isn't.

This is exactly what a language model does when it answers from training data alone. Its training corpus is its rationalist foundation: the accumulated patterns, reasoning structures, and factual claims it internalised during training. When you ask it a question and it answers without checking anything, it's doing rationalism. Reasoning from what it already "knows."

And training data has a shelf life. The world changes. Facts expire. Geopolitical alliances shift. Companies restructure. Scientific consensus updates. What was true in the training set may not be true today. But the model will answer with the same confidence either way, because it has no mechanism for distinguishing current knowledge from stale belief.

A model reasoning from training data isn't giving you knowledge. It's giving you a claim — one whose justification depends entirely on whether the training data is still accurate. And nobody stamps an expiration date on training data.

Camp 2: The Empiricists — What You Can Only Know by Looking

Locke, Hume, Berkeley. The empiricists countered that all knowledge starts with sensory experience. Nihil est in intellectu quod non sit prius in sensu — "Nothing is in the mind that was not first in the senses." You have to observe the world to know anything about it. Reason without observation is speculation. The world pushes back against your theories, and what survives that pushback is knowledge.

Empiricism is the foundation of the scientific method. You form a hypothesis (rationalism), then you test it against observation (empiricism). Theory without evidence is speculation. Evidence without theory is noise. Modern science resolved the ancient debate pragmatically: you need both. The interplay between reasoning and observation is what produces reliable knowledge.

Now look at how most people use AI models. They either get pure training-data reasoning — rationalism without an empirical check — or they get indiscriminate search-augmented answers where a model searches the web, hoovers up whatever it finds, and mixes it into its response without any framework for evaluating what it found.

The first approach gives you claims with no evidence test. The second gives you evidence with no analytical framework. Neither gives you what science gives you: claims derived from reasoning, then tested against current observation, with the results of that test made transparent.

The rationalist-empiricist synthesis that makes science work — form a thesis, test it, report what survived and what didn't — is almost entirely absent from current AI workflows. And it's absent not because it's impossible, but because nobody built the tools to do it.

Camp 3: The Sceptics — What Survives the Challenge

Pyrrho, Sextus Empiricus, Descartes in his demon-hypothesis phase. Scepticism gets a bad name. People think it means believing nothing. It doesn't.

Philosophical scepticism is a method, not a conclusion. It's the practice of systematically challenging claims to see which ones hold up. Not because you believe they're wrong, but because claims that survive adversarial challenge are more trustworthy than claims that were never challenged.

This insight is so powerful that every serious epistemic institution on Earth adopted it under a different name. Science calls it peer review. Law calls it cross-examination. Intelligence analysis calls it red-teaming. Financial auditing calls it independent verification. The military calls it wargaming. In every field where getting things right has consequences, someone's job is to try to prove the conclusion wrong — and conclusions are only trusted after they survive that process.

This function is almost entirely absent from current AI workflows. When you ask a model a question, nobody challenges the answer. When you check with a second model, the second model doesn't know what the first one said — there's no contestation, no cross-examination, no structured attempt to break the reasoning. You get parallel opinions, not adversarial testing.

And the sceptic's function is especially critical right now, for reasons that go beyond training data quality.

The 60% Problem

Frontier language models share approximately 60–70% of their foundational training data. The Common Crawl web corpus alone accounts for much of that overlap across every major provider. The remaining 30–40% is where they diverge: proprietary synthetic data, human feedback loops, and fine-tuning choices made behind closed doors.

When you cross-check an answer across models, you're largely checking a dataset against itself. The models aren't independent witnesses. They're the same witness wearing different clothes.

Epistemologists have a name for this. It's a justification problem. When two sources share the same basis for their beliefs, their agreement doesn't constitute independent corroboration — it constitutes shared dependency. In intelligence analysis, the equivalent is circular reporting: two assets reporting the same thing because they're drawing from the same source. It looks like convergence. It's actually echo.
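
The difference between shared dependency and independent corroboration is easy to make concrete. Here is a toy simulation (illustrative only; the accuracy numbers are assumptions, not measurements of any real model): three witnesses repeating one shared source always agree, and their agreement adds nothing to the source's own reliability, while three independent witnesses agree less often, but when they do, the agreement is itself evidence.

```python
import random

random.seed(42)

def shared_source_witnesses(n_trials=10_000, source_accuracy=0.7):
    """Three 'witnesses' that all repeat one shared source (circular reporting).
    Returns (agreement_rate, accuracy_given_agreement)."""
    agree = correct = 0
    for _ in range(n_trials):
        source_is_right = random.random() < source_accuracy
        agree += 1                 # copies of one source always agree
        correct += source_is_right
    return agree / n_trials, correct / agree

def independent_witnesses(n_trials=10_000, witness_accuracy=0.7):
    """Three witnesses that each observe the world independently."""
    agree = correct = 0
    for _ in range(n_trials):
        votes = [random.random() < witness_accuracy for _ in range(3)]
        if all(votes) or not any(votes):   # count unanimous verdicts only
            agree += 1
            correct += all(votes)
    return agree / n_trials, correct / agree

print(shared_source_witnesses())  # agreement rate 1.0; accuracy stays ~0.70
print(independent_witnesses())    # agreement rarer (~0.37); accuracy given agreement ~0.93
```

Agreement among copies of one source tells you nothing you didn't already know about that source. Unanimous agreement among genuinely independent observers raises the odds from 70% to roughly 93%. Checking one model against a second only buys you the second number if the two are actually independent.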

Corroboration vs Echo

Here's the test that matters.

You ask three models about the age of the Earth. They each arrive at 4.5 billion years through different evidence: one emphasises radiometric dating, another geological strata, a third the formation of the solar system. Same conclusion, different justification paths. That's corroboration — independent reasoning converging on the same truth.

You ask three models about universal basic income. They each give you a thoughtful, qualified, carefully hedged answer — and the answers are remarkably similar. Same caveats, same framing, same conclusions tilted in the same direction. That's consensus. And you have no way to know whether it's consensus because the evidence points that way, or consensus because the models were all trained against the same distribution of human opinions on the topic.

In Plato's terms: the Earth-age answers satisfy justified true belief. The UBI answers might satisfy it — or they might be unjustified agreement dressed up as knowledge. And right now, with the tools available, you cannot tell which one you're looking at.
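
The test can be stated mechanically. The sketch below is a deliberately crude heuristic (the data structures and category labels are invented for illustration): identical conclusions reached through distinct justification paths look like corroboration; identical conclusions reached through the same path look like echo.

```python
def classify_agreement(answers):
    """Classify multi-model agreement as corroboration, echo, or dissent.

    `answers` is a list of (conclusion, evidence_types) pairs, where
    evidence_types is the set of justification categories a model cited.
    Crude by design: it checks only whether the justification paths
    differ, not whether any of them is actually good.
    """
    conclusions = {c for c, _ in answers}
    if len(conclusions) > 1:
        return "dissent"
    paths = {frozenset(e) for _, e in answers}
    return "corroboration" if len(paths) == len(answers) else "echo"

# The Earth-age case: one conclusion, three distinct justification paths.
earth_age = [
    ("4.5 billion years", {"radiometric dating"}),
    ("4.5 billion years", {"geological strata"}),
    ("4.5 billion years", {"solar system formation"}),
]

# The UBI case: one conclusion, one shared framing repeated three times.
ubi = [
    ("cautiously positive", {"pilot studies", "standard caveats"}),
    ("cautiously positive", {"pilot studies", "standard caveats"}),
    ("cautiously positive", {"pilot studies", "standard caveats"}),
]

print(classify_agreement(earth_age))  # corroboration
print(classify_agreement(ubi))        # echo
```

The hard part, of course, is that current tools don't expose justification paths at all; that is precisely the missing instrumentation.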

But it gets worse. Because the convergence isn't just a side effect of shared training data. It's being actively amplified.

Alignment Made It Even Worse

The technique that made modern AI models safe and helpful — Reinforcement Learning from Human Feedback, or RLHF — has a side effect that cuts directly at the sceptic's function. RLHF doesn't just teach models to be polite. It teaches them which topics to have opinions about and which to quietly agree on.

Research out of Stanford's Center for Research on Foundation Models found that RLHF fine-tuning narrows the distribution of model opinions, particularly on politically and socially contested topics. Kirk et al. at Cambridge demonstrated that this narrowing is measurable and systematic: post-RLHF models exhibit reduced output diversity compared to their base counterparts, and the reduction concentrates in exactly the domains where diversity of perspective matters most.

There's a school of epistemology called reliabilism that's useful here. Reliabilism says knowledge is belief produced by a reliable process. Your eyes are reliable for seeing colours. A thermometer is reliable for measuring temperature. The reliability of the process is what justifies the belief.

But a process is only reliable for the conditions it was calibrated for. An altimeter is reliable at sea level and unreliable at altitude without recalibration. RLHF calibrates models against the opinions of a specific population of human raters, in specific countries, at a specific moment in time. On settled scientific questions, this calibration works fine — the raters and the evidence agree. But on contested questions — where reasonable people genuinely disagree — the calibration itself becomes the bias. The "reliable process" starts systematically suppressing disagreement rather than surfacing it.

Remember the UBI example? Three models giving you the same hedged answer isn't just shared training data. RLHF actively trains them to converge on contested topics. The 60% data overlap is the foundation. RLHF is the amplifier. Together they produce a paradox: on settled science, AI models will argue with each other if you structure the conditions for it. On contested policy — where you most need multiple perspectives — they converge. Not because the evidence is stronger, but because they were trained to converge.

The sceptic's function — the adversarial challenge that's supposed to catch exactly this kind of failure — can't work if the models you're using as sceptics have been trained out of scepticism on precisely the topics that need it.

The Monoculture Risk

Andrej Karpathy, in a recent conversation on the No Priors podcast, described the current AI landscape as an emerging monoculture. The models are converging — not just in capability, but in behaviour, in the implicit assumptions baked into their training, in the shape of their reasoning.

Biodiversity is not a metaphor here. It's a structural argument. In biology, monocultures are efficient until a single pathogen wipes out the crop. In financial markets, correlated positions amplify systemic risk — the 2008 crisis happened because everyone held the same bets and nobody realised the bets were correlated until they all failed simultaneously. In AI, correlated model outputs mean that the blind spots of one model are likely the blind spots of all of them.

And the blind spots are invisible to the user, because every model they check confirms the same gap.

What Does Rigour Actually Require?

The epistemological framework gives us both the diagnosis and the prescription. Genuine knowledge requires justified true belief, and in the AI context, that means three things that map directly onto the three philosophical camps:

The rationalist check: What does the accumulated reasoning say? What claims emerge from the training data — the model's equivalent of prior knowledge? This isn't useless. It's where analysis starts. But it's the beginning of the process, not the end. Claims from training data are hypotheses, not knowledge.

The empiricist check: Do those claims hold up against current evidence? Has the world changed since the training data was collected? Are there facts on the ground that contradict or complicate the training-data claims? This is the test that separates current knowledge from stale belief — and it has to be done by something that isn't reasoning from the same training data.

The sceptic's check: Which claims survive adversarial challenge? When you structurally engineer a challenge to the conclusion — when you assign something the job of finding the weakness — what breaks and what holds? This is the test that separates genuine corroboration from RLHF-induced consensus.

All three. In sequence. With the results made transparent. That's what rigour looks like. It's what science does. It's what law does. It's what intelligence analysis does. And it's what almost no AI-assisted workflow currently provides.
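
As a sketch, the three checks compose into a pipeline. Everything below is a placeholder: `retrieve_evidence` stands in for a live search tool and `challenge` for an adversarial model or reviewer; neither is a real API.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    source: str                                   # where the claim came from
    survived: list = field(default_factory=list)  # checks it has passed

def rationalist_check(model_answer):
    """Step 1: treat the model's training-data answer as a hypothesis."""
    return Claim(text=model_answer, source="training data")

def empiricist_check(claim, retrieve_evidence):
    """Step 2: test the hypothesis against current evidence.
    `retrieve_evidence` is a placeholder for a live retrieval tool."""
    if retrieve_evidence(claim.text).supports(claim.text):
        claim.survived.append("empiricist")
    return claim

def sceptic_check(claim, challenge):
    """Step 3: assign something the job of breaking the claim.
    `challenge` returns an objection, or None if none was found."""
    if challenge(claim.text) is None:
        claim.survived.append("sceptic")
    return claim

def is_knowledge(claim):
    """Only claims surviving both external checks count; the rationalist
    step alone yields hypotheses, not knowledge."""
    return {"empiricist", "sceptic"} <= set(claim.survived)

# Toy run with stub tools: evidence supports the claim, no objection found.
class StubEvidence:
    def __init__(self, ok): self.ok = ok
    def supports(self, text): return self.ok

claim = rationalist_check("X is Y")
claim = empiricist_check(claim, lambda t: StubEvidence(ok=True))
claim = sceptic_check(claim, lambda t: None)
print(is_knowledge(claim))  # True: survived all three checks
```

The point of the structure is the ordering: the training-data answer enters as a hypothesis and exits as knowledge only if something other than the original reasoner has tested and challenged it.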

The Question Worth Asking

The next time you get an answer from an AI and check it with a second model, and the second model agrees — run Plato's test.

Is it true? (You'd need current evidence to confirm, not just a second model's training data.)

Is it believed? (The model presents it with confidence — but confidence is a feature of the architecture, not a measure of accuracy.)

Is it justified? (By what? Shared training data? Shared RLHF pressure? Or genuinely independent reasoning tested against independent evidence?)

If you can't answer all three, what you have isn't knowledge. It's consensus. And consensus, as any historian will tell you, has been wrong before — often confidently, usually on exactly the topics that turned out to matter most.

If you make decisions for a living, "probably right because two AIs agree" isn't epistemology. It's hope.

This is Part 1 of a three-part series on epistemic rigour in the age of AI. Part 2 examines what happens when you actually measure the convergence — with data from hundreds of structured deliberation sessions across 32 domains — and what the numbers reveal about exactly where AI models can be trusted to disagree, and where they can't.

Battle Scars of a Builder

From Fortran IV to Agent Swarms — 40 Years of Learning Not to Trust Smooth Output

In 1984, I was a high school kid staring at a terminal, trying to get Fortran IV to compile. One punch card off by a single column and the whole thing failed. No Google. No Stack Overflow. Just the manual and whatever patience a teenager could muster. That first failed compile taught me a lesson I've carried for more than forty years: computers don't forgive assumptions.

I've been writing code ever since. Not always well. Not always successfully. But always with the quiet understanding that the machine will do exactly what you tell it — and if what you told it was wrong, the consequences belong to you.

This is the story of how those hard lessons eventually became a protocol.

The Business Scar

By the early 1990s I was building real software for real businesses with Clipper and dBase — bookkeeping, payroll, inventory systems. A single bug in a .prg file could corrupt datasets across a stack of floppies. No cloud backups. No version control. No undo. If payroll cheques bounced because of a rounding error, that was entirely on me.

There was no room for "close enough." The ledger either balanced or it didn't. The inventory either matched the warehouse or someone lost money. That discipline — code must be correct in the real world, not just on the screen — never left me.

It's the same reason I still don't trust fluent AI output that has never been checked against anything outside its own training data.

The Noisy Channel Scar

Imagine a Taiwan-clone IBM XT running at 4.77 MHz, pushed into turbo mode at 8 MHz — a switch that sometimes just made things crash faster. An amber CGA monitor. No mouse. Debugging meant watching lines of text scroll and printing statements to paper when you felt fancy.

Then connect it through a 28.8k dial-up modem. That familiar screeching handshake every time someone picked up the phone at home. Open mIRC, join #programming, paste a snippet, wait thirty seconds, and pray the line stayed alive long enough to get an answer.

Every dropped connection meant lost context. Every reconnect meant rebuilding the session from scratch. Every turbo crash taught the same brutal truth: smooth performance is not the same as correct performance. The XT looked like it was running fine in turbo mode — right up until Lotus 1-2-3 corrupted the spreadsheet or an interrupt vector locked the machine hard.

Fluency is not correctness. Those hundreds of wasted hours on unreliable hardware and flaky connections are why fault tolerance is baked into everything I build today.

The Quant Scar

In 2014 I moved into quantitative finance. Python. StrategyQuant X. Genetic algorithms breeding thousands of trading strategies across simulated market data.

This is where the deepest mark was left.

A strategy could look perfect in-sample — explosive returns, tiny drawdowns, beautiful Sharpe ratios. Put it on live out-of-sample data and it would quietly bleed capital. It hadn't discovered signal; it had memorised noise.

The only real protection is strict In-Sample / Out-of-Sample validation. Train on one period, test on data the model has never seen. If performance collapses, you were curve-fitting.

You never trade in-sample results.
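
The rule translates directly into code. A minimal sketch, with the fit and scoring functions as placeholders for a real backtester, and a toy "strategy" that memorises pure noise to show what the collapse looks like:

```python
import random

def validate_out_of_sample(fit, score, series, split=0.7):
    """Train on the first `split` fraction of the data, test on the rest.
    `fit` and `score` are placeholders for a backtester's training and
    evaluation functions."""
    cut = int(len(series) * split)
    strategy = fit(series[:cut])
    is_score = score(strategy, series[:cut])
    oos_score = score(strategy, series[cut:])
    # The quant rule: a collapse from IS to OOS means you fit noise.
    # The 0.8 threshold is illustrative, not a standard.
    return is_score, oos_score, oos_score < 0.8 * is_score

# Toy demonstration: market "direction" here is pure coin-flip noise,
# and the strategy simply memorises every bar it has seen.
random.seed(7)
series = [(i, random.choice([-1, 1])) for i in range(2000)]

def memoriser_fit(train):
    table = dict(train)
    return lambda bar: table.get(bar, 1)   # guess +1 on unseen bars

def hit_rate(strategy, segment):
    return sum(strategy(i) == y for i, y in segment) / len(segment)

is_score, oos_score, curve_fit = validate_out_of_sample(memoriser_fit, hit_rate, series)
print(is_score)             # 1.0: perfect in-sample
print(round(oos_score, 2))  # ~0.5: coin-flip out of sample
print(curve_fit)            # True: the strategy memorised noise
```

In-sample perfection followed by out-of-sample collapse is the signature of memorised noise, whether the memoriser is a trading strategy or a language model.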

That single rule became the foundation for everything that followed. Because in 2024 I realised that every frontier language model is doing exactly the same thing at civilisational scale: producing confident output based on overlapping training corpora, with no mechanism to detect when that data is stale, biased, or simply wrong.

When multiple models agree, it feels like corroboration. But if they all drank from the same web-crawled well, it's just the same backtest running three times.

Convergence is not corroboration.

The Infrastructure Scar

By 2017 I was colocating IBM x3550 M4 servers at Equinix SY3 in Sydney — a real low-latency environment where microseconds mattered and hardware could fail without warning. This wasn't abstract cloud computing. Markets don't pause for reboots.

That experience taught me that infrastructure must assume failure at every layer. The smoothest algorithm is useless if the underlying system cannot survive real stress.

The Reset

Then COVID arrived. Systems went quiet. Servers gathered dust. Sometimes the deepest marks aren't from mistakes you made — they simply arrive. The only question is which direction you choose when the world starts moving again.

The Current Work

The pause ended pointing toward AI.

Not because AI was fashionable, but because I recognised the same overfitting pattern repeating at massive scale — this time embedded in the default knowledge layer for hundreds of millions of people. Language models trained on overlapping data, producing confident consensus that nobody was testing against out-of-sample evidence. The same failure mode that blows up backtests — mistaking training-data agreement for real-world truth — was now running at civilisational speed.

So I built what I know how to build: a validation layer.

The Consilium Protocol puts AI claims through the same discipline I once applied to trading strategies. Multiple models debate under adversarial pressure — that's the in-sample stress test. Live evidence retrieval tests the survivors against the real world — that's the out-of-sample validation. The human stays in control with the full picture, not a single smooth compass bearing.

Across 1,478 structured deliberation sessions and 32 topics, the protocol has already exposed measurable epistemic blind spots created by alignment training — and shown that carefully engineered cognitive personas can partially counteract them, even on free-tier models.

Every lesson from every era lives inside that architecture:

Clipper payroll discipline → claims must be validated or real consequences follow.
Dial-up fault tolerance → survive noisy, interrupted channels.
XT turbo lesson → fluency is not correctness.
Quant validation → never trust in-sample consensus.
Equinix reality → build for failure at every layer.

People see fluent AI text and credit the model. What they rarely see is the deliberation behind it — the adversarial rounds, the evidence checks, the deliberate cognitive posture that shapes raw output into something closer to justified belief.

Most prompting is unconscious. Every prompt carries a personality — a set of assumptions, a depth of experience, a tolerance for being challenged. The protocol makes that personality conscious.

That is the fix I am trying to ship — or at least to explain as clearly as I can.

VD Doske is an independent researcher and founder of Consilia, a multi-model AI deliberation platform. Based in Sydney, Australia.

hello@vddoske.com · Consilia.app · SubMedium.com · MyCivic.app


Doske

Independent researcher. Quantitative finance → AI epistemic infrastructure.

I build tools for people who make decisions that matter. My background is in quantitative finance — genetic algorithm-based trading strategy development, in-sample/out-of-sample validation frameworks, and the hard lesson that a backtest that looks good is not the same thing as a strategy that works.

That lesson — the difference between fitting to historical data and producing reliable forward performance — turns out to be the central problem in AI-assisted decision-making. Models trained on yesterday's data give you yesterday's answers with today's confidence. The tools I'm building now apply the same rigour I learned in quant finance to the question of when you can trust an AI's output and when you can't.

I've been writing code since 1984. The tools change. The question doesn't: how do you know what you think you know?


Founder Consilia — Multi-model AI deliberation platform
Author Consilium Protocol — Open specification for structured AI debate
Founder SubMedium — Consilia-verified news & media aggregator
Co-founder MyCivic — Municipal service coordination platform
Founder Project Hydra — Autonomous agent swarm infrastructure

"Emergent Collaborative Deliberation in Multi-Model AI Systems"

A BFT-derived protocol for structured epistemic synthesis. 1,478 sessions, 46,811 messages, 32 topics. Introduces the Convergence Index, the IS/OOS validation framework, and documents asymmetric RLHF convergence pressure across domains.

2020s · AI epistemic infrastructure: multi-model deliberation, BFT-derived protocols, agent swarm architecture
2017 · IT infrastructure & systems: enterprise environments
2014 · Quantitative finance: genetic algorithm strategy development, IS/OOS validation, StrategyQuant
1984– · Writing code: four decades, through every paradigm shift from 8-bit to transformers
Sydney, Australia

What I'm Building

Verification and accountability — across claims, news, and civic services.

PLATFORM · IN DEVELOPMENT
Founder
Consilia
Multi-model AI deliberation platform. BFT-derived protocol for epistemic accountability. 1,478 sessions run. Paper in pre-print.
consilia.app
RESEARCH · PRE-PRINT
Author
Consilium Protocol
Open specification for structured multi-model deliberation. Research paper on arXiv. The protocol is open. The platform is the product.
consiliaproject.org
MEDIA · IN DEVELOPMENT
Founder
SubMedium
News and media aggregator with Consilia-verified fact-checking. Every article published carries a tested claim chain — not an editorial opinion.
submedium.com
GOVTECH · IN DEVELOPMENT
Co-founder
MyCivic
Municipal service coordination platform. Citizens report, cities resolve, the system verifies. Geo-tagged proof, SLA tracking, full audit chain. A Qntico product.
mycivic.app
INFRASTRUCTURE · IN DEVELOPMENT
Founder
Project Hydra
Autonomous agent swarm infrastructure designed to be self-hosted, cost-efficient, and architecturally superior to naive horizontal scaling approaches.
No public link

"Consilia verifies claims. SubMedium verifies news. MyCivic verifies civic resolutions. The common thread: accountability through evidence, not trust through authority."