Batlamok

I accidentally built a system that thinks with me

How a conversation with Claude about a monitoring daemon accidentally turned into an ideation engine


JP · 6 min read · March 20, 2026

This project started as something completely different. I was building a monitoring daemon called G.O.D. (Global Observation Daemon), an LLM-powered system that watches your server and predicts failures.

Now you might think I'm crazy, but hear me out. The concept was inspired by Old Testament governance patterns. Yes, the Bible. Covenants as binding contracts. Prophets as domain-specific monitoring agents. Faith as a trust metric that grows with prediction accuracy. Smiting as automated error correction.

Look, the Old Testament is basically the longest-running distributed coordination system in human history. It managed millions of people across centuries with no internet, no Slack, and no standups. I wanted to see if those behavioral patterns could be mapped onto software. Could you take societal systems that humans have used for thousands of years and adapt them into something a computer understands? It sounds insane until you realize that covenants are just SLAs, prophets are just alerting agents, and faith is just a confidence score. The terminology is dramatic but the patterns are real.

It was a fun experiment. It was also full of holes that I couldn't see because I was too close to it.

To stress-test the concept before writing any code, I ran it through a simulated stakeholder panel using Claude. A CEO looking for differentiation. An engineering lead evaluating feasibility. An autistic intern with no political filters. A pessimist whose job was to find every way the product fails. A marketing lead thinking about positioning.

The first round reshaped the product completely. Then I threw it to a simulated Reddit audience. "Prometheus with a Bible skin." "AI slop." "What does this do that a bash script cannot." The simulated redditors destroyed the pitch in ways the internal team never would.

Then I threw it to a simulated TikTok audience. A late comment with 22,000 simulated likes pointed out that our local LLM inference would eat the server resources we were supposed to be monitoring. A fundamental architectural contradiction that nobody on the internal team had caught.

The product that came out of those rounds was unrecognizable from what went in. And I realized: the process was more valuable than the product. The process is general-purpose. It works on any idea that will eventually face real stakeholders.

That's how Crucible was born.

What Crucible Does

You define an idea. You compose a panel of personas. You run the idea through sequential rounds of adversarial vetting. Each round applies a different kind of pressure. Between rounds, you can pause, inject new scenarios, introduce a persona you hadn't planned for, or resurrect a dismissed objection.

The central artifact is a branching tree that shows how the idea evolved under pressure. Every node is the state of the idea at a specific point. Every branch is a divergence caused by unresolved disagreement or an external challenge. Branches can be alive, dead, or dormant (available for resurrection).

The output isn't a polished idea. It's a pressure-tested idea with an audit trail of every challenge it faced and how it adapted.
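The branch lifecycle described above can be modeled as a small data structure. This is an illustrative sketch, not Crucible's actual schema; every field name here is an assumption. Nodes carry multiple parents because convergence merges make this a DAG, not a tree:

```typescript
// Illustrative sketch of the idea tree. Every node snapshots the idea
// at one point; every branch carries a lifecycle state.
type BranchState = "alive" | "dead" | "dormant";

interface IdeaNode {
  id: string;
  ideaState: string;     // the idea as it stood at this node
  challenge?: string;    // the objection or scenario that produced it
  parentIds: string[];   // multiple parents allow convergence merges (a DAG)
  branchState: BranchState;
  children: string[];
}

// Resurrecting a dormant branch just flips its state back to alive.
function resurrect(nodes: Map<string, IdeaNode>, id: string): boolean {
  const node = nodes.get(id);
  if (!node || node.branchState !== "dormant") return false;
  node.branchState = "alive";
  return true;
}
```

The point of the sketch is that "resurrection" is cheap by design: a dormant branch keeps its full history, so bringing a dismissed objection back costs one state flip, not a re-derivation.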

How the Rounds Actually Work

This is the part that makes it different from just "asking an AI for feedback." The rounds are sequential. Each one builds on the last. The order matters.

Round 1: Internal meeting. Your idea goes to the internal panel first. The CEO, the engineer, the pessimist, the intern. They each attack it from their incentive position. The CEO says "where's the differentiation?" Engineering says "this requires three things that don't exist yet." The intern says "I don't understand why anyone would use this." The pessimist says "here are six ways this fails." At the end of the round, the tool forces a convergence: what did the panel agree on? What did they reject? What changed? The idea that comes out of this round is already different from what went in.
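The forced convergence at the end of a round is essentially a small structured record. The shape below is hypothetical (the real synthesis is done by the model), but it captures the three questions the round has to answer:

```typescript
// Hypothetical shape for a round's convergence: what the panel agreed
// on, what it rejected, what changed, and the idea that moves forward.
interface Convergence {
  agreed: string[];
  rejected: string[];
  changed: string[];
  revisedIdea: string;
}

// A round was productive only if it actually moved the idea somewhere:
// something changed, or something got explicitly killed.
function isProductive(c: Convergence): boolean {
  return c.changed.length > 0 || c.rejected.length > 0;
}
```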

Round 2: Reddit. Now you take that refined idea and throw it to a hostile public. Simulated redditors who have no investment in your success. They're short, brutal, competitive for upvotes. "This already exists, it's called X." "So it's basically Y but worse?" "Show me benchmarks or this is vibes." The redditors surface a completely different class of problems: positioning failures, the "why should I care" test, and the comparison trap. Things your internal team would never say because they already care about the idea.

Round 3: Back to internal. Here's where it gets interesting. You take all that Reddit feedback and bring it back to the internal panel. Now the CEO has to respond to "this already exists." Engineering has to respond to "show me benchmarks." The intern says "the redditors are right about this one thing." The pessimist says "I told you." The panel reconverges with the external pressure incorporated. The idea reshapes again. Things that survived the internal round get killed by Reddit. Things the team dismissed get validated by the public.

You can keep going. Throw it to a TikTok audience for emotional reaction. Run it through an investor pitch for business model pressure. Do a hostile Q&A as a final stress test. Each round is a different environment with different rules, and the idea keeps evolving.
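Structurally, the sequential rounds amount to folding the idea through a list of environments, each returning a revised idea. A minimal sketch (the `Round` type is my simplification, not Crucible's real API):

```typescript
// Each round takes the current idea and returns the revised idea.
type Round = (idea: string) => string;

// Order matters: each round sees the output of the one before it,
// so internal -> reddit -> internal is not the same as any reordering.
function runRounds(initialIdea: string, rounds: Round[]): string {
  return rounds.reduce((idea, round) => round(idea), initialIdea);
}
```

In practice each `Round` would be a batch of persona LLM calls plus a convergence step, but the shape of the loop is just this fold, which is why injecting a new round mid-run is easy.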

The key insight is the back-and-forth. Internal pressure shapes the idea. External pressure breaks it. Internal pressure rebuilds from the wreckage. That cycle is what produces resilience that a single round of feedback never could.

Incentive-Driven Personas, Not Perspective-Driven

This is the core design choice. A "skeptical engineer" is a perspective. A person whose career depends on not shipping something that breaks in production is an incentive structure. The difference determines whether the persona generates polite objections or genuinely threatening ones.

Each persona is defined by three things:

  1. What they optimize for. The CEO optimizes for differentiation. Engineering optimizes for feasibility. The customer optimizes for solving their problem with minimum friction.
  2. What they fear. The CEO fears building something indistinguishable from competitors. Engineering fears shipping something that breaks.
  3. What they have no stake in. This is where the sharpest feedback comes from. The intern has no stake in company politics. The redditor has no stake in whether the product succeeds. Indifference to certain outcomes is what allows a persona to be honest about those outcomes.
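As a data structure, the three-part definition maps directly onto a system prompt. A sketch under my own assumptions (field names and the CEO's `noStakeIn` value are illustrative; the other values come from the examples above):

```typescript
// A persona is an incentive structure, not a viewpoint: what it
// optimizes for, what it fears, and what it has no stake in.
interface Persona {
  name: string;
  optimizesFor: string;
  fears: string;
  noStakeIn: string;
}

function personaSystemPrompt(p: Persona): string {
  return [
    `You are ${p.name}.`,
    `You optimize for: ${p.optimizesFor}.`,
    `You fear: ${p.fears}.`,
    `You have no stake in: ${p.noStakeIn}. Be blunt about it.`,
  ].join("\n");
}

const ceo: Persona = {
  name: "CEO",
  optimizesFor: "differentiation",
  fears: "building something indistinguishable from competitors",
  noStakeIn: "implementation elegance", // illustrative
};
```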

The Tech

The stack is Bun + Hono for the backend, SolidJS for the frontend, SQLite (via Drizzle ORM) for storage, and Groq for the LLM calls. The tree visualization uses d3-dag for the DAG layout (it handles the merging convergence branches that a normal tree library can't).

For the LLM, I use Groq's API through the OpenAI-compatible SDK. Persona responses stream back via SSE so you see each persona react in real time as the round runs. The model handles both the persona simulation and the convergence synthesis.
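Setting aside the actual Groq call, the streaming side comes down to framing each persona's delta as a Server-Sent Event. A minimal sketch of the wire format; the `persona` event name and JSON payload shape are my assumptions, not the app's real protocol:

```typescript
// Frame one streamed token from a persona as an SSE message.
// SSE frames are plain text: an optional "event:" line, a "data:"
// line (JSON here), terminated by a blank line.
function personaEvent(persona: string, delta: string): string {
  const payload = JSON.stringify({ persona, delta });
  return `event: persona\ndata: ${payload}\n\n`;
}
```

Tagging each frame with the persona is what lets the frontend render several panelists "talking" concurrently from a single event stream.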

Why It's Still a Work in Progress

The hardest part isn't the AI. It's the tree. Visualizing a directed acyclic graph whose branches split, merge, die, and come back to life is a real UI challenge. d3-dag handles the layout math, but making the result intuitive to navigate (zooming, clicking into nodes, comparing branches) is where most of the remaining work lives.

The persona simulation works well. The convergence logic (synthesizing what the panel agreed on, rejected, and changed) works. The tree state management works. The visual experience of exploring that tree is what needs polish.

What I Learned From the Accident

The original G.O.D. daemon is still on the backlog. Maybe one day I'll actually build the thing that monitors servers using biblical governance patterns. The README would be incredible.

But the real discovery was that the stress-testing process itself is the product. Ideas die in public for problems that were discoverable in private. Most ideation processes (brainstorming, feedback rounds, focus groups) share a structural flaw: the people in the room share too many assumptions. They pull punches because of social dynamics.

Real vetting happens when an idea meets people who have no stake in its success, conflicting incentive structures, and no social cost for being brutal. Today that environment only exists after you actually ship the idea and watch it collide with reality. Crucible simulates the collision before it happens.

Sometimes you set out to build one thing and accidentally discover something more useful. I set out to make a server daemon that smites misbehaving processes. I ended up with a tool that smites bad ideas instead. Honestly, that's probably more useful.