Graph Projection without Graph Worship

cahoover@gmail.com (Christopher Hoover) — Fri, 03 Apr 2026 00:00:00 +0000

(Being what I hope is a catchier title than “Why we stopped treating the graph as the center of the system.”)

When we began building Research Tool, the graph was seductive because it was the most visible, queryable, integrated surface. It looked like the place where everything should live. The first iterations of RT used the graph as the source of truth and the center of gravity. More or less everything revolved around it. I even subscribed to Neo4j marketing emails.

But we discovered that the graph as center of gravity was pulling too many responsibilities into itself:

truth storage
orchestration assumptions
retrieval semantics
lineage
application state
product meaning
and on
and on

That actually felt kind of elegant at first. Then it started to chafe, because too many concerns were getting entangled.

Projection logic started bleeding into truth. Storage decisions started shaping product behavior. Retrieval behavior became harder to separate from graph layout. Debugging got harder, because we couldn’t tell if a problem lived in the source artifacts, the projection pipeline, the graph model, or the consuming service. Replay became harder.

Graphs are genuinely great, but they are so expressive that it’s easy to overreach. Because they can represent almost anything, it’s easy to let them absorb responsibilities that should have remained distinct.

We began to struggle with heisenbugs, and it felt like everything was getting harder. (An aside: this was my introduction to the concept of “heisenbugs.” They’re awful, but what a clever name, huh?)

After beating our heads against a wall for too long, we were forced to step back and reconsider the role of the graph in our platform. We decided the answer was that the graph is not a canonical store; it is a projection.

The graph is an excellent planning surface, exploration surface, and a derived integration surface. It is great at helping traverse structure, discover relationships, and bound useful work. It is great at making meaning navigable. But for us it’s not where truth should live, and it is not where every contract in the system should collapse together.

Once we accepted that, which was surprisingly painful and anxiety-provoking, the architecture got cleaner. [Cue major-key background music]. We moved truth into durable artifacts and producer-owned contracts, and ensured the graph could be rebuilt from those artifacts (without using GraphAR, which was also painful, but we got a lot more flexibility). Projection became something we could replay, inspect, and change without fear of mutating the meaning of the whole system.
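To make “the graph can be rebuilt from artifacts” a bit more concrete, here is a minimal Python sketch. It is not RT’s actual pipeline: the `Artifact` shape, the `project` function, and the ids are all hypothetical, invented for illustration. The point is only that projection is a pure function of the artifacts, so replaying it (even in a different order) yields the same graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """A durable, producer-owned record (hypothetical shape)."""
    artifact_id: str
    kind: str
    refs: tuple[str, ...] = ()  # ids of other artifacts this one points at


def project(artifacts):
    """Derive a graph (nodes + edges) from artifacts. No hidden state."""
    nodes = {a.artifact_id: {"kind": a.kind} for a in artifacts}
    edges = sorted(
        (a.artifact_id, ref)
        for a in artifacts
        for ref in a.refs
        if ref in nodes  # skip dangling refs rather than inventing nodes
    )
    return {"nodes": nodes, "edges": edges}


artifacts = [
    Artifact("doc-1", "document"),
    Artifact("finding-1", "finding", refs=("doc-1",)),
]

graph_a = project(artifacts)
graph_b = project(list(reversed(artifacts)))  # replay in a different order

# The graph is disposable: throw it away, replay, get the same thing back.
assert graph_a == graph_b
```

Because nothing in `project` writes back to the artifacts, changing the projection logic changes the graph and nothing else.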

I don’t want to treat our journey as some sort of groundbreaking insight, but things did get easier. Services became easier to reason about because producer and consumer boundaries were sharper. Debugging was more straightforward because we could ask simpler questions: was the source wrong, the contract wrong, the projection wrong, or the read model wrong? (Sometimes the answer to that is “yes”).

Net: The graph is no longer the center of the system; it is one member of a set of lenses over the system.

Research Tool: The Annotation Substrate

cahoover@gmail.com (Christopher Hoover) — Thu, 19 Mar 2026 00:00:00 +0000

Most people hear “annotation” and picture a sticky note, a little comment bubble hanging off the margin. Extra metadata you tack on afterward. The kind of feature a team adds in Sprint 14 because a customer asked for “collaboration.”

We don’t treat it that way. When someone is working through dense material (legislation, regulatory filings, contracts, or even messy quantitative observations), the real value rarely sits in the raw source. It sits in the judgment and connections formed while reading it. How one amendment quietly collides with another. Whether a revised sentence is an actual policy shift or just cleanup. Why a sudden spike in a series is probably a reporting quirk, not the world changing overnight.

The annotation substrate

Substrate (noun): the base something lives on.

At RT, we’ve been building what we call an annotation substrate: a durable layer where human and (human-verified) machine judgments are treated as first-class objects. They have an identity. They have history. They have a lifecycle. This isn’t “notes on top of content”; it’s infrastructure that makes judgment sturdy enough to become part of system behavior.

For example: an analyst marks a statutory provision as ambiguous. The provision is the target. The justification might be a conflicting committee report, a related amendment, and an older analyst note that argued the opposite. Those aren’t the same kind of thing. They play different roles, so the system should represent them differently.

If you squash all of that into a single “comment on this highlighted span,” you lose what makes annotations searchable, composable, and reusable.
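Here is a rough Python sketch of the difference, using the ambiguous-provision example. Everything in it is an assumption made for illustration: the class names, the role vocabulary (`supports`, `contradicts`, `context`), the ids, and the roles I assigned to each piece of evidence. It is not RT’s schema. It only shows that target and evidence are separate things, and that each piece of evidence carries its own role.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceRole(Enum):
    # Hypothetical role vocabulary
    SUPPORTS = "supports"
    CONTRADICTS = "contradicts"
    CONTEXT = "context"


@dataclass(frozen=True)
class Evidence:
    ref: str            # id of the thing being cited
    kind: str           # e.g. "committee_report", "amendment", "analyst_note"
    role: EvidenceRole  # the part it plays in the judgment


@dataclass
class Annotation:
    annotation_id: str
    target: str         # what the judgment is about
    label: str          # the judgment itself
    evidence: list[Evidence] = field(default_factory=list)

    def by_role(self, role):
        """Return the evidence items that play the given role."""
        return [e for e in self.evidence if e.role is role]


note = Annotation(
    annotation_id="ann-1",
    target="provision-12",
    label="ambiguous",
    evidence=[
        Evidence("report-7", "committee_report", EvidenceRole.SUPPORTS),
        Evidence("amendment-3", "amendment", EvidenceRole.CONTEXT),
        Evidence("note-2021-04", "analyst_note", EvidenceRole.CONTRADICTS),
    ],
)

# The older note that argued the opposite is still addressable on its own.
dissent = [e.ref for e in note.by_role(EvidenceRole.CONTRADICTS)]
```

A single free-text comment could hold the same words, but it could not answer “which evidence argues against this?” without a human rereading it.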

Durable annotations also give us another navigation surface across the corpus. For example: show every provision marked as ambiguous; list findings that rely on this committee report; surface where analysts disagree; track what shifted after a particular amendment; pull every quantitative observation linked to this clause.
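A few of those questions, sketched in Python over a toy in-memory list. The record shape, field names, and ids are hypothetical, and a real system would presumably query a store rather than loop over dicts. The sketch only shows that once annotations have structure, these questions become plain filters.

```python
# Hypothetical annotation records; shape and ids are made up for illustration.
annotations = [
    {"id": "a1", "target": "provision-12", "label": "ambiguous",
     "evidence": ["report-7"]},
    {"id": "a2", "target": "provision-30", "label": "ambiguous",
     "evidence": []},
    {"id": "a3", "target": "provision-12", "label": "clear",
     "evidence": ["report-7", "amendment-3"]},
]


def targets_with_label(annos, label):
    """Every target carrying a given label, e.g. 'ambiguous'."""
    return sorted({a["target"] for a in annos if a["label"] == label})


def relying_on(annos, evidence_ref):
    """Ids of annotations that cite a given piece of evidence."""
    return [a["id"] for a in annos if evidence_ref in a["evidence"]]


def disagreements(annos):
    """Targets that carry more than one distinct label."""
    labels = {}
    for a in annos:
        labels.setdefault(a["target"], set()).add(a["label"])
    return sorted(t for t, seen in labels.items() if len(seen) > 1)


ambiguous = targets_with_label(annotations, "ambiguous")
cites_report = relying_on(annotations, "report-7")
contested = disagreements(annotations)
```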

What about structured data?

The same idea extends to structured data.

We work with quantitative observations alongside legal text: measures, time series, outcomes, analytic checkpoints, and so on. Analysts need to annotate those too: “This spike is a reporting artifact.” “This correlation stops holding after the 2019 rule change.” “This measure isn’t comparable after the statutory revision.”

That means a single annotation can say: This statistical trend (structured target) -> is explained by this clause (document evidence) -> and contradicted by this prior finding (another structured target).

Compounding impact

Annotations made over time (e.g. by a team) have compounding value for the exploration of a large corpus. You can start at a clause and jump to the metrics it might influence. Or begin with an anomaly in the numbers and move back to the governing language. You can trace where an earlier conclusion gets strengthened, weakened, or overturned as versions shift and sources change. You can disagree with annotations and track disagreements.
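The “start anywhere, jump either way” idea can be sketched as a two-way index built from annotations. This is a toy under stated assumptions: the tuple layout, the `metric:` / `clause:` / `finding:` id prefixes, and the function name are all invented here, not taken from RT.

```python
from collections import defaultdict

# (annotation id, target, evidence refs); all ids are hypothetical.
annotations = [
    ("a1", "metric:filings-per-month", ["clause:12(b)"]),
    ("a2", "metric:late-fees", ["clause:12(b)", "finding:2019-review"]),
]


def build_index(annos):
    """Index annotations in both directions: evidence -> targets, target -> evidence."""
    by_evidence = defaultdict(set)
    by_target = defaultdict(set)
    for _annotation_id, target, evidence in annos:
        for ref in evidence:
            by_evidence[ref].add(target)
            by_target[target].add(ref)
    return by_evidence, by_target


by_evidence, by_target = build_index(annotations)

# Start at a clause, jump to the metrics it might influence.
metrics_for_clause = sorted(by_evidence["clause:12(b)"])

# Start at a metric, move back to the governing language and prior findings.
sources_for_metric = sorted(by_target["metric:late-fees"])
```

Each new annotation adds entries to both sides of the index, which is one way to read “compounding”: every judgment makes more of the corpus reachable from more starting points.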

Still early, but the direction is clear

It’s early. The structured targeting layer still needs resolver APIs, selector schemas, and firmer calls around versioning. Plenty remains to be nailed down.

But the path is straightforward: one substrate across modalities, durable coordinates rather than brittle offsets, explicit evidence rather than collapsed comments, and judgment you can reuse.
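One way to picture “durable coordinates rather than brittle offsets,” as a small Python sketch. To be clear about what is assumed: the post does not say how RT anchors annotations, so the quote-plus-prefix approach below (similar in spirit to text-quote selectors in web annotation tooling) is my stand-in, and the function names and sample sentences are made up.

```python
def resolve_offset(text, start, end):
    """Brittle: trusts that character positions never move."""
    return text[start:end]


def resolve_quote(text, exact, prefix=""):
    """More durable: find `exact` where it follows `prefix`.

    Returns (start, end) of `exact`, or None if it cannot be found.
    """
    needle = prefix + exact
    i = text.find(needle)
    if i == -1:
        return None
    start = i + len(prefix)
    return (start, start + len(exact))


v1 = "The agency shall report annually."
v2 = "Effective 2026, the agency shall report annually."  # revised text

# An offset captured against v1 points at the right words in v1...
assert resolve_offset(v1, 11, 23) == "shall report"
# ...but lands on unrelated characters once text is inserted before it.
assert resolve_offset(v2, 11, 23) != "shall report"

# The quote-based anchor still finds the passage in the revision.
span = resolve_quote(v2, "shall report", prefix="agency ")
assert span is not None
assert v2[span[0]:span[1]] == "shall report"
```

A quote-based anchor can still fail (the passage may be rewritten or duplicated), which is part of why versioning remains an open question.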