A constrained property graph for Vallor: keep stable identity as columns, keep dynamic attributes as schema-typed JSONB, and add a first-class labeled edge primitive for relationships. Engineering owns a small set of primitives; the infinite per-customer process variation lives in data, not migrations.
Three years in, the original assumption — that contract management is rigid and uniform — has proven false. Every company manages contracts differently, and the technology moves faster than our data model can flex. The variation is not noise; it is the domain.
Today, every new way a customer thinks about contracts becomes a schema change and an engineering ticket. At venture scale, that puts engineering on the critical path of every customer's idiosyncratic process map. This is the tech debt to pay down. Dynamic fields were the first response to exactly this pressure: field shapes that differ wildly across industries, and even across teams in one org, expressed as JSON Schema so they can be handed to an LLM.
Everything in the problem statement is one shape: a property graph, meaning typed nodes connected by typed, labeled edges. In a property graph, nodes (entities) and edges (relationships) are both first-class and can both hold key/value properties; unlike RDF triples, properties live on the element, not as separate statements. It is the model behind Neo4j and AWS Neptune. Compared with EAV, it keeps attributes and relationships in separate, typed structures. The same contract, expressed for three different customers, changes only its edges and labels:
We have, piecemeal and without naming it, already built the four ingredients of this graph:
| Ingredient already in the repo | What it is today | What it becomes in the graph |
|---|---|---|
| dynamic_fields + extraction_schema | Per-org JSON-Schema metamodel + JSONB values on contract, redline_project, organization_company, task… | Node attributes, typed by the org's schema. Keep as-is. |
| task_entity | (task_id, entity_type, entity_id) — a polymorphic link, hardcoded to one source type. | The edge primitive, generalized to any source + a label. |
| organization_label | Per-org registry of customer-defined labels with type, category, cardinality (single/multi). | The relationship-type registry — the seed of an ontology. |
| entity_event | Polymorphic audit trail keyed by (entity_type, entity_id). | Proof polymorphic references work in our stack at scale. |
We did not build EAV. We independently evolved the scaffolding of a constrained property graph. The work is not "adopt a new pattern" — it is "promote four ad-hoc pieces into two deliberate primitives." That de-risks the effort: polymorphic refs, the access patterns, and the per-org metamodel are already proven in production.
EAV can model relationships (the value holds a reference to another entity — "EAV with relationships"). So the question is not "can it," it is "should it." EAV's defining move is collapsing attributes and relationships into one untyped (subject, predicate, object) table. That is exactly the move to avoid — the two have different needs:
So the recommendation is not EAV and not "keep extending dynamic_fields" (it structurally cannot link). It is: keep dynamic_fields for attributes, add a first-class labeled edge primitive for relationships, governed by a per-org registry that constrains which edges are legal — the same role extraction_schema plays for attributes.
EAV famously lacks a schema; our registry is the schema. Without it you get the OTLT failure mode, the "One True Lookup Table" anti-pattern: everything dissolves into one untyped table, nothing is discoverable, and every query becomes bespoke self-joins. It is well documented precisely because teams keep getting burned (Celko, "SQL for Smarties"; the MUCK / OTLT critique: undiscoverable, unconstrained, unqueryable). The registry makes this a constrained graph.
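To make "the registry is the schema" concrete, here is a minimal sketch of an edge being checked against one org's registry before it is stored. The column names on `EntityEdge`, the `EdgeTypeRule` shape, and the `validateEdge` helper are all assumptions for illustration; the source names `entity_edge` and the registry but does not define their fields. Per the proposal, production code would most likely express these rules as Zod schemas; plain TypeScript is used here only to keep the sketch dependency-free.

```typescript
// Illustrative shapes only: the source names entity_edge and a per-org
// registry but does not specify their columns.
type NodeKind = "contract" | "company" | "person" | "team" | "file";

interface EntityEdge {
  organizationId: string;
  sourceType: NodeKind;
  sourceId: string;
  label: string;
  targetType: NodeKind;
  targetId: string;
}

interface EdgeTypeRule {
  sourceTypes: NodeKind[];
  targetTypes: NodeKind[];
  // "single": at most one edge with this label may point at a given target.
  cardinality: "single" | "multi";
}

// One org's registry. A label that is not listed here is not a legal edge.
type EdgeRegistry = { [label: string]: EdgeTypeRule | undefined };

// Returns a list of violations; an empty list means the edge is legal.
function validateEdge(
  registry: EdgeRegistry,
  existing: EntityEdge[],
  edge: EntityEdge,
): string[] {
  const rule = registry[edge.label];
  if (rule === undefined) {
    return ['unknown edge label "' + edge.label + '"'];
  }
  const errors: string[] = [];
  if (rule.sourceTypes.indexOf(edge.sourceType) === -1) {
    errors.push(edge.sourceType + " cannot be the source of " + edge.label);
  }
  if (rule.targetTypes.indexOf(edge.targetType) === -1) {
    errors.push(edge.targetType + " cannot be the target of " + edge.label);
  }
  if (rule.cardinality === "single") {
    const taken = existing.some(
      (e) =>
        e.organizationId === edge.organizationId &&
        e.label === edge.label &&
        e.targetType === edge.targetType &&
        e.targetId === edge.targetId,
    );
    if (taken) errors.push(edge.label + " allows only one edge per target");
  }
  return errors;
}

// Hypothetical registry for one org: it defines "owner" and "supplier" only.
const acmeRegistry: EdgeRegistry = {
  owner: { sourceTypes: ["person", "team"], targetTypes: ["contract"], cardinality: "single" },
  supplier: { sourceTypes: ["company"], targetTypes: ["contract"], cardinality: "multi" },
};

const ownerEdge: EntityEdge = {
  organizationId: "org_acme", sourceType: "person", sourceId: "p1",
  label: "owner", targetType: "contract", targetId: "c1",
};

// Legal: the registry allows person -[owner]-> contract.
const firstOwner = validateEdge(acmeRegistry, [], ownerEdge);
// Rejected: "owner" is single-cardinality and c1 already has one.
const secondOwner = validateEdge(acmeRegistry, [ownerEdge], { ...ownerEdge, sourceId: "p2" });
// Rejected: this org's registry has no "manager" label.
const unknownLabel = validateEdge(acmeRegistry, [], { ...ownerEdge, label: "manager" });
```

The point of the sketch is that an unregistered label is rejected at the boundary, which is what separates a constrained graph from an untyped lookup table.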
| The customer's reality | How the graph represents it — zero schema change |
|---|---|
| Owner vs. manager | Same entities, different edge label. (person)-[owner]→(contract) vs (person)-[manager]→(contract). |
| No "owner" concept here | Their registry omits owner. Nothing forced, no null field. Absence costs nothing — you store edges that exist. |
| A team or division owns it | Polymorphic source: (team)-[owner]→(contract). Same edge type, different source node type. |
| 1 / N suppliers / a customer / a person counterparty | Cardinality is just "how many edges of that type exist." (company)-[supplier]→(contract) ×N, or (person)-[counterparty]→(contract). |
| Many documents = one contract | file becomes a node type. Many (file)-[part_of]→(contract) edges. "Contract = PDF" dissolves. |
| Files relate to companies, chats, files… | A file is a node; its relations are edges. "Polymorphic, graph, EAV, or join?" → all the same answer at different zoom: a node + a generalized polymorphic edge = a graph. |
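The rows above can be written out as data. The sketch below expresses one contract for three hypothetical customers using a single edge shape; `GraphEdge`, `edgeToContract`, and `sourcesOf` are illustrative names, not part of the existing codebase.

```typescript
// Hypothetical edge shape; column names are not specified in the source.
interface GraphEdge {
  sourceType: string;
  sourceId: string;
  label: string;
  targetType: string;
  targetId: string;
}

// Shorthand constructor: every edge in this example points at a contract.
function edgeToContract(
  sourceType: string,
  sourceId: string,
  label: string,
  contractId: string,
): GraphEdge {
  return { sourceType, sourceId, label, targetType: "contract", targetId: contractId };
}

// Customer A: a person owns the contract; there are two suppliers.
const customerA: GraphEdge[] = [
  edgeToContract("person", "p1", "owner", "c1"),
  edgeToContract("company", "co1", "supplier", "c1"),
  edgeToContract("company", "co2", "supplier", "c1"),
];

// Customer B: no "owner" concept at all; a person is the counterparty.
const customerB: GraphEdge[] = [
  edgeToContract("person", "p7", "manager", "c1"),
  edgeToContract("person", "p9", "counterparty", "c1"),
];

// Customer C: a team owns the contract, and the contract is three files.
const customerC: GraphEdge[] = [
  edgeToContract("team", "t1", "owner", "c1"),
  edgeToContract("file", "f1", "part_of", "c1"),
  edgeToContract("file", "f2", "part_of", "c1"),
  edgeToContract("file", "f3", "part_of", "c1"),
];

// Cardinality is just "how many edges of that label exist".
function sourcesOf(edges: GraphEdge[], contractId: string, label: string): string[] {
  return edges
    .filter((e) => e.targetType === "contract" && e.targetId === contractId && e.label === label)
    .map((e) => e.sourceType + ":" + e.sourceId);
}

const suppliersA = sourcesOf(customerA, "c1", "supplier"); // two companies
const ownersB = sourcesOf(customerB, "c1", "owner");       // empty: nothing stored, no null column
const filesC = sourcesOf(customerC, "c1", "part_of");      // three files
```

Nothing about the edge shape changes between the three customers; only the rows differ.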
This answers the venture-scale pain directly. Collapsing the world to a uniform vocabulary shrinks the LLM's surface to ~3 tools:
The LLM reads the org's ontology (node types + edge-type registry + attribute schemas, all data, all expressible as JSON Schema) and maps a customer's documents and intent onto nodes and edges. No bespoke per-customer code. Why JSON Schema is the unlock: models consume and emit JSON Schema natively (it is the structured-output / tool-argument format), so the org's entire ontology becomes a prompt-able spec. This is the same bet dynamic_fields already made for attributes (extraction_schema = JSON Schema Draft 2020-12 + x-metadata), extended to relationships.
Engineering owns ~4 primitives. Customers' infinite process variation lives in the ontology + the graph instances — never in a migration. A new relationship kind ("outside counsel") is a registry edit, made by an admin or proposed by an LLM, not an engineering ticket.
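One way to picture "a registry edit, not an engineering ticket" is to derive the LLM-facing JSON Schema directly from the registry. The sketch below is an assumption-heavy illustration: the `relate` operation it describes, the `OrgEdgeRegistry` shape, and the `relateArgsSchema` helper are hypothetical, since the source does not name the operations or their arguments. It only uses standard JSON Schema 2020-12 keywords (`oneOf`, `const`, `enum`, `required`, `additionalProperties`).

```typescript
// Hypothetical registry shape: which node types may sit at each end of a label.
interface EdgeTypeSpec {
  sourceTypes: string[];
  targetTypes: string[];
}
type OrgEdgeRegistry = { [label: string]: EdgeTypeSpec };

// Builds a JSON Schema for the arguments of a hypothetical "relate" tool.
// Each registered label becomes one oneOf branch, so the model can only
// emit edges the org's registry allows.
function relateArgsSchema(reg: OrgEdgeRegistry) {
  const labels = Object.keys(reg).sort();
  return {
    $schema: "https://json-schema.org/draft/2020-12/schema",
    oneOf: labels.map((label) => ({
      type: "object",
      properties: {
        label: { const: label },
        sourceType: { enum: reg[label].sourceTypes },
        sourceId: { type: "string" },
        targetType: { enum: reg[label].targetTypes },
        targetId: { type: "string" },
      },
      required: ["label", "sourceType", "sourceId", "targetType", "targetId"],
      additionalProperties: false,
    })),
  };
}

const registryBefore: OrgEdgeRegistry = {
  owner: { sourceTypes: ["person", "team"], targetTypes: ["contract"] },
};

// The "outside counsel" example: an admin (or an LLM proposal) adds one entry.
const registryAfter: OrgEdgeRegistry = {
  ...registryBefore,
  outside_counsel: { sourceTypes: ["company", "person"], targetTypes: ["contract"] },
};

const schemaBefore = relateArgsSchema(registryBefore);
const schemaAfter = relateArgsSchema(registryAfter);
```

Under these assumptions, the tool schema the model sees gains a branch as soon as the registry row exists, with no code change or migration involved.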
| Dimension | Constrained property graph (recommended) | Classic EAV / triples |
|---|---|---|
| Attributes vs. relationships | Separate, each typed for its access pattern | Collapsed into one untyped table |
| Typing | Schema-typed (JSON Schema per org + Zod) | Stringly-typed (one value column) |
| "Chats/files for contract X" | Indexed edge lookup, both directions | Self-join soup |
| Attribute read | One row + GIN (co-located JSONB) | N rows reassembled per entity |
| Referential integrity | App-layer + registry (poly target — a wash) | App-layer (poly target — a wash) |
| Cardinality / required rules | Expressible in the registry | Not expressible |
| Fit with existing tooling | Reuses dynamic_fields, task_entity, entity_event | Third pattern; entities become islands |
| LLM surface | ~3 uniform ops over a JSON-Schema ontology | Generic triples; ambiguous to map reliably |
The graph takes the generality you want from EAV/graph while preserving the typing, tooling, and projection disciplines we've invested in. On the one dimension EAV could theoretically win — a real FK on the link — it doesn't: a polymorphic target can't carry a true FK in any encoding short of per-type tables, so EAV doesn't even collect its usual prize.
| Cost | What it means | Mitigation |
|---|---|---|
| Query complexity & perf | SELECT owner FROM contract becomes an edge join / recursive CTE. Hot read paths get more expensive. | Denormalized projections back into columns / search index — we already do this. |
| No real FKs | Polymorphic edges can't enforce target existence at the DB. | Registry + Zod validation; background GC for dangling edges. |
| Compile-time → runtime typing | Trade contract.owner: Person for runtime-validated edges. | Parse every edge against the registry at the boundary. |
| Paradigm migration | 3 years of code assumes contract.counterparty, contract = file, etc. | Strangler-fig (below). Never big-bang. |
| Over-generalization | A fully generic graph can become undiscoverable "soup." | The per-org registry constrains it; keep stable core as columns. |
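The "background GC for dangling edges" mitigation can be sketched as a pure check over edges and the set of nodes known to exist. This is only an illustration of the idea: `EdgeRef`, `findDanglingEdges`, and the `"type:id"` key format are assumptions, and a real job would query the database in batches instead of holding everything in memory.

```typescript
// Hypothetical minimal view of an edge row for GC purposes.
interface EdgeRef {
  id: string;
  sourceType: string;
  sourceId: string;
  targetType: string;
  targetId: string;
}

// Returns edges whose source or target is not among the live nodes.
// liveNodeKeys uses an assumed "type:id" format, e.g. "contract:c1".
function findDanglingEdges(edges: EdgeRef[], liveNodeKeys: string[]): EdgeRef[] {
  const live: { [key: string]: boolean | undefined } = {};
  for (const key of liveNodeKeys) {
    live[key] = true;
  }
  return edges.filter(
    (e) => !live[e.sourceType + ":" + e.sourceId] || !live[e.targetType + ":" + e.targetId],
  );
}

const gcEdges: EdgeRef[] = [
  { id: "e1", sourceType: "person", sourceId: "p1", targetType: "contract", targetId: "c1" },
  { id: "e2", sourceType: "person", sourceId: "p2", targetType: "contract", targetId: "c1" },
];

// Person p2 has been deleted, so only p1 and c1 are live.
const dangling = findDanglingEdges(gcEdges, ["person:p1", "contract:c1"]);
const danglingIds = dangling.map((e) => e.id); // candidates for cleanup
```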
The projection layer is where this architecture bites. Our existing dynamic_fields→column sync triggers are currently broken for 10/12 fields (snake- vs camel-case key mismatch; only name/title sync) and are being torn out in ENG-5965. Budget explicitly for the read-projection strategy. Classic EAV with self-joins would make this worse, not better.
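One possible read-projection strategy, sketched under assumptions, is to treat the denormalized view as a pure function of the edges and rebuild it wholesale, instead of syncing individual keys. The `EdgeRow` and `ContractProjection` shapes and the `projectContract` helper below are hypothetical; which fields actually need projecting is a decision the source leaves open.

```typescript
// Hypothetical edge row and denormalized read model.
interface EdgeRow {
  sourceType: string;
  sourceId: string;
  label: string;
  targetType: string;
  targetId: string;
}

interface ContractProjection {
  contractId: string;
  ownerIds: string[];
  counterpartyIds: string[];
}

// Derives the projection from edges alone. Because the output depends only on
// the input edges, it can be recomputed at any time and compared or replaced.
function projectContract(contractId: string, edges: EdgeRow[]): ContractProjection {
  const incoming = edges.filter(
    (e) => e.targetType === "contract" && e.targetId === contractId,
  );
  const idsFor = (label: string): string[] =>
    incoming
      .filter((e) => e.label === label)
      .map((e) => e.sourceId)
      .sort();
  return {
    contractId,
    ownerIds: idsFor("owner"),
    counterpartyIds: idsFor("counterparty"),
  };
}

const projectionEdges: EdgeRow[] = [
  { sourceType: "company", sourceId: "co2", label: "counterparty", targetType: "contract", targetId: "c1" },
  { sourceType: "person", sourceId: "p1", label: "owner", targetType: "contract", targetId: "c1" },
  { sourceType: "company", sourceId: "co1", label: "counterparty", targetType: "contract", targetId: "c1" },
  { sourceType: "person", sourceId: "p5", label: "owner", targetType: "contract", targetId: "c2" },
];

const c1Projection = projectContract("c1", projectionEdges);
```

A design note: deriving the whole projection from one typed function avoids per-key mapping between two naming conventions, which is the class of mismatch described above, though it trades that for the cost of recomputation.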
Do not dissolve everything into nodes/edges. Keep genuinely-stable identity as real columns (an org is an org; a contract existing is a fact; organization_id, timestamps, tenant ownership). Only the variable stuff goes graph/dynamic: counterparties, roles, custom files, custom attributes. Drawing that boundary is the whole art — the OTLT anti-pattern is what happens when you erase it.
| Phase | Action | Outcome |
|---|---|---|
| 1 | Ship entity_edge + the per-org relationship-type registry. Generalize task_entity as the first consumer. | New edges exist alongside legacy columns. Nothing breaks. |
| 2 | Promote file to a first-class node; model contract↔file as part_of edges. Dual-write. | "Contract = PDF" retired behind the scenes. |
| 3 | Move counterparty / owner / manager onto edges. Project edges → legacy columns for back-compat reads. | Readers migrate incrementally; UI unchanged. |
| 4 | Expose the 3-op LLM vocabulary over the org ontology. Open registry editing to admins. | Customer process variation leaves the engineering critical path. |
| 5 | Retire legacy columns as readers cut over; keep only hybrid-boundary core columns. | Steady state: small primitives, data-driven ontology. |
The first draft PR is a thin vertical slice of Phase 1 + part of Phase 3 — enough to see the model working end to end for the two relationships you named:
In: entity_edge migration · Zod entity + relationship-type schemas in @vallor/types · access-controlled Kysely query helpers (create / list-by-source / list-by-target) · an oRPC procedure to relate a contract to a company and an owner · a minimal seed of the registry · unit tests.

Out (for the first PR): UI, the LLM op vocabulary, file-as-node, and legacy-column projection; those are later phases.
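The three query helpers in scope can be pictured with an in-memory stand-in. This is not the Kysely implementation: `InMemoryEdgeStore`, `StoredEdge`, `NewEdge`, and `Caller` are hypothetical names, and "access-controlled" is reduced here to one assumed rule, that every read and write is scoped to the caller's organization.

```typescript
// Hypothetical row shape; the real entity_edge columns are not given in the source.
interface StoredEdge {
  id: string;
  organizationId: string;
  sourceType: string;
  sourceId: string;
  label: string;
  targetType: string;
  targetId: string;
}

// Callers never supply id or organizationId; both are set by the store.
type NewEdge = Omit<StoredEdge, "id" | "organizationId">;

interface Caller {
  organizationId: string;
}

class InMemoryEdgeStore {
  private rows: StoredEdge[] = [];
  private nextId = 1;

  create(caller: Caller, edge: NewEdge): StoredEdge {
    const row: StoredEdge = {
      ...edge,
      id: "edge_" + this.nextId++,
      organizationId: caller.organizationId,
    };
    this.rows.push(row);
    return row;
  }

  listBySource(caller: Caller, sourceType: string, sourceId: string): StoredEdge[] {
    return this.rows.filter(
      (r) =>
        r.organizationId === caller.organizationId &&
        r.sourceType === sourceType &&
        r.sourceId === sourceId,
    );
  }

  listByTarget(caller: Caller, targetType: string, targetId: string): StoredEdge[] {
    return this.rows.filter(
      (r) =>
        r.organizationId === caller.organizationId &&
        r.targetType === targetType &&
        r.targetId === targetId,
    );
  }
}

// Relate a contract to a company and an owner, as the first PR describes.
const store = new InMemoryEdgeStore();
const acme: Caller = { organizationId: "org_acme" };
const otherOrg: Caller = { organizationId: "org_other" };

store.create(acme, { sourceType: "company", sourceId: "co1", label: "counterparty", targetType: "contract", targetId: "c1" });
store.create(acme, { sourceType: "person", sourceId: "p1", label: "owner", targetType: "contract", targetId: "c1" });

const contractEdges = store.listByTarget(acme, "contract", "c1");   // both edges
const personEdges = store.listBySource(acme, "person", "p1");       // the owner edge
const leakedEdges = store.listByTarget(otherOrg, "contract", "c1"); // empty: other tenant
```

Both lookup directions filter on fixed columns, which is what would let a real table serve them from two composite indexes.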