ChronosGuard: auditing policies against the law as it stood on any date

Companies in regulated industries write policies that must follow the law — but the law keeps changing, published as PDF gazettes nobody re-reads. ChronosGuard ingests those PDFs, remembers when each rule started and stopped being valid, and uses AI to audit policy text clause by clause. Every finding ships with a word-for-word quote from the source law, verified by the server, so fabricated citations are impossible.

Type: Temporal RAG · multi-tenant SaaS
Stack: Python · FastAPI · Postgres + pgvector · Next.js
Tests: 178 across 5 lanes — CI spends $0 on AI
Unit cost: ~$0.03 per audit

[Figure: ChronosGuard, a temporal compliance-RAG auditor, showing a red VIOLATIONS FOUND rubber stamp for a data-retention policy audited against regulations in force on 25 May 2018, with two numbered ledger findings whose quotes are verified word-for-word in the source law. Built with Python, FastAPI, and pgvector by Ali Jawwad.]

The problem: regulatory drift

Imagine a payment company whose policy says it may hold user funds for 7 business days. In 2024 the regulator allowed exactly that, so the policy was compliant. In June 2026 an amendment cut the limit to 3 business days. The policy is now in violation, and nobody noticed, because the change shipped as a PDF gazette. That slow, silent mismatch is regulatory drift, and catching it manually means a compliance officer re-reading hundreds of pages after every legal change.

Because ChronosGuard remembers when each rule applied, it also answers historical questions — “were we compliant in January 2025?” — which is exactly what audits and disputes demand. The same policy audited at two as-of dates correctly returns two different verdicts.

The time machine

Every legal rule is stored with an effective date and an expiration date; “what law applied on date X?” becomes a date comparison — simple, fast, testable. When a rule is replaced, the old one is never deleted: its end date is set, keeping history intact so past audits stay reproducible.
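A minimal sketch of "close the interval, never delete", assuming a hypothetical `Rule` record with `effective_on` and `expires_on` fields (the real schema is not shown here, and the exact days below are illustrative):

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical field names chosen for this sketch.
    rule_id: str
    text: str
    effective_on: date
    expires_on: Optional[date] = None  # None means "no end date yet"


def supersede(rules: list[Rule], old_id: str, new_rule: Rule) -> list[Rule]:
    """Close the old rule on the day the new one starts; delete nothing."""
    if not any(r.rule_id == old_id for r in rules):
        raise KeyError(old_id)
    closed = [
        replace(r, expires_on=new_rule.effective_on) if r.rule_id == old_id else r
        for r in rules
    ]
    return closed + [new_rule]


history = supersede(
    [Rule("hold-v1", "Funds may be held for 7 business days.", date(2024, 1, 1))],
    "hold-v1",
    Rule("hold-v2", "Funds may be held for 3 business days.", date(2026, 6, 1)),
)
```

Both versions remain in `history`, so an audit as of 2025 can still find the rule that applied then.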

Exactly one function in the whole codebase answers “in force on date X,” and every feature — search, audits, the dashboard — calls it. One implementation means the logic can never quietly diverge, and a table-driven test suite pins the tricky edges: the start day counts, the end day doesn't, retroactive rules work, unreviewed documents stay invisible.
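The edges listed above can be sketched as one predicate plus a table of cases. This is an illustration under assumptions, not the project's actual function: the `Rule` fields and the name `in_force` are hypothetical, and only the interval semantics (start inclusive, end exclusive, unreviewed hidden) come from the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rule:
    # Hypothetical fields; only the interval semantics come from the text.
    effective_on: date
    expires_on: Optional[date]  # None means no end date yet
    reviewed: bool = True


def in_force(rule: Rule, as_of: date) -> bool:
    """True if the rule applied on `as_of`: start inclusive, end exclusive."""
    if not rule.reviewed:
        return False  # unreviewed documents stay invisible
    if as_of < rule.effective_on:
        return False
    return rule.expires_on is None or as_of < rule.expires_on


old = Rule(date(2024, 1, 1), date(2026, 6, 1))
CASES = [
    (old, date(2024, 1, 1), True),     # the start day counts
    (old, date(2026, 6, 1), False),    # the end day does not
    (old, date(2025, 1, 15), True),    # a historical as-of date
    (old, date(2023, 12, 31), False),  # before the rule took effect
    (Rule(date(2024, 1, 1), None, reviewed=False), date(2025, 1, 1), False),
]
results = [in_force(rule, as_of) == expected for rule, as_of, expected in CASES]
```

Because the predicate looks only at the effective and expiration dates, never at when a rule was ingested, a rule loaded today with a past effective date applies retroactively without special handling.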

RAG with a lie detector

Grounded retrieval alone isn't enough for a compliance tool, so three safety layers sit on top. The AI never writes citations — it can only point at an excerpt by ID, and the citation text is filled in from the database, making invented sources structurally impossible. Every finding must include a word-for-word quote, which the server verifies against the real text; failed quotes are dropped and counted, and that counter is the hallucination alarm. And when no relevant law is found, the verdict is “insufficient evidence” — never “compliant,” because a false green checkmark is the worst possible failure.
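A rough sketch of how the three layers could fit together. Every name here (`Excerpt`, `Finding`, `verify`) is hypothetical, and two behaviours are assumptions of the sketch rather than documented facts: whitespace is normalised before matching, and a run whose findings were all dropped is reported as insufficient evidence rather than compliant.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    citation: str  # stored server-side, never written by the model
    text: str


@dataclass(frozen=True)
class Finding:
    excerpt_id: str  # the model may only point at an excerpt by ID
    quote: str       # must appear word-for-word in that excerpt
    issue: str


def _squash(text: str) -> str:
    # Assumption: only whitespace differences (PDF line breaks) are tolerated.
    return re.sub(r"\s+", " ", text).strip()


def verify(
    findings: list[Finding], excerpts: dict[str, Excerpt]
) -> tuple[str, list[dict], int]:
    """Return (verdict, verified findings, count of dropped findings)."""
    if not excerpts:
        return "INSUFFICIENT_EVIDENCE", [], 0  # never "compliant" by default
    kept, dropped = [], 0
    for f in findings:
        source = excerpts.get(f.excerpt_id)
        quote = _squash(f.quote)
        if source is None or not quote or quote not in _squash(source.text):
            dropped += 1  # feeds the hallucination counter
            continue
        # The citation comes from the stored excerpt, not from the model.
        kept.append({"citation": source.citation, "quote": quote, "issue": f.issue})
    if kept:
        return "VIOLATIONS_FOUND", kept, dropped
    if dropped:
        return "INSUFFICIENT_EVIDENCE", kept, dropped  # assumption, see lead-in
    return "COMPLIANT", kept, dropped


excerpts = {
    "ex-1": Excerpt(
        "Payments Regulation, s. 12 (hypothetical)",
        "Funds shall not be held\nfor more than 3 business days.",
    )
}
verdict, kept, dropped = verify(
    [
        Finding("ex-1", "held for more than 3 business days", "policy allows 7"),
        Finding("ex-1", "held for more than 5 business days", "fabricated quote"),
        Finding("ex-9", "anything", "unknown excerpt ID"),
    ],
    excerpts,
)
```

In this run one finding survives and two are dropped: one for a quote that is not in the text, one for an excerpt ID that does not exist.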

Tenant isolation enforced by the database itself

Many companies share one database, so isolation can't depend on every developer remembering a WHERE clause. Postgres Row-Level Security filters rows at the database layer: each request sets its tenant on the connection and security policies do the rest. It fails closed — a missing tenant context returns zero rows, not everyone's rows. A dedicated, blocking test lane proves isolation by connecting as the real low-privilege database user.
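For illustration, the mechanism might look roughly like this. The table name `findings`, the policy name, the setting name `app.tenant_id`, and the `uuid` column type are all assumptions of the sketch. `set_config` and `current_setting` are standard Postgres functions, and the cursor is any DB-API cursor (for example from psycopg).

```python
# Hypothetical names throughout; only the mechanism (RLS plus a per-request
# tenant setting) comes from the description above.
RLS_DDL = """
ALTER TABLE findings ENABLE ROW LEVEL SECURITY;
ALTER TABLE findings FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON findings
    USING (tenant_id = NULLIF(current_setting('app.tenant_id', true), '')::uuid);
"""

# Third argument true = scoped to the current transaction, so a pooled
# connection cannot carry one tenant's context into the next request.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"


def set_tenant(cursor, tenant_id: str) -> None:
    """Bind the current transaction to one tenant before any query runs."""
    if not tenant_id:
        raise ValueError("refusing to run without a tenant")
    cursor.execute(SET_TENANT_SQL, (str(tenant_id),))
```

The fail-closed behaviour comes from SQL's NULL semantics: with no tenant set, `current_setting(..., true)` yields NULL, the comparison is never true, and the policy admits no rows.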

The hardest bug in the project lived here: the background worker couldn't write its own results, because it never goes through login and therefore had no tenant context. The fix made the job queue a global table that carries the tenant ID; the worker adopts that context per job, inside the job's transaction, and a test proves that a worker write without it is rejected.
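One way the worker side could look, sketched under assumptions: `tenant_transaction`, `run_job`, the job dictionary shape, and the `app.tenant_id` setting name are all hypothetical, and the connection is assumed to be a DB-API connection in its default non-autocommit mode, where the first statement opens the transaction.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id):
    """Run a block in one transaction bound to the job's tenant."""
    if not tenant_id:
        raise ValueError("job carries no tenant_id")
    cur = conn.cursor()
    try:
        # Third argument true: the setting ends with this transaction,
        # so the next job on the same connection starts with no tenant.
        cur.execute(
            "SELECT set_config('app.tenant_id', %s, true)", (str(tenant_id),)
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


def run_job(conn, job: dict, handler) -> None:
    """Adopt the tenant carried by the queue row, then let the handler write."""
    with tenant_transaction(conn, job.get("tenant_id")) as cur:
        handler(cur, job)
```

A job row without a tenant ID is refused before any SQL is sent, and a handler failure rolls back both the writes and the tenant setting.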

Boring infrastructure, on purpose

The queue is a Postgres table claimed with FOR UPDATE SKIP LOCKED, with time-limited leases, a reaper for crashed workers, and 3 retries — no Redis, no broker. Audits return 202 with a run ID and the dashboard polls, because a verdict must be quote-verified as a whole before anyone sees it. CI runs on fake AI providers — a hash-based embedder whose similarity scores actually mean something, and a scriptable fake judge that can be forced to lie to test the quote-checker — so 178 tests cost $0 and never flake on an API hiccup.
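As an illustration of the fake embedder idea, here is a small feature-hashing sketch. It is not the project's implementation: the dimension, the tokenizer, and the function names are assumptions. It uses `hashlib` rather than Python's built-in `hash`, which is randomised per process and would make tests non-deterministic.

```python
import hashlib
import math
import re

DIM = 256  # hypothetical size chosen for this sketch


def embed(text: str, dim: int = DIM) -> list[float]:
    """Deterministic bag-of-words embedding via feature hashing."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing softens collisions
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length (or zero) vectors."""
    return sum(x * y for x, y in zip(a, b))


query = embed("funds held for 3 business days")
related = embed("Funds may be held for 3 business days.")
unrelated = embed("Employees must wear badges.")
```

Texts that share words land in the same buckets, so `cosine(query, related)` comes out well above `cosine(query, unrelated)`. That makes retrieval order meaningful in tests without calling a real embedding API.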

One favorite war story: a one-line text-splitter bug merged a short 74-character clause into its neighbor, retrieval matched the wrong rule, and the end-to-end test flipped from VIOLATIONS_FOUND to COMPLIANT. In RAG systems, boring text-processing bugs change answers — end-to-end tests with known verdicts are what catch them.

Stack

Python
FastAPI
PostgreSQL
pgvector
OpenAI
Next.js
