How AI is used on this site — methodology and provenance
Core Legal Rules / Core Legal System. Living document. Last updated 2026-06-18.
This is the detailed companion to our short transparency statement. It is a living document: it is kept in version control, dated, and revised as the project changes. Its purpose is to let any reader — a student, a skeptic, a lawyer, a reviewer — see how the legal content on this site is made, checked, and corrected, without asking anyone to take a claim on faith.
We hold this document to the same standard we hold the rules: it states what is verifiable, marks what is recollection, and does not claim more than it can show.
1. Where this came from
The project began as a personal experiment, not a database. Someone with a long career in law set out to see whether they could pass a bar exam decades after first taking one, and asked a general-purpose AI assistant for practice questions. They missed roughly half, not from ignorance of law, but because recalling the conventionalized black-letter rules the exam demands is a different skill from practicing law. The insight that followed has shaped everything since: the scarce thing isn't questions; it's an organized system for understanding, recognizing, and remembering the rules being tested.
That early work — identifying and articulating rules with an AI assistant — happened in 2025 and lived in chat sessions, not in a code repository, so its exact dates rest on recollection rather than record. From November 24, 2025 onward the work is captured in version control, where every change is timestamped and inspectable. We draw that line honestly: an undocumented genesis phase, then a documented phase we can stand behind in detail.
2. Which AI, for what
We disclose the models we use by role. We do not disclose proprietary prompts, scoring internals, or other trade secrets — but the principles below are complete.
- Genesis (2025). The first rule statements were generated with a general-purpose OpenAI assistant of the GPT-4 era. We cannot identify the exact model version with certainty and do not claim to; the corpus has since been substantially revised, so the original output is a seed, not the present content.
- Architecture and verification design (June 2026). The system that organizes and checks the rules was designed with the help of a frontier Anthropic model available in limited release at the time. (For our internal record we have identified the specific model and the circumstances of its brief public availability; for this public document the durable description is "a frontier model.")
- The live AI tutor. The optional in-product tutor (under development and not yet publicly available) runs on current Anthropic Claude models. It is instructed to answer only about the rule you are viewing, to ground its answers in the displayed rule and cited authority, and not to invent rules or citations. It is support, not the core of the product.
- Ongoing maintenance. Periodic organization, revision, and verification runs are performed with current Claude models under a human-defined process.
3. The integrity and transparency of the process
The integrity and transparency of the process — how the system is designed, audited, and described to you — are the result of a collaboration between humans and AI. Humans do not personally certify that each rule is a correct statement of law. We say this plainly because the alternative claim, that a human expert has read and vouched for every rule, would be false, and false reassurance is worse than honest uncertainty.
This is a deliberate design choice, not a gap. A single human reviewing thousands of rules produces an unmeasured error rate: mistakes pass with the same confident signature as correct statements, and no one knows how many are missed. It is doubtful that even the most learned legal scholar has the breadth of knowledge to review every rule covered by this site and vouch for its accuracy from memory; they, too, would have to consult authoritative sources. Our aim is to put those sources in the user's hands wherever they can be drawn from publicly available online resources. A disclosed, sampled, measured process is more honest and, we believe, more accurate. The human role is to audit the quality of the process, not to act as a bottleneck.
4. How rules are verified
Verification is system-driven, with humans reserved for exceptions, audits, calibration, and policy. The pipeline is designed in stages:
- Authority corpus. Controlling texts — federal rules, statutes, constitutional provisions — are fetched once from official sources and stored with a content hash. The verifying models are never asked "what does this rule say" from memory; the official text is placed in front of them. This converts a task where AI is weak (open-domain recall) into one where it is strong (reading comprehension against a provided text).
- Claim decomposition. Each rule is split into atomic, typed claims (black-letter, application, common-law, exam-convention, pedagogy), because a single rule bundles many separately checkable assertions and scoring it as one blob hides the one that's wrong.
- Grounding / entailment. Each checkable claim is judged supported, contradicted, or not addressed by multiple independent verifiers — preferably across different model families — each of which must quote the exact span of source text it relied on. A claim its own cited authority doesn't support is flagged even if it happens to be true.
- Adversarial challenge. A separate model, in a separate role, tries to break each surviving claim and must cite the contradicting line to do so. Drafter, verifier, challenger, and judge are kept distinct so no model grades its own work.
- Routing and confidence. Claims that pass cleanly are staged; splits, challenges, and miscitations route to a human exception queue; contradictions go back to drafting with the citation that defeated them, so the system improves each cycle. Common-law and worked-example claims carry permanently lower confidence ceilings and explicit framing.
- Human layer. Humans adjudicate exceptions, audit a random sample of each batch, set policy on contestable doctrine, and sign off on the batch based on the measured sample — not on a read-through of every rule.
- Field verification. Every rule carries a way for users to challenge it. Field-reported errors are treated as ground truth that calibrates the whole system, and each adjudicated dispute joins a regression set that guards against future drift.
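The routing rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's pipeline: the names `store_authority`, `Verdict`, and `route`, the three verdict labels, the two-verifier minimum, and the requirement that agreeing verifiers come from at least two model families (the text says only "preferably") are all hypothetical. It shows three ideas from this section: the authority text is hashed when stored and re-checked before use; a verdict whose quoted span does not appear verbatim in that text is treated as a miscitation; and only unanimous, cross-family support with no successful challenge is staged.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityText:
    """An official source text, stored once with its SHA-256 content hash."""
    source_id: str
    text: str
    sha256: str


def store_authority(source_id: str, text: str) -> AuthorityText:
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return AuthorityText(source_id, text, digest)


def is_intact(auth: AuthorityText) -> bool:
    # Re-hash before use so an altered copy of the source is detected.
    return hashlib.sha256(auth.text.encode("utf-8")).hexdigest() == auth.sha256


@dataclass(frozen=True)
class Verdict:
    verifier: str      # hypothetical model identifier
    family: str        # model family the verifier belongs to
    label: str         # "supported", "contradicted", or "not_addressed"
    quoted_span: str   # exact source text the verifier relied on


def route(auth: AuthorityText, verdicts: list[Verdict],
          challenge_succeeded: bool = False, min_verifiers: int = 2) -> str:
    """Return 'staged', 'redraft', or 'human_queue' for one atomic claim."""
    if not is_intact(auth):
        return "human_queue"  # stored text no longer matches its hash
    # An empty quote, or one not found verbatim in the source, is a miscitation.
    if any(not v.quoted_span or v.quoted_span not in auth.text for v in verdicts):
        return "human_queue"
    labels = {v.label for v in verdicts}
    if labels == {"contradicted"}:
        return "redraft"  # back to drafting with the quotes that defeated it
    families = {v.family for v in verdicts}
    if (labels == {"supported"} and len(verdicts) >= min_verifiers
            and len(families) >= 2 and not challenge_succeeded):
        return "staged"
    # Splits, "not addressed" verdicts, and successful challenges need a human.
    return "human_queue"
```

Keeping the thresholds as parameters reflects the point that humans set policy; the default values here are placeholders, not published settings.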
Current status (important). This architecture is partially implemented. As of this writing the site is a public beta, the verification pipeline is being built and calibrated on an initial Civil Procedure pilot, and most rules in the corpus have not yet completed full verification. We will surface each rule's verification state as the pipeline rolls out, and we will publish coverage figures (what share of rules are source-linked and multi-model verified) so progress is visible and honest about what remains.
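Coverage figures of the kind described in this paragraph are simple ratios over the corpus. The sketch below is hypothetical: the `RuleStatus` fields and the choice to count a rule as verified only if it is also source-linked are assumptions for illustration, not the project's published definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStatus:
    rule_id: str
    source_linked: bool          # rule cites a text held in the authority corpus
    multi_model_verified: bool   # its checkable claims passed multiple verifiers


def coverage(rules: list[RuleStatus]) -> tuple[float, float]:
    """Return (share source-linked, share source-linked and multi-model verified)."""
    if not rules:
        return 0.0, 0.0
    linked = sum(r.source_linked for r in rules)
    # Verification is against the source, so an unlinked rule never counts as verified.
    verified = sum(r.source_linked and r.multi_model_verified for r in rules)
    return linked / len(rules), verified / len(rules)
```

Reporting both numbers side by side keeps "linked to a source" from being read as "verified."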
5. How reliable is any legal source?
We did not want to measure ourselves against an imaginary error-free standard, so we looked at how reliable established sources actually are.
In the sciences, the picture is sobering. Stuart Ritchie's Science Fictions (2020) documents how fraud, bias, negligence, and hype reach even the most prestigious journals. Quantitatively, systematic reviews of the medical literature find that a striking share of citations point to sources that do not actually say what the citing article claims: a 2017 recalculation put the quotation-error rate near 14.5%, with roughly two-thirds of those errors being major, and a 2025 meta-analysis of 46 studies and about 32,000 citations found roughly 16.9% incorrect, around half of those major.
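Multiplying out the rounded figures above gives a rough sense of how often a medical citation is seriously wrong. This is back-of-envelope arithmetic on the numbers as quoted here ("near," "roughly," "about," "around half"), not a reanalysis of either study, and it inherits all of their rounding.

```python
# Rounded figures as quoted in the paragraph above.
QUOTE_ERROR_2017 = 0.145   # quotation-error rate, 2017 recalculation
MAJOR_SHARE_2017 = 2 / 3   # "roughly two-thirds" of those errors were major
INCORRECT_2025 = 0.169     # share of citations found incorrect, 2025 meta-analysis
MAJOR_SHARE_2025 = 0.5     # "around half" of those were major
CITATIONS_2025 = 32_000    # "about 32,000" citations examined

major_rate_2017 = QUOTE_ERROR_2017 * MAJOR_SHARE_2017   # about 0.097
major_rate_2025 = INCORRECT_2025 * MAJOR_SHARE_2025     # about 0.085
incorrect_count_2025 = INCORRECT_2025 * CITATIONS_2025  # about 5,400

print(f"2017: ~{major_rate_2017:.1%} of quotations with a major error")
print(f"2025: ~{major_rate_2025:.1%} of citations with a major error, "
      f"~{incorrect_count_2025:,.0f} incorrect of ~{CITATIONS_2025:,}")
```

On either figure, something on the order of one citation in ten to twelve carried a major error.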
Law is harder to score, and that itself is telling. American law reviews invest heavily in citation-checking (student editors verify every cite), and a 2023 Law Library Journal study calls them "a rare exception" to the high citation-error rates measured in other disciplines. But that is largely a statement about process and reputation, not a measured outcome: legal scholarship has not subjected its own work to the kind of systematic error measurement that medicine has, so there is no clean, field-wide figure to set beside the roughly 15% quotation-error rate found in medical articles. The absence of a number is not evidence of accuracy; more likely, it means the field has examined itself less rigorously than science has examined its own flaws. Correct citation form is also narrower than correct content: scholars have long cautioned that readers rely too readily on an author's characterization of a source rather than on the source itself, and that empirical and statistical claims in student-edited journals are not validated the way scientific peer review attempts to validate them. The best-documented weakness is correction: a 2020 study by Janet Sinder found that legal scholarship's post-publication correction practices are ad hoc and opaque, so readers often cannot tell whether an article was corrected, which version is authoritative, or what changed.
That last gap is the one we take most seriously, because it is precisely the one a version-controlled, openly logged system can close.
We hold AI to the same scrutiny rather than exempting it. A 2026 benchmark, LegalCiteBench, tested 21 language models on producing legal citations from memory; even the strongest scored below 7 out of 100 on citation recovery, and most returned confident but wrong authorities more than 94% of the time on retrieval-heavy tasks. That is not an argument against using AI — it is the exact reason we never let a model generate authority from memory. Our pipeline puts the controlling text in front of the model and asks it to read, not recall (see section 4).
The honest conclusion is not "AI errs and humans don't," nor the reverse. Every legal-knowledge system makes errors. The responsible question is whether a system has transparent sourcing, real verification, and visible, durable correction — and that is what we are building.
6. Honest limitations
- The rules are written to bar-exam convention and are deliberate simplifications; they may differ from the law of any given jurisdiction.
- AI verification's deepest risk is correlated error — models share training data, so a widely repeated misconception can pass agreement checks. Grounding in primary text, cross-family checks, deliberately planted test errors, and user reports are how we attack it, but it is a real and permanent risk we manage rather than eliminate.
- Common-law "majority rule" is genuinely contestable at the margins; we frame it as such rather than as settled law.
- Our honest comparator is not perfection. Commercial bar materials ship with errata; a lone human reviewer has an unmeasured error rate. The distinguishing property we aim for is an error rate that is measured, bounded, versioned, and falling.
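"Measured" and "bounded" can be made concrete with two standard calculations, sketched here under assumptions: the function names are hypothetical, the Wilson score interval is one common choice of bound (the text does not name a method), and z = 1.96 corresponds to roughly 95% confidence. The first turns a random audit sample into an upper bound on a batch's error rate; the second measures how many deliberately planted errors the pipeline caught, which is one way to estimate what correlated error lets through.

```python
import math


def wilson_upper(errors: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for an error proportion."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p = errors / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + margin) / denom


def planted_recall(planted: set[str], flagged: set[str]) -> float:
    """Share of deliberately planted errors that the pipeline flagged."""
    if not planted:
        raise ValueError("no planted errors to measure against")
    return len(planted & flagged) / len(planted)


# An audit of 100 randomly sampled rules that finds no errors still
# only bounds the batch error rate at about 3.7%, not zero.
print(f"{wilson_upper(0, 100):.1%}")
```

A clean sample narrows the bound as it grows but never removes it, which is why the aim is a bounded rate rather than a claim of zero.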
7. Your data and privacy
During this public beta, you are not asked to create an account, register, or log in to read or browse the rules.
The only personal information we collect is what you choose to give us. If you submit a report or question, you may optionally include your email address, which we use solely to reply to you. As nearly all websites do, our servers also record basic technical information such as IP addresses and standard usage logs, which we use to operate, secure, and improve the site.
We use this information only for the internal purposes of Core Legal System LLC and its affiliates — running, protecting, and improving the service. We do not sell your information, and we do not share it with third parties for their own purposes, including advertising or data brokering.
To operate the site we rely on a small number of service providers — for example, website hosting and database services, and, for the optional AI tutor once it launches, an AI provider. These providers process data on our behalf to make the service work; they are not permitted to use it for their own purposes.
When the optional tutor becomes available, the question you ask and the relevant rule context will be sent to Anthropic's API to generate a response. Under the terms governing commercial API use, those inputs are not used to train the underlying models. We will keep this section current with our providers' terms and note any change here.
8. What we will not disclose, and why
Our intent is to disclose models, roles, methods, sources, limitations, and verification status. We do not disclose specific prompts, scoring weights, or other implementation details that have competitive value. Naming the boundary is itself part of being honest: total transparency about method does not require giving away the build.
9. Changelog
- 2026-06-18 — Initial draft created for review.
- 2026-06-18 — Added section 5 ("How reliable is any legal source?") with verified sources on scientific, legal-scholarship, and AI citation error.
