
GraveLedger Team

How GraveLedger handles data accuracy

Accuracy in cemetery records is not about being right every time. It is about never pretending to be more certain than you are.

3/4/2026 · 3 min read
technology · accuracy · data-integrity · trust

The accuracy problem in cemetery data

Cemetery records are inherently messy. Names change across documents. Dates contradict each other. Stones weather. Handwriting in old plot books is ambiguous. Any system that claims to digitize cemetery records must deal with this uncertainty honestly or it will produce a database that looks clean but is quietly full of errors.

GraveLedger chose a different approach: make uncertainty visible.

Confidence scoring

Every piece of extracted data in GraveLedger carries a confidence score from 0 to 100. A name extracted from a clean modern headstone might score 97. A date pulled from a weathered 19th-century limestone might score 45. Both are useful. But they are useful in different ways, and the system treats them differently.

Records scoring above 80 publish directly. Records at or below 80 enter a review queue where a human examiner sees the original photo alongside the extracted text and makes the final call. The threshold is not arbitrary: it reflects the point where OCR output becomes unreliable enough that human verification adds meaningful value.
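In sketch form, the routing is that simple. The Python below is illustrative rather than our production pipeline; the type, field names, and function are assumptions made for the example, and the 80-point threshold is the one described above.

```python
from dataclasses import dataclass

# Illustrative sketch of the confidence-based routing described above.
# Names and structure are assumptions for the example, not GraveLedger's
# actual code; the 80-point threshold is the one from the post.
PUBLISH_THRESHOLD = 80

@dataclass
class Extraction:
    field: str         # e.g. "name" or "death_date"
    value: str         # the OCR reading
    confidence: int    # 0-100 score from the extraction model
    source_photo: str  # the photo the reading came from

def route(extraction: Extraction) -> str:
    """Publish high-confidence extractions; queue the rest for a human."""
    if extraction.confidence > PUBLISH_THRESHOLD:
        return "publish"       # goes straight into the public record
    return "review_queue"      # examiner sees photo and text side by side

# A clean modern headstone versus weathered 19th-century limestone:
print(route(Extraction("name", "ELIZA HARPER", 97, "img/plot_41.jpg")))  # publish
print(route(Extraction("death_date", "18?4", 45, "img/plot_07.jpg")))    # review_queue
```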

Provenance tracking

Every record links back to its source. If a name came from a photograph, the photo is attached. If a date came from a county death index, that source is cited. If a community member contributed a correction, their input is logged.

This matters because cemetery data gets reused. Genealogy researchers, historians, and families will reference these records for decades. When a question arises about where a piece of data came from, the answer should be traceable — not "it was in the database."
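As a data-structure sketch, provenance is a list of sources attached to each value. The field names, photo path, and sample citation below are invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Source:
    kind: str                  # "photograph", "death_index", "community_correction"
    reference: str             # photo path, index citation, or contributor id
    note: Optional[str] = None

@dataclass
class RecordField:
    value: str
    sources: list[Source] = field(default_factory=list)

# A death date backed by two independent, citable sources:
death_date = RecordField(
    value="1884-03-09",
    sources=[
        Source("photograph", "img/plot_07.jpg"),
        Source("death_index", "County death index, vol. 3, p. 112"),
    ],
)

# "Where did this come from?" has a concrete answer on the record itself:
for s in death_date.sources:
    print(f"{s.kind}: {s.reference}")
```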

The correction loop

No initial extraction is final. GraveLedger supports community corrections with a simple principle: corrections are additions, not replacements. When someone submits a correction, the system records the new reading alongside the original, with attribution for both. If the correction is clearly supported by the source material, the displayed value updates. If it is ambiguous, both readings remain visible.

This prevents the common problem where a well-meaning correction introduces a new error that overwrites the original data permanently.
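Here is one way to sketch that append-only model; the names and the simplified display rule are illustrative, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Reading:
    value: str
    contributor: str   # attribution: OCR pipeline, examiner, or volunteer
    supported: bool    # clearly backed by the source material?

@dataclass
class CorrectableField:
    readings: list[Reading] = field(default_factory=list)

    def submit(self, correction: Reading) -> None:
        # Corrections are additions, not replacements: the original
        # reading stays in the list with its attribution intact.
        self.readings.append(correction)

    @property
    def displayed(self) -> list[str]:
        # A clearly supported correction updates what is displayed;
        # when the evidence is ambiguous, every reading stays visible.
        supported = [r.value for r in self.readings if r.supported]
        return supported if supported else [r.value for r in self.readings]

name = CorrectableField([Reading("ELIZA HARPE?", "ocr-pipeline", supported=False)])
name.submit(Reading("ELIZA HARPER", "volunteer:jd", supported=True))
print(name.displayed)   # ['ELIZA HARPER'], while the original stays logged
```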

What we do not do

  • We do not fabricate data to fill empty fields. Unknown values stay unknown.
  • We do not auto-approve low-confidence extractions to inflate record counts.
  • We do not merge records from different sources without explicit matching criteria (a sketch of what explicit criteria might look like follows this list).
  • We do not delete original readings when corrections are submitted.
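
For the third point, here is a sketch of what explicit matching criteria could look like; the fields and rules below are assumptions for the example, not our actual merge policy:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Candidate:
    cemetery_id: str
    surname: str
    given_name: str
    death_year: Optional[int]   # None when the stone or index is unreadable

def explicit_match(a: Candidate, b: Candidate) -> bool:
    """Merge only when every criterion holds; never on a name alone."""
    if a.cemetery_id != b.cemetery_id:
        return False
    if (a.surname.casefold(), a.given_name.casefold()) != \
       (b.surname.casefold(), b.given_name.casefold()):
        return False
    # An unknown death year stays unknown: it blocks the merge rather
    # than being assumed to match.
    return a.death_year is not None and a.death_year == b.death_year
```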

Why this matters

A cemetery record database is only valuable if people trust it. Trust is not built by having the most records. It is built by being honest about what the records actually say, where they came from, and how confident the system is in each piece of data.

Every design decision in GraveLedger starts from that principle.

