Probabilistic record linkage

These pages present introductory training material on probabilistic record linkage using the Fellegi-Sunter model. Many of the articles are interactive.

This material presents a simplified version of the model used by Splink, an open source probabilistic record linkage library for which I am the lead developer.

Many of the graphics re-use Splink's graphical output, and the model parameters are represented in the same format as Splink's settings object.

Training materials on probabilistic linkage

  1. An Interactive Introduction to Probabilistic Record Linkage
  2. The mathematics of the Fellegi Sunter model
  3. Visualising the Fellegi Sunter model
  4. Understanding match weights
  5. Dependencies between match weights
  6. m/u value interactive sandbox

Further reading

Articles about Splink

  1. Fuzzy Matching and Deduplicating Hundreds of Millions of Records using Apache Spark
  2. Splink: MoJ’s open source library for probabilistic record linkage at scale

Links to the software

  1. Splink homepage
  2. Splink training materials repo
  3. Try Splink live in your browser