Probabilistic record linkage
These pages present introductory training material on probabilistic record linkage using the Fellegi-Sunter model. Many of the articles are interactive.
This material presents a simplified version of the model used by Splink, a probabilistic linkage software package of which I am the lead developer.
Many of the graphics reuse Splink's graphical output, and model parameters are represented in the same way as in Splink's settings object.
Training materials on probabilistic linkage
Introductory Interactive Tutorial
- An Interactive Introduction to Record Linkage (Data Deduplication) in the Fellegi-Sunter framework
- Partial match weights
- m and u values in the Fellegi-Sunter model
- The mathematics of the Fellegi-Sunter model
- Computing the Fellegi-Sunter model
- Why Probabilistic Linkage is More Accurate than Fuzzy Matching For Data Deduplication
- The Intuition Behind the Use of Expectation Maximisation to Train Record Linkage Models
- An alternative way to think about predicted probabilities in the Fellegi-Sunter model