Originally posted: 2023-10-18.
This visualisation aims to clarify the intuition behind the following formula from the Fellegi Sunter model, which was presented in the article on the maths of Fellegi Sunter in the main tutorial.
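In its usual form, the formula reads as follows. The notation is an assumption reconstructed from the prior, m and u terms used throughout this article.

```latex
\[
\text{posterior probability}
  = \frac{\text{prior} \times m}
         {\text{prior} \times m + (1 - \text{prior}) \times u}
\]
```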
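The m and u probabilities are defined in relation to a scenario: m is the probability that the scenario holds amongst matching record pairs, and u is the probability that it holds amongst non-matching record pairs.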
This article also explains why the formula is just a restatement of Bayes' Theorem.
Recall that the prior is the probability that two random records match, which is one of the parameters to be estimated.
We can visualise this parameter by showing how it divides the set of all pairwise record comparisons:
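To make the parameter concrete, here is a minimal sketch that computes the prior as the share of pairwise comparisons that are matches. The records and their `true_entity` labels are hypothetical. In real record linkage the true entity is unknown, which is why the prior has to be estimated.

```python
from itertools import combinations

# Hypothetical records: (record_id, true_entity). The true entity is included
# only to illustrate the definition; it is not available in practice.
records = [
    ("r1", "alice"),
    ("r2", "alice"),
    ("r3", "bob"),
    ("r4", "carol"),
    ("r5", "carol"),
]

# Every unordered pair of records is one pairwise comparison.
comparisons = list(combinations(records, 2))

# A comparison is a match when both records refer to the same entity.
matches = [(a, b) for a, b in comparisons if a[1] == b[1]]

# The prior is the proportion of all comparisons that are matches.
prior = len(matches) / len(comparisons)
print(len(comparisons), len(matches), prior)
```

With five records there are ten comparisons, of which two are matches, so the prior divides the space of comparisons into a 20% "match" region and an 80% "non-match" region.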
Using the definition of the m and u probabilities above, we can further subdivide this space as follows:
Once we observe that the scenario holds, the areas in white no longer apply and can be discarded.
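The subdivide-then-discard argument can be sketched by counting comparisons directly. All numbers below (1,000 comparisons, a prior of 0.1, m = 0.8, u = 0.2) are hypothetical values chosen for illustration, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical values, chosen purely for illustration.
total = 1000
prior = Fraction(1, 10)  # share of comparisons that are matches
m = Fraction(8, 10)      # share of matches for which the scenario holds
u = Fraction(2, 10)      # share of non-matches for which the scenario holds

# Step 1: the prior splits all comparisons into matches and non-matches.
matches = total * prior
non_matches = total - matches

# Step 2: m and u subdivide each region by whether the scenario holds.
match_scenario = matches * m
match_no_scenario = matches - match_scenario              # discarded
non_match_scenario = non_matches * u
non_match_no_scenario = non_matches - non_match_scenario  # discarded

# Step 3: having observed the scenario, only two regions remain.
remaining = match_scenario + non_match_scenario

# The posterior is the share of the remaining area that belongs to matches.
posterior = match_scenario / remaining
print(posterior, float(posterior))
```

Under these assumed values, 80 matching and 180 non-matching comparisons remain, so the posterior is 80 / 260.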
Turning this back into formulas we can write:
Substituting in numbers we have:
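As a minimal sketch of this substitution, the formula can be written as a small function. The function name `posterior_probability` and the input values are assumptions made for illustration, not figures from the model.

```python
def posterior_probability(prior: float, m: float, u: float) -> float:
    """Posterior match probability, given that the scenario holds."""
    # Area of "match and scenario holds".
    numerator = prior * m
    # Total area where the scenario holds (matches plus non-matches).
    denominator = prior * m + (1 - prior) * u
    return numerator / denominator


# Hypothetical inputs: prior = 0.1, m = 0.8, u = 0.2.
print(round(posterior_probability(prior=0.1, m=0.8, u=0.2), 4))

# When m equals u the scenario carries no information,
# so the posterior stays at the prior.
print(round(posterior_probability(prior=0.3, m=0.5, u=0.5), 4))
```

A scenario that is four times more likely amongst matches than amongst non-matches lifts the assumed prior of 0.1 to roughly 0.31.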
In a more general sense, these visualisations explain the intuition behind Bayes' Theorem.
Recall that Bayes' Theorem is:
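For reference, the standard statement for two events is given below. The symbols A and B are generic placeholders introduced here.

```latex
\[
\Pr(A \mid B) = \frac{\Pr(B \mid A)\,\Pr(A)}{\Pr(B)}
\]
```

Here Pr(A) plays the role of the prior, Pr(B | A) the likelihood, and Pr(B) the evidence.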
In the context of record linkage, we can describe these parts as:
Prior: The overall proportion of comparisons which are matches
Evidence: We have observed that a scenario holds,
Likelihood: The probability that the scenario holds amongst matches, given by m
So Bayes' Theorem is:
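Written with explicit probabilities, Bayes' Theorem for a pairwise comparison can be sketched as follows. The Pr(·) notation is an assumption introduced here.

```latex
\[
\Pr(\text{match} \mid \text{scenario})
  = \frac{\Pr(\text{scenario} \mid \text{match})\,\Pr(\text{match})}
         {\Pr(\text{scenario})}
\]
```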
So Bayes' Theorem is just our original formula:
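One way to see the equivalence is to expand Pr(scenario) with the law of total probability. This assumes the notation prior = Pr(match), m = Pr(scenario | match) and u = Pr(scenario | non-match).

```latex
\[
\begin{aligned}
\Pr(\text{match} \mid \text{scenario})
  &= \frac{\Pr(\text{scenario} \mid \text{match})\,\Pr(\text{match})}
          {\Pr(\text{scenario} \mid \text{match})\,\Pr(\text{match})
           + \Pr(\text{scenario} \mid \text{non-match})\,\Pr(\text{non-match})} \\
  &= \frac{\text{prior} \times m}
          {\text{prior} \times m + (1 - \text{prior}) \times u}
\end{aligned}
\]
```

The denominator is the total area in which the scenario holds, which is exactly what remains once the white areas are discarded.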
See also this great video!