Sufficient statistic

Sufficiency captures all information about θ in the data

A sufficient statistic for a model parameter contains all the information that the dataset provides about that parameter, given the assumed model. This means that once you have computed the sufficient statistic, you no longer need the original data to make inferences about the parameter.
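The informal description above can be stated more precisely. The notation here is introduced only for illustration and does not come from the text: X is the data, T(X) the statistic, θ the parameter, f the density or mass function, and h and g arbitrary non-negative functions. The first line is the usual definition of sufficiency; the second is the Fisher-Neyman factorization criterion, which is equivalent under standard regularity conditions and is often easier to check.

```latex
% Definition: the conditional law of the data given T is free of \theta
\[
P_\theta\bigl(X = x \mid T(X) = t\bigr) \ \text{does not depend on } \theta
\]
% Factorization criterion: \theta enters the likelihood only through T(x)
\[
f(x;\theta) = h(x)\, g\bigl(T(x);\theta\bigr)
\]
```

In the factorized form, h(x) may depend on the full data but not on θ, while g depends on the data only through T(x).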

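The concept of sufficiency is closely related to the concepts of an ancillary statistic and a complete statistic. An ancillary statistic is one whose distribution does not depend on the model parameters, so on its own it contains no information about them. A statistic T is complete if the only function g with E[g(T)] = 0 for every parameter value is one that equals zero almost surely. Informally, a complete sufficient statistic carries no ancillary information: by Basu's theorem, it is independent of every ancillary statistic.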
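As a rough illustration of ancillarity, the sketch below assumes a normal location model in which each observation is μ plus standard-normal noise. The helper name `sample_with_mean`, the seed, and the sample size are hypothetical choices made for this example. Reusing the same noise for two values of μ shows that the sample variance is a function of the noise alone, so its distribution cannot depend on μ, whereas the sample mean shifts with μ.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only to make the sketch reproducible
noise = [random.gauss(0.0, 1.0) for _ in range(50)]  # shared standard-normal noise


def sample_with_mean(mu):
    """Hypothetical helper: a sample from N(mu, 1) built from the shared noise."""
    return [mu + z for z in noise]


# The sample variance ignores the shift, so it behaves as an ancillary statistic for mu.
var_at_0 = statistics.variance(sample_with_mean(0.0))
var_at_10 = statistics.variance(sample_with_mean(10.0))

# The sample mean moves with mu, so it does carry information about mu.
mean_at_0 = statistics.mean(sample_with_mean(0.0))
mean_at_10 = statistics.mean(sample_with_mean(10.0))

print("variance:", var_at_0, var_at_10)
print("mean:", mean_at_0, mean_at_10)
```

The two variances agree up to floating-point rounding, while the two means differ by the shift in μ.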

The concept of sufficiency was introduced by Sir Ronald Fisher in 1920. Despite falling out of favor in descriptive statistics due to its strong dependence on an assumption of the distributional form, it remained very important in theoretical work.

Example

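Consider an independent and identically distributed sample of n observations from a normal distribution with unknown mean μ and known variance σ². The sample mean X̄ is a sufficient statistic for μ: the likelihood depends on μ only through X̄, so two samples of the same size with the same mean support exactly the same inferences about μ.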
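A small numerical check can make this concrete. The sketch below assumes σ = 1 and uses two made-up samples of equal size that share the mean 2.0 but differ in spread. The function name `normal_log_likelihood` and the grid of μ values are illustrative choices. If X̄ is sufficient, the two log-likelihoods should differ only by a constant that does not involve μ.

```python
import math


def normal_log_likelihood(data, mu, sigma=1.0):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations with sigma known."""
    n = len(data)
    const = -0.5 * n * math.log(2 * math.pi * sigma**2)
    return const - sum((x - mu) ** 2 for x in data) / (2 * sigma**2)


# Two hypothetical samples: same size, same mean (2.0), different spread.
sample_a = [1.0, 2.0, 3.0]
sample_b = [0.0, 2.0, 4.0]

# Candidate values of mu at which to compare the two log-likelihoods.
mus = [-1.0, 0.0, 0.5, 2.0, 3.7]
gaps = [
    normal_log_likelihood(sample_a, mu) - normal_log_likelihood(sample_b, mu)
    for mu in mus
]

# Each gap is the same constant (3.0 up to rounding), whatever mu is.
print(gaps)
```

Because the gap does not change with μ, likelihood-based conclusions about μ, such as the maximum likelihood estimate or likelihood ratios between two candidate values, are identical for the two samples.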

Understanding sufficiency is crucial for efficient data analysis, as it allows statisticians to summarize data without losing any relevant information about the parameters of interest.
