Frequentist vs. Bayesian Methods¶
By: Chengyi (Jeff) Chen
Introduction¶
Maximum Likelihood Estimation¶
Derivation 1: KL Divergence¶
How to find the best $\theta$?¶
To learn the parameters $\theta$ of our model $p(x \mid \theta)$, we can minimize the KL divergence between the empirical distribution of the data $\hat{p}_{\text{data}}(x)$ and the model:

$$\theta^{*} = \underset{\theta}{\arg\min}\; D_{\text{KL}}\left(\hat{p}_{\text{data}}(x) \,\Vert\, p(x \mid \theta)\right) = \underset{\theta}{\arg\max}\; \mathbb{E}_{x \sim \hat{p}_{\text{data}}}\left[\log p(x \mid \theta)\right] \approx \underset{\theta}{\arg\max}\; \frac{1}{N} \sum_{n=1}^{N} \log p(x_n \mid \theta),$$

where the entropy of $\hat{p}_{\text{data}}$ drops out of the objective because it is constant with respect to $\theta$.
We have thus arrived at Maximum Likelihood Estimation of parameters (you can read more about this derivation method here and here), a pointwise estimate of the parameters that maximizes the incomplete data likelihood (or complete data likelihood when we have no latent variables in the model).
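To make this concrete, here is a minimal sketch of maximizing the average log-likelihood numerically. The exponential model, the synthetic data, and the use of scipy are all illustrative assumptions, not part of the original derivation:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# Hypothetical setup: observed draws from an Exponential distribution with rate 2.0.
rng = np.random.default_rng(0)
data = rng.exponential(scale=1 / 2.0, size=1_000)

def neg_log_likelihood(rate):
    # Average negative log-likelihood of the data under Exponential(rate);
    # minimizing this is equivalent to maximizing the likelihood.
    return -np.mean(stats.expon.logpdf(data, scale=1 / rate))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(f"MLE rate: {result.x:.3f}")                        # close to the true rate 2.0
print(f"Closed form (1 / mean): {1 / data.mean():.3f}")   # analytic MLE for comparison
```

For this simple model the optimizer and the closed-form estimate $\hat{\lambda} = 1/\bar{x}$ agree; numerical optimization is what we fall back on when no closed form exists.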
Derivation 2: Posterior with Uniform Prior on Parameters¶
Alternatively, if we place a uniform prior on the parameters, $p(\theta) \propto 1$, then by Bayes' rule the posterior is proportional to the likelihood, and maximizing the posterior recovers exactly the maximum likelihood estimate:

$$\underset{\theta}{\arg\max}\; p(\theta \mid x) = \underset{\theta}{\arg\max}\; \frac{p(x \mid \theta)\, p(\theta)}{p(x)} = \underset{\theta}{\arg\max}\; p(x \mid \theta).$$
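A quick numeric check of this equivalence on a grid over $\theta$. The coin-flip data and grid are hypothetical, chosen only to illustrate the point:

```python
import numpy as np
from scipy import stats

# Hypothetical coin-flip data: 7 heads out of 10 tosses.
heads, n = 7, 10
theta_grid = np.linspace(0.001, 0.999, 999)

log_likelihood = stats.binom.logpmf(heads, n, theta_grid)
log_prior = np.zeros_like(theta_grid)        # uniform prior: constant in theta
log_posterior = log_likelihood + log_prior   # unnormalized log-posterior

print(theta_grid[np.argmax(log_likelihood)])  # MLE: 0.7
print(theta_grid[np.argmax(log_posterior)])   # MAP under a flat prior: also 0.7
```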
Why is MLE a “frequentist” inference technique?¶
The primary reason this technique is called a "frequentist" method is the assumption that the parameters $\theta$ are fixed but unknown constants rather than random variables: there is no prior or posterior distribution over $\theta$, and probability statements refer only to the long-run frequency of outcomes over hypothetical repeated sampling of the data.
Can we simply find the $\theta$ that maximizes $p(x \mid \theta)$?¶
Unfortunately, because our model is specified with the latent variables $z$, the incomplete data likelihood requires marginalizing them out,

$$p(x \mid \theta) = \int p(x, z \mid \theta)\, dz,$$

and hence, Maximum Likelihood Estimation becomes:

$$\theta^{*} = \underset{\theta}{\arg\max}\; \log \int p(x, z \mid \theta)\, dz.$$

However, this marginalization is often intractable (e.g. if $z$ is continuous and the integral has no closed form, or discrete with a combinatorially large number of configurations), which is what motivates approximate inference.
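As an illustration, here is a minimal sketch of the one case where marginalization is cheap: a hypothetical two-component Gaussian mixture (not a model from this post) whose discrete latent assignment $z_n$ can be summed out per data point with a logsumexp. With continuous or high-dimensional latents there is generally no such shortcut:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

# Hypothetical 2-component Gaussian mixture; z_n picks the component of x_n.
rng = np.random.default_rng(1)
z = rng.random(500) < 0.3                                  # latent assignments (never observed)
x = np.where(z, rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500))

def incomplete_log_likelihood(pi, mu1, mu2):
    # log p(x | theta) = sum_n log sum_{z_n} p(x_n, z_n | theta):
    # the discrete latent is marginalized out with a logsumexp per point.
    log_joint = np.stack([
        np.log(pi) + stats.norm.logpdf(x, mu1, 1.0),       # z_n = component 1
        np.log(1 - pi) + stats.norm.logpdf(x, mu2, 1.0),   # z_n = component 2
    ])
    return logsumexp(log_joint, axis=0).sum()

print(incomplete_log_likelihood(0.3, -2.0, 3.0))  # evaluated near the true parameters
```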
Maximum A Posteriori¶
Derivation 1: Computationally Inconvenient to Calculate the Full Posterior¶
Before continuing, realize that because the evidence $p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta$ in Bayes' rule is a normalizing constant that does not depend on $\theta$, we never need to compute it in order to locate the posterior's mode. Computing the full posterior $p(\theta \mid x)$ is computationally inconvenient precisely because of this integral, so instead of the whole distribution we settle for a pointwise estimate at its mode.
Note

Mathematical Notation

The math notation of my content, including this post, follows the conventions in Christopher M. Bishop's Pattern Recognition and Machine Learning. In addition, I use calligraphic capitalized Roman and capitalized Greek symbols for random variables and their lowercase counterparts for realizations (e.g. $p(\Theta = \theta) \overset{\text{def}}{=} p(\theta)$).
Background reading on inference, learning, and evaluation: https://pyro.ai/examples/intro_long.html#Background:-inference,-learning-and-evaluation
Objective:

$$\theta^{*}_{\text{MAP}} = \underset{\theta}{\arg\max}\; p(\theta \mid x) = \underset{\theta}{\arg\max}\; \frac{p(x \mid \theta)\, p(\theta)}{p(x)} = \underset{\theta}{\arg\max}\; p(x \mid \theta)\, p(\theta).$$
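A minimal sketch of this objective, assuming a conjugate Gaussian model with known unit variance and a standard normal prior on the mean (all details invented for illustration):

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

# Hypothetical data from N(mu=1.5, sigma=1), with a N(0, 1) prior on mu.
rng = np.random.default_rng(2)
x = rng.normal(1.5, 1.0, size=20)

def neg_log_posterior(mu):
    # Unnormalized: log p(x | mu) + log p(mu); the evidence p(x) is a
    # constant in mu, so it can be dropped from the optimization.
    return -(stats.norm.logpdf(x, mu, 1.0).sum() + stats.norm.logpdf(mu, 0.0, 1.0))

map_est = minimize_scalar(neg_log_posterior).x

# For this conjugate model the posterior is Gaussian, so its mode has a
# closed form: sum(x) / (n + 1) when both variances are 1.
print(map_est, x.sum() / (len(x) + 1))
```

Note how the optimizer only ever sees $\log p(x \mid \theta) + \log p(\theta)$; the intractable normalizer never appears.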
Derivation 2:¶
Parameter Uncertainty¶
Frequentist: Uncertainty is estimated with confidence intervals. A 95% confidence interval is a random interval constructed from the data; over hypothetical repeated experiments, 95% of such intervals would contain the fixed, unknown parameter. It is not a probability statement about the parameter itself.

Bayesian: Uncertainty is estimated with credible intervals. A 95% credible interval is a fixed interval that contains the parameter with 95% posterior probability, which is the direct probability statement about the parameter that people usually want to make.
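The two interval types can be computed side by side. A minimal sketch, assuming hypothetical coin-flip data, a normal-approximation confidence interval, and a uniform Beta(1, 1) prior on the Bayesian side:

```python
import numpy as np
from scipy import stats

heads, n = 7, 10  # hypothetical coin-flip data
p_hat = heads / n

# Frequentist: 95% confidence interval (normal approximation).
# Interpretation: over repeated experiments, 95% of such intervals
# would contain the fixed, unknown true proportion.
se = np.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian: 95% credible interval from the Beta(1 + heads, 1 + tails)
# posterior under a uniform Beta(1, 1) prior.
# Interpretation: given the data, the parameter lies in this interval
# with 95% posterior probability.
posterior = stats.beta(1 + heads, 1 + n - heads)
cred = posterior.interval(0.95)

print(f"confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"credible interval:   ({cred[0]:.3f}, {cred[1]:.3f})")
```

The numbers often look similar, but the interpretations differ in exactly the way described above: the confidence interval is random and the parameter fixed, while the credible interval is fixed and the parameter random.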