Maximum Likelihood Estimation Conceptual Questions

Metronome (Junior Member) · Joined: Jun 12, 2018 · Messages: 154
Suppose I am told the outcomes ($x_1$ through $x_n$) of $n$ rolls ($X_1$ through $X_n$) of a fair, $\theta$-sided die, and want to find a point estimate of $\theta$.

Let's state the problem without using the lexeme "likely" in order to eliminate one layer of confusion, since its technical and colloquial meanings differ. It seems to me that the goal here is to choose the possible value of $\theta$ which has the greatest probability of being its actual value, given the information I have about the rolls, and that this goal translates into the mathematical notation $$\max_\theta \Pr\!\left(\theta \,\middle|\, \bigcap_{i=1}^n \{X_i = x_i\}\right).$$ However, the prescription of the maximum likelihood principle, as I understand it, is to choose the possible value of $\theta$ which makes the information I have about the rolls most probable, and this prescription translates into the mathematical notation $$\max_\theta \Pr\!\left(\bigcap_{i=1}^n \{X_i = x_i\} \,\middle|\, \theta\right).$$
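To make the second maximization concrete, here is a small sketch of what the likelihood principle prescribes for this die problem (the function names and the candidate range are illustrative, not part of the problem statement). For a fair $\theta$-sided die, $\Pr(\text{data} \mid \theta) = (1/\theta)^n$ when $\theta \geq \max_i x_i$, and $0$ otherwise:

```python
def likelihood(theta, rolls):
    """Pr(observing `rolls` | a fair theta-sided die)."""
    if theta < max(rolls):
        return 0.0                      # an observed roll exceeds theta: impossible
    return (1.0 / theta) ** len(rolls)  # each roll has probability 1/theta

rolls = [15, 3, 7]
candidates = range(1, 101)              # arbitrary search range for illustration
mle = max(candidates, key=lambda t: likelihood(t, rolls))
print(mle)  # 15, i.e. max(rolls): smaller theta is impossible, larger theta
            # spreads probability thinner over more faces
```

Since $(1/\theta)^n$ is strictly decreasing in $\theta$, the maximizer is always the smallest feasible value, $\max_i x_i$.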
1) Is the point of the maximum likelihood principle that both of these maximizations produce the same answer? If not, then why is the latter the correct one to solve, even while the former seems to better capture the English description of the problem statement?

2) As explained here, the correct answer is that the point estimate of $\theta$ is the greatest $x_i$ observed among the rolls. However, this does not make intuitive sense for small values of $n$. For example, after a single roll is observed, say it's $x_1 = 15$, it seems more reasonable to guess that the die has $30$ sides than $15$. Is this correct? Does MLE require a suitable sample size to arrive at the correct answer? Is there a more general way to solve this problem for any sample size that agrees with MLE when its sample size requirements are met?
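The small-$n$ intuition can be checked by simulation. The sketch below (assuming a true $30$-sided die and a single roll; the constants are my choices, not from the question) compares the MLE $\max_i x_i$ against the adjusted estimator $\hat\theta = m + m/n - 1$ where $m = \max_i x_i$, which is the standard unbiased estimator for the discrete uniform (the "German tank problem"); for $n = 1$ it reduces to $2m - 1$, close to the "double the single roll" intuition:

```python
import random

random.seed(0)
TRUE_THETA = 30     # assumed true number of sides (for the simulation only)
N_ROLLS = 1         # the small-sample case from the question
TRIALS = 100_000

mle_total = 0.0
adj_total = 0.0
for _ in range(TRIALS):
    rolls = [random.randint(1, TRUE_THETA) for _ in range(N_ROLLS)]
    m = max(rolls)
    mle_total += m                      # MLE: the largest observed roll
    adj_total += m + m / N_ROLLS - 1    # unbiased adjustment: m + m/n - 1

print(mle_total / TRIALS)  # ~15.5: the MLE is biased low for small n
print(adj_total / TRIALS)  # ~30.0: the adjusted estimator recovers theta
```

So the MLE is consistent (its bias vanishes as $n$ grows, since $\max_i x_i \to \theta$) but biased downward for small samples, which is exactly the discomfort described above.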