13 Jan 2020 • Rajaraman Nived • Chandra Prafulla • Thangaraj Andrew • Suresh Ananda Theertha
Support size estimation and the related problem of unseen species estimation
have wide applications in ecology and database analysis. Perhaps the most used
support size estimator is the Chao estimator... Despite its widespread use,
little is known about its theoretical properties. We analyze the Chao estimator
and show that its worst-case mean squared error (MSE) is smaller than the MSE
of the plug-in estimator by a factor of $\mathcal{O} ((k/n)^4)$, where $k$ is
the maximum support size and $n$ is the number of samples. Our main technical
contribution is a new method to analyze rational estimators for discrete
distribution properties, which may be of independent interest.
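
For readers unfamiliar with the two estimators being compared, the following is a minimal Python sketch. It assumes the commonly cited Chao1 form, $\hat{S} = S_{\text{obs}} + f_1^2 / (2 f_2)$, where $f_1$ and $f_2$ are the numbers of symbols seen exactly once and exactly twice. The abstract elides the definition, so the exact variant analyzed in the paper may differ. The function names and the fallback used when $f_2 = 0$ are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def plug_in_support_size(samples):
    """Plug-in estimate: the number of distinct symbols observed."""
    return len(set(samples))


def chao1_support_size(samples):
    """Chao1-style estimate (assumed form): S_obs + f1^2 / (2 * f2).

    f1 = number of symbols seen exactly once (singletons)
    f2 = number of symbols seen exactly twice (doubletons)
    """
    counts = Counter(samples)              # symbol -> number of occurrences
    freq_of_freq = Counter(counts.values())  # occurrence count -> number of symbols
    s_obs = len(counts)
    f1 = freq_of_freq[1]
    f2 = freq_of_freq[2]
    if f2 == 0:
        # Assumption: fall back to the bias-corrected form
        # f1 * (f1 - 1) / (2 * (f2 + 1)) to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + (f1 * f1) / (2.0 * f2)


if __name__ == "__main__":
    samples = list("aabbcde")  # 5 distinct symbols, f1 = 3, f2 = 2
    print(plug_in_support_size(samples))
    print(chao1_support_size(samples))
```

The plug-in estimator can never exceed the number of symbols already seen, whereas the Chao-style correction adds a term that grows with the number of singletons. This is the rational (ratio-of-counts) structure the abstract refers to.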