See Schervish’s *Theory of Statistics*, sections 2.3.1 and 7.3.5, or Pawitan’s *In All Likelihood*, chapter 8, for a more intuitive introduction.

Observed information has a direct interpretation as the negative second derivative (or Hessian) of the log-likelihood, typically evaluated at the MLE. When the MLE is asymptotically normal, the Fisher information is the inverse of its asymptotic covariance matrix, which raises the question of whether we should estimate that covariance using the observed or the expected information.
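As a concrete sketch of the distinction, consider the Cauchy location family with known scale 1, the kind of simple univariate case at issue here (this particular numerical example is my own construction, not taken from the papers below): the expected information for n observations is n/2 regardless of the sample, while the observed information — the negative second derivative of the log-likelihood at the MLE — varies from sample to sample.

```python
import math
import random

random.seed(1)
n = 200
# Draw a sample from a standard Cauchy (location 0, scale 1) via the inverse CDF.
x = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def loglik(theta):
    # Log-likelihood of Cauchy(location=theta, scale=1).
    return sum(-math.log(math.pi) - math.log(1.0 + (xi - theta) ** 2) for xi in x)

# Crude MLE: grid search near the sample median (the Cauchy likelihood can be
# multimodal, but a fine grid around the median locates the dominant mode).
med = sorted(x)[n // 2]
grid = [med - 1 + i * 1e-3 for i in range(2001)]
mle = max(grid, key=loglik)

# Observed information: negative second derivative of the log-likelihood at the
# MLE, approximated by a central finite difference.
h = 1e-4
obs_info = -(loglik(mle + h) - 2 * loglik(mle) + loglik(mle - h)) / h ** 2

# Expected information for this model is n / 2, independent of the data.
exp_info = n / 2
```

Rerunning with different seeds shows `obs_info` fluctuating around `exp_info`; the question the references below take up is which of the two is the better variance estimator for the particular sample at hand.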

Efron, B., & Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information.

*Biometrika*, *65*(3), 457–483. doi:10.1093/biomet/65.3.457

Provides “a large number of examples” to “supplement a small amount of theory”, claiming that, in simple univariate cases, the observed information is a better covariance estimator than the expected information.

Skovgaard, I. M. (1985). A Second-Order Investigation of Asymptotic Ancillarity.

*Annals of Statistics*, *13*(2), 534–551. doi:10.1214/aos/1176349537

Section 6 considers Wald tests of hypotheses involving linear combinations of the parameters of multivariate distributions. (This includes any univariate test you might want, for example.) Shows that using the observed information results in faster convergence of the test statistic to its asymptotic chi-squared distribution, under various odd conditions on higher-order derivatives of the density. This is an indirect way of showing that observed information may be more useful for tests and confidence interval estimation.
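Concretely, for a hypothesis $H_0\colon c^\top\theta = m$ about a linear combination of the parameters, the standard Wald statistic can be built from either information estimate (a sketch of the usual form, not Skovgaard's notation):

```latex
W = \frac{(c^\top \hat\theta - m)^2}{c^\top J(\hat\theta)^{-1} c},
```

where $J(\hat\theta)$ is the observed information matrix at the MLE; substituting the expected information $I(\hat\theta)$ gives the alternative statistic. Skovgaard's result is that the observed-information version approaches its limiting $\chi^2_1$ distribution faster.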

There is a connection between the Fisher information matrix and identifiability.

Catchpole, E. A., & Morgan, B. J. T. (1997). Detecting parameter redundancy.

*Biometrika*, *84*(1), 187–196. doi:10.1093/biomet/84.1.187

Shows that, for exponential families, “a model is parameter redundant if and only if its derivative matrix is symbolically rank-deficient.”

Catchpole and Morgan point to Silvey,

*Statistical Inference* (1975), p. 81, which notes that for general models, singularity of the Fisher information matrix does not necessarily prove nonidentifiability. The connection between Fisher information and identifiability arises because the information is related to the matrix of second derivatives (the Hessian) of the log-likelihood: a Taylor expansion of the log-likelihood around its maximum shows that a negative definite Hessian — equivalently, a positive definite observed information matrix — is sufficient for the maximum to be locally unique. But the Hessian could be singular while higher-order terms in the expansion still make the maximum unique.

To summarize: non-singularity of the Fisher information is sufficient for identifiability, but not necessary.
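A toy illustration of the redundancy–singularity connection (my own construction, not Catchpole and Morgan's method): take a Normal(a + b, 1) model, where the two parameters enter only through their sum, so the model is parameter redundant and the observed information matrix is rank-deficient at every maximizer.

```python
import math
import random

random.seed(0)
n = 100
# Data from Normal(1, 1); only the sum a + b = 1 is identifiable below.
x = [random.gauss(1.0, 1.0) for _ in range(n)]

def loglik(a, b):
    # Normal(a + b, 1) log-likelihood: depends on (a, b) only through a + b.
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (xi - (a + b)) ** 2
               for xi in x)

def observed_info(a, b, h=1e-3):
    # Negative Hessian of the log-likelihood via central finite differences.
    f = loglik
    daa = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h ** 2
    dbb = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h ** 2
    dab = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h ** 2)
    return [[-daa, -dab], [-dab, -dbb]]

xbar = sum(x) / n
# Any (a, b) with a + b = xbar maximizes the likelihood; pick one of them.
I = observed_info(xbar / 2, xbar / 2)
det = I[0][0] * I[1][1] - I[0][1] * I[1][0]
```

The exact information matrix here is n times the all-ones 2x2 matrix, so `det` comes out (numerically) zero: the flat direction a − b in the likelihood shows up directly as a zero eigenvalue of the information matrix.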