See also spatiotemporal point processes.
I’ve written a full review of this topic; see Reinhart, A. (2018). A review of self-exciting spatio-temporal point processes and their applications. Statistical Science. https://arxiv.org/abs/1708.02647
Hawkes, A. G., & Oakes, D. (1974). A Cluster Process Representation of a Self-Exciting Process. Journal of Applied Probability, 11(3), 493–503. doi:10.2307/3212693
A stationary self-exciting point process with finite intensity can be represented as a Poisson cluster process (aka Poisson branching process). This can be useful for establishing bounds on the process; for example, see Lewis, P. A. W. (1969). Asymptotic properties and equilibrium conditions for branching Poisson processes. Journal of Applied Probability, 6(2), 355–371. doi:10.1017/S0021900200032873
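To make the cluster representation concrete, here is a minimal sketch of simulating a self-exciting process by branching: immigrants arrive as a homogeneous Poisson process, and every event independently produces a Poisson number of offspring at random delays. The exponential kernel, the parameter names (mu, alpha, beta), and the function name are assumptions made for illustration, not taken from Hawkes and Oakes. The simulation also starts from an empty history at time 0, so it is not the stationary version of the process.

```python
import numpy as np

def simulate_hawkes_cluster(mu, alpha, beta, T, rng):
    """Simulate event times on [0, T) using the branching representation.

    Assumed (illustrative) model: background rate mu, and a triggering
    kernel alpha * beta * exp(-beta * t), so that alpha is the expected
    number of direct offspring per event (the branching ratio).
    """
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1) for clusters to die out")

    # Generation 0: immigrants from a homogeneous Poisson process.
    n_immigrants = rng.poisson(mu * T)
    generation = rng.uniform(0.0, T, size=n_immigrants)
    events = [generation]

    # Each later generation consists of the offspring of the previous one.
    while generation.size > 0:
        n_children = rng.poisson(alpha, size=generation.size)
        parents = np.repeat(generation, n_children)
        # Offspring delays follow the normalised kernel, here Exponential(beta).
        children = parents + rng.exponential(1.0 / beta, size=parents.size)
        generation = children[children < T]  # discard events past the window
        events.append(generation)

    return np.sort(np.concatenate(events))

rng = np.random.default_rng(1)
times = simulate_hawkes_cluster(mu=1.0, alpha=0.5, beta=2.0, T=1000.0, rng=rng)
```

For large T the event count should be roughly mu * T / (1 - alpha), which is one way to sanity-check the output.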
The basic approach comes from Epidemic-Type Aftershock Sequence (ETAS) models, where earthquakes are caused by some constant background process and then induce further aftershocks when they arrive. There’s a whole series of papers by Ogata; some highlights from the field:
The ETAS model has been adapted to epidemiology (see also epidemic models):
Meyer, Sebastian, Elias, J., & Höhle, M. (2011). A Space-Time Conditional Intensity Model for Invasive Meningococcal Disease Occurrence. Biometrics, 68(2), 607–616. doi:10.1111/j.1541-0420.2011.01684.x
Related MSc thesis: Meyer, S. (2010). Spatio-temporal infectious disease epidemiology based on point processes (Master’s thesis). Ludwig-Maximilians-Universität München. https://epub.ub.uni-muenchen.de/11703/1/MA_Meyer.pdf
Swapping out kernels for spatial influence, with brief mention of scoring rules for one-ahead predictions (sect. 3.3): Meyer, Sebastian, & Held, L. (2014). Power-law models for infectious disease spread. Annals of Applied Statistics, 8(3), 1612–1639. doi:10.1214/14-AOAS743
Detecting clustering while accounting for spatial heterogeneity using the model: Meyer, Sebastian, Warnke, I., Rössler, W., & Held, L. (2016). Model-based testing for space-time interaction using point processes: An application to psychiatric hospital admissions in an urban area. Spatial and Spatio-Temporal Epidemiology, 17(C), 15–25. doi:10.1016/j.sste.2016.03.002
A self-exciting point process can be interpreted as a Poisson cluster process, as mentioned above. It could be interesting to decluster it, meaning to remove the events which were “excited” by another, and leave only the background events which occurred spontaneously. (In the earthquake literature, this means removing the aftershocks and keeping only the main shocks.) Stochastic declustering is this procedure.
Zhuang, J., Ogata, Yosihiko, & Vere-Jones, D. (2002). Stochastic Declustering of Space-Time Earthquake Occurrences. Journal of the American Statistical Association, 97(458), 369–380. doi:10.1198/016214502760046925
This procedure estimates the probability that each event was triggered by an earlier one (comparing the intensity contribution from every other event to the intensity contribution from the background), then thins the events based on these probabilities.
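A minimal sketch of the thinning step, under simplifying assumptions that are not from Zhuang et al.: a purely temporal process, a constant background rate mu, and an exponential triggering kernel alpha * beta * exp(-beta * t) with parameters already fitted. The paper works in space and time with a spatially varying background, so the function names and model here are only illustrative.

```python
import numpy as np

def background_probabilities(times, mu, alpha, beta):
    """Probability that each event came from the background process.

    For event j this is mu / lambda(t_j), where lambda(t_j) adds the
    triggering contribution of every strictly earlier event to mu.
    """
    times = np.asarray(times, dtype=float)
    lags = times[:, None] - times[None, :]  # lags[j, i] = t_j - t_i
    earlier = lags > 0                      # only earlier events can trigger
    triggered = np.zeros_like(lags)
    triggered[earlier] = alpha * beta * np.exp(-beta * lags[earlier])
    intensity = mu + triggered.sum(axis=1)
    return mu / intensity

def stochastic_decluster(times, mu, alpha, beta, rng):
    """Keep each event independently with its background probability."""
    times = np.asarray(times, dtype=float)
    p_background = background_probabilities(times, mu, alpha, beta)
    keep = rng.uniform(size=times.size) < p_background
    return times[keep]

rng = np.random.default_rng(0)
declustered = stochastic_decluster([0.0, 0.1, 5.0], mu=1.0, alpha=0.5,
                                   beta=1.0, rng=rng)
```

Each call gives one random realisation of the background process, so in practice one would repeat the thinning many times. The pairwise lag matrix is quadratic in the number of events, which is fine for a sketch but not for a large catalogue.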
Zhuang, J., Ogata, Yosihiko, & Vere-Jones, D. (2004). Analyzing earthquake clustering features by using stochastic reconstruction. Journal of Geophysical Research, 109(B5), 1–17. doi:10.1029/2003JB002879
Application of this procedure to an earthquake dataset to test hypotheses about the background and clustering processes. By declustering the process, we get the links between background events and their offspring, and can compare the time and distance distributions between them to see if they match the model’s assumptions. However, this uses the model’s assumptions for the declustering process – a bit tautological, and I suspect the declustered process will look very good even if the model does not fit well at all.
Xu, L., Duan, J. A., & Whinston, A. (2014). Path to purchase: A mutually exciting point process model for online advertising and conversion. Management Science, 60(6), 1392–1412. doi:10.1287/mnsc.2014.1952
Uses Bayesian mutually exciting point processes to model different types of ad clicks, as well as purchase events on a retail website, to see what kinds of ad clicks trigger purchases and what types of ad clicks excite future ad clicks.
[To read] Shizhe Chen, Daniela Witten, and Ali Shojaie (2017), “Nearly assumptionless screening for the mutually-exciting multivariate Hawkes process”, Electronic Journal of Statistics 11 (1). doi:10.1214/17-EJS1251
How do we evaluate predictions made by a self-exciting point process model?
Vere-Jones, D. (1998). Probabilities and Information Gain for Earthquake Forecasting. Computational Seismology, 30, 248–263.
Basic idea: if you’re predicting whether or not a certain type of event will occur in a certain time interval, run many simulations over that interval, calculate the probability of the event occurring in those simulations, and use a scoring rule to compare to the actual outcome. Repeat over many time intervals.
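The loop described above might be sketched as follows. Everything named here is an assumption for illustration: `simulate_count` stands for any function that simulates the fitted model over the forecast interval and returns the number of target events, and the Poisson stand-in at the bottom is only a placeholder for a real simulator conditioned on the observed history. The add-one smoothing is a choice made here (not something from the paper) so that the log score stays finite when no simulation produces an event.

```python
import numpy as np

def forecast_probability(simulate_count, n_sims, rng):
    """Monte Carlo estimate of P(at least one target event in the interval)."""
    hits = sum(simulate_count(rng) > 0 for _ in range(n_sims))
    # Add-one smoothing keeps the estimate strictly between 0 and 1.
    return (hits + 1) / (n_sims + 2)

def log_score(p, occurred):
    """Log probability assigned to what actually happened (higher is better)."""
    return np.log(p) if occurred else np.log1p(-p)

def brier_score(p, occurred):
    """Squared error of the forecast probability (lower is better)."""
    return (p - float(occurred)) ** 2

# Placeholder simulator: a Poisson count with mean 0.5. A real application
# would simulate the self-exciting model forward from the observed history.
def placeholder_simulator(rng):
    return rng.poisson(0.5)

rng = np.random.default_rng(42)
p_hat = forecast_probability(placeholder_simulator, n_sims=20000, rng=rng)
score = log_score(p_hat, occurred=True)
```

Averaging the score over many forecast intervals, and comparing against the same average for a reference model, gives the comparison the annotation describes.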
Makes an interesting point about the ETAS models: they get worse scores for background events than an ordinary Poisson process, since the Poisson process estimates a higher mean event rate to account for the clustering, and the ETAS model has a lower mean background rate and explicit clustering. Since ETAS predicts aftershocks, it’d be more fair to start evaluation periods immediately after a main shock (which does limit the usefulness for predicting main shocks…).
Harte, D., & Vere-Jones, D. (2005). The Entropy Score and its Uses in Earthquake Forecasting. Pure and Applied Geophysics, 162(6), 1229–1253. doi:10.1007/s00024-004-2667-2
Reviews the entropy score (log score) and how it can be used to evaluate predictions from point process models. The log-likelihood turns out to estimate the expected information gain per event, so likelihood ratios (on a separate test set) can be used to compare models. Goodness-of-fit tests can be done by comparing the likelihood on the test set to the likelihood on simulated datasets drawn from the model.
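As a rough illustration of the likelihood-ratio comparison, the sketch below computes the log-likelihood of a temporal self-exciting model on a test window and the average gain per event over a homogeneous Poisson reference. The exponential kernel alpha * beta * exp(-beta * t), the parameter names, and the choice to ignore any events before the window are assumptions made here, not details from Harte and Vere-Jones.

```python
import numpy as np

def hawkes_loglik(times, T, mu, alpha, beta):
    """Log-likelihood on [0, T] for the assumed exponential-kernel model.

    Equals sum_j log(lambda(t_j)) minus the integral of lambda over [0, T].
    `times` must be sorted; events before time 0 are ignored.
    """
    times = np.asarray(times, dtype=float)
    log_sum = 0.0
    decay_sum = 0.0  # sum over earlier events i of exp(-beta * (t_j - t_i))
    prev = None
    for t in times:
        if prev is not None:
            # Recursion that updates the decayed sum from one event to the next.
            decay_sum = np.exp(-beta * (t - prev)) * (decay_sum + 1.0)
        log_sum += np.log(mu + alpha * beta * decay_sum)
        prev = t
    compensator = mu * T + alpha * np.sum(1.0 - np.exp(-beta * (T - times)))
    return log_sum - compensator

def poisson_loglik(n_events, T, rate):
    """Log-likelihood of a homogeneous Poisson process with the given rate."""
    return n_events * np.log(rate) - rate * T

def information_gain_per_event(times, T, mu, alpha, beta, ref_rate):
    """Average log-likelihood ratio per event, model versus Poisson reference."""
    n = len(times)
    gain = hawkes_loglik(times, T, mu, alpha, beta) - poisson_loglik(n, T, ref_rate)
    return gain / n
```

The parameters should be fitted on separate training data and the gain computed on held-out events; otherwise the comparison favours the more flexible model.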
Daley, D. J., & Vere-Jones, D. (2004). Scoring Probability Forecasts for Point Processes: The Entropy Score and Information Gain. Journal of Applied Probability, 41, 297–312. http://www.jstor.org/stable/3215984