Predictive policing

Alex Reinhart – Updated September 29, 2017

See also Policing, Predicting recidivism.


Crime tends to concentrate at small places, so the idea is to find those places and direct policing to them: a straightforward intervention-oriented approach.

Crime concentration

The core hypothesis, that crime is concentrated at small places, tends to come from statistics like these: “Property crime in Vancouver is highly concentrated in a small percentage of street segments and intersections, as few as 5% of street segments and intersections in 2013 depending on the crime type”, from Andresen, M. A., Linning, S. J., & Malleson, N. (2017). Crime at Places and Spatial Concentrations: Exploring the Spatial Stability of Property Crime in Vancouver BC, 2003-2013. Journal of Quantitative Criminology, 33(2), 255–275. doi:10.1007/s10940-016-9295-8

However, 5% is less impressive when you realize there were 18,445 street segments and intersections on which crime could occur, and only 1,700 or so burglaries in a given year, so even a completely uniform spread of crimes would hit only about 9% of the map.
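That back-of-envelope figure is easy to verify; a minimal sketch using the numbers above:

```python
# If k crimes land uniformly at random on N segments, the expected fraction
# of segments hit at least once is 1 - (1 - 1/N)^k ≈ 1 - exp(-k/N).
N = 18_445  # street segments and intersections in Vancouver
k = 1_700   # approximate burglaries in a year

frac_hit = 1 - (1 - 1 / N) ** k
print(f"{frac_hit:.1%}")  # roughly 9% of the map, with zero concentration
```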

The classic source is usually Weisburd, D., Bushway, S., Lum, C., & Yang, S.-M. (2004). Trajectories of crime at places: A longitudinal study of street segments in the city of Seattle. Criminology, 42(2), 283–322. doi:10.1111/j.1745-9125.2004.tb00521.x This gives an interesting trajectory analysis over several years, and the fundamental crime concentration claim comes from 29,849 street segments and around 100,000 crimes per year, 50% of which is contained in maybe 5% of the segments. This is interesting, but doesn’t determine whether crime is more concentrated than we’d expect from population density and simple mapping reasons (e.g. some street segments never experience crime because they’re interstate on-ramps or small access roads).
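The concentration statistic itself is simple to compute, and it helps to compare it against a uniform null. A sketch with simulated data (the Seattle-sized numbers are the only thing taken from the paper; the helper name is mine):

```python
import random
from collections import Counter

random.seed(1)
N, k = 29_849, 100_000  # segments and crimes per year, rough Seattle figures

def frac_segments_for_half(counts, n_segments):
    """Smallest fraction of segments whose crimes add up to half the total."""
    ranked = sorted(counts.values(), reverse=True)
    total, cum = sum(ranked), 0
    for i, c in enumerate(ranked, 1):
        cum += c
        if cum >= total / 2:
            return i / n_segments
    return 1.0

# Null model: the same number of crimes spread uniformly at random.
uniform = Counter(random.randrange(N) for _ in range(k))
null_frac = frac_segments_for_half(uniform, N)
print(f"{null_frac:.0%} of segments hold half the crime under the uniform null")
```

The null value comes out far above the observed 5%, so the concentration is not purely an artifact of having finitely many crimes; the open question is how much of it survives adjusting for population and land use.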

One attempt to solve these issues is Hipp, J. R., & Kim, Y.-A. (2016). Measuring Crime Concentration Across Cities of Varying Sizes: Complications Based on the Spatial and Temporal Scale Employed. Journal of Quantitative Criminology, 1–38. doi:10.1007/s10940-016-9328-3 This discusses the issue of random variation causing concentration and the possibility of measuring concentration relative to the concentration we’d expect just from a uniform spread of crime across the map. Instead of proposing a measure which does so, however, they propose metrics which try to avoid upward-biased estimates of concentration—pick the top cells from last year and see what fraction of crime is contained in them this year, for example, which tries to smooth out random variation in concentration. They claim, by linear regression against the number of crimes and the number of possible locations for crime, that this measure accounts for most of the concentration we expect from having few crimes in a large city, but I don’t find this terribly convincing.
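A sketch of that style of out-of-sample metric, with hypothetical data: rank cells by last year’s counts, then measure what share of this year’s crime the top 5% contain. Selecting and evaluating in different years is what smooths out the upward bias from random variation.

```python
import random
from collections import Counter

random.seed(2)
N, k = 1_000, 5_000  # hypothetical cells and crimes per year

# Hypothetical data: the same underlying cell rates both years, so the
# top cells' year-two share reflects real concentration, not selection bias.
rates = [random.random() ** 3 for _ in range(N)]  # heterogeneous cell rates

def simulate_year(rates, n_crimes):
    cells = random.choices(range(len(rates)), weights=rates, k=n_crimes)
    return Counter(cells)

year1, year2 = simulate_year(rates, k), simulate_year(rates, k)

top = {c for c, _ in year1.most_common(N // 20)}  # top 5% of cells in year 1
share = sum(n for c, n in year2.items() if c in top) / k
print(f"Top-5% cells from year 1 hold {share:.0%} of year 2's crime")
```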

Finding hotspots

Usually clustering methods or kernel densities: pick the areas with clusters or the highest crime density. There are conflicting results on what works best, but I don’t like the metrics anyway; the PAI and RRI don’t seem to measure useful quantities, particularly when you arbitrarily choose your threshold for defining “hotspot” and don’t compare across a range of thresholds, ROC-style.
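For reference, the PAI (prediction accuracy index) is the share of crime captured in the flagged cells divided by the share of area flagged. The threshold sweep suggested above might look like this sketch (hypothetical data; the function name is mine):

```python
def pai_curve(scores, counts, fractions):
    """PAI = (share of crime captured) / (share of area flagged), swept
    over flagged-area fractions rather than one arbitrary hotspot cutoff."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total = sum(counts)
    curve = []
    for f in fractions:
        m = max(1, round(f * len(scores)))  # number of cells flagged
        hit = sum(counts[i] for i in order[:m]) / total
        curve.append((f, hit / (m / len(scores))))
    return curve

# Hypothetical example: 10 grid cells, predictions roughly tracking counts.
scores = [9, 7, 6, 5, 4, 3, 3, 2, 1, 0]
counts = [30, 20, 10, 8, 6, 5, 5, 4, 2, 0]
curve = pai_curve(scores, counts, [0.1, 0.2, 0.5])
for f, pai in curve:
    print(f"area {f:.0%}: PAI {pai:.2f}")
```

Reporting the whole curve, rather than one point on it, is the ROC-style comparison; a single PAI at an arbitrary area cutoff can flatter whichever method happens to peak there.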

For evaluation metrics:

Explaining hotspots

Experimental trials

Various experiments have tested whether directing patrols to hotspots reduces crime, to generally positive results.

But trying to solve community problems may be better than just saturation patrol:

One curious trial finds increases in crime when hotspot patrols are predictable: [To read] Ariel, B., & Partridge, H. (2016). Predictable Policing: Measuring the Crime Control Benefits of Hotspots Policing at Bus Stops. Journal of Quantitative Criminology, 1–25. doi:10.1007/s10940-016-9312-y

Risk Terrain Modeling

A spatial technique for identifying features of the environment that lead to crime. It works by identifying risk factors (bars, foreclosures, schools, etc.), mapping them, and then seeing how well they predict crime.

The initial iteration just added up the number of risk factors, then used a logistic regression to predict presence or absence of crime: Kennedy, L. W., Caplan, J. M., & Piza, E. L. (2010). Risk Clusters, Hotspots, and Spatial Intelligence: Risk Terrain Modeling as an Algorithm for Police Resource Allocation Strategies. Journal of Quantitative Criminology, 27(3), 339–362. doi:10.1007/s10940-010-9126-2

Model selection was just “which logistic regression has the biggest slope”, which naturally biases it toward models with fewer risk factors: their risk scores, being counts of present factors, have a smaller range and hence must take a larger slope. Variable selection used a bunch of univariate chi-squared tests, and I’m dubious about using p values to decide which variable predicts best.
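The scale dependence is easy to demonstrate: rescaling a predictor to half its range exactly doubles its maximum-likelihood slope, while its predictive value is unchanged. A sketch with simulated data and a hand-rolled Newton–Raphson fit (everything here is hypothetical illustration, not the paper’s data):

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson MLE for p = sigmoid(b0 + b1 * x)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = p * (1 - p)
            g0 += yi - p          # gradient terms
            g1 += (yi - p) * xi
            h00 += w              # (negative) Hessian terms
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

random.seed(3)
# Risk score: count of present factors, 0..10.
x = [random.randint(0, 10) for _ in range(2000)]
y = [int(random.random() < 1 / (1 + math.exp(-(-2 + 0.3 * xi)))) for xi in x]

_, slope_full = fit_logistic(x, y)
_, slope_half = fit_logistic([xi / 2 for xi in x], y)  # same information, half the range
print(slope_half / slope_full)  # ratio is 2: the "bigger slope" wins by rescaling
```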

Then came an update which uses elastic net penalized regression to fit a Poisson model, picking the best penalty via cross-validation, then further reducing the model with stepwise regression and BIC. (Why not just adjust the penalty parameter for more sparsity?) Features were included as three binary variables for proximity (within 426, 852, or 1278 feet) and three different kernel densities (with those three bandwidths), for reasons I do not understand: Kennedy, L. W., Caplan, J. M., Piza, E. L., & Buccine-Schraeder, H. (2016). Vulnerability and Exposure to Crime: Applying Risk Terrain Modeling to the Study of Assault in Chicago. Applied Spatial Analysis and Policy, 9(4), 529–548. doi:10.1007/s12061-015-9165-z
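A sketch of that feature construction, with hypothetical coordinates in feet; the Gaussian kernel and the any-factor-within-distance rule are my assumptions about details not restated here:

```python
import math

BANDWIDTHS = (426.0, 852.0, 1278.0)  # feet, the three scales from the paper

def rtm_features(cell_xy, factor_xys):
    """Six features per risk-factor layer: a within-distance indicator and a
    Gaussian kernel density, one of each per bandwidth."""
    dists = [math.dist(cell_xy, f) for f in factor_xys]
    prox = [int(any(d <= b for d in dists)) for b in BANDWIDTHS]
    dens = [sum(math.exp(-d * d / (2 * b * b)) for d in dists)
            for b in BANDWIDTHS]
    return prox + dens

# Hypothetical grid cell with two bars nearby:
feats = rtm_features((0.0, 0.0), [(300.0, 0.0), (1000.0, 500.0)])
print(feats)
```

The puzzle the text raises is visible here: the three proximity dummies and three densities for one factor type are strongly correlated with each other, which is presumably why the elastic net penalty is needed at all.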

Other spatial methods

Near repeats

Crimes tend to be followed by nearby crimes, e.g. from a burglar returning to an area to try a new target.

Counting repeats

A bunch of papers use the Knox test, a permutation test comparing the number of crime pairs nearby in both space and time against a permutation null. It requires discrete cutoffs for “nearby”, so claims about the distance at which effects operate are really claims about the power of the test. (If significance is only found within 200 m, would it be found at 300 m if we had more data?) It is implemented in the widely used Near Repeat Calculator.
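A sketch of the Knox statistic and its permutation null, assuming events are (x, y, t) triples and fixed cutoffs (simulated data; the function is mine, not the Near Repeat Calculator’s implementation):

```python
import math
import random

def knox(events, ds, dt, n_perm=999, seed=0):
    """Knox permutation test: count event pairs close in both space and time,
    then compare against the count when event times are shuffled."""
    xy = [(x, y) for x, y, _ in events]
    ts = [t for _, _, t in events]
    # Spatial closeness is unchanged by the permutation, so precompute it.
    pairs = [(i, j) for i in range(len(xy)) for j in range(i + 1, len(xy))
             if math.dist(xy[i], xy[j]) <= ds]

    def stat(times):
        return sum(abs(times[i] - times[j]) <= dt for i, j in pairs)

    observed = stat(ts)
    rng = random.Random(seed)
    exceed = sum(stat(rng.sample(ts, len(ts))) >= observed
                 for _ in range(n_perm))
    return observed, (1 + exceed) / (1 + n_perm)  # one-sided p value

# Hypothetical data: uniform background plus one tight space-time cluster.
rng = random.Random(1)
events = [(rng.uniform(0, 1000), rng.uniform(0, 1000), rng.uniform(0, 365))
          for _ in range(40)]
events += [(100 + rng.gauss(0, 20), 100 + rng.gauss(0, 20),
            30 + rng.uniform(0, 7)) for _ in range(10)]
observed, p = knox(events, ds=200, dt=14)
print(observed, p)
```

Re-running this over a grid of ds and dt values, rather than one pair, shows how conclusions about the “distance of the effect” shift with the cutoffs and the sample size.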

Another approach models choice of houses to burgle with a multinomial logit, where the outcome is the choice of house: Ratcliffe, J. H., & Rengert, G. F. (2008). Near-Repeat Patterns in Philadelphia Shootings. Security Journal, 21(1-2), 58–76. doi:10.1057/

K functions

Ripley’s K function provides a continuous analog of the Knox test statistic. It’s a normalized count of the average number of points within a given distance of an arbitrary event, so it’s a function of distance rather than depending on an arbitrary cutoff; a natural space-time generalization counts the average number within a given distance and a given time lag. Plotting these gives a sense of the scale and decay of near-repeat effects.
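A sketch of the space-time variant, again assuming events as (x, y, t) triples; this toy version skips the edge correction and intensity normalization that a proper K function estimate needs:

```python
import math

def space_time_k(events, distances, lags):
    """Average number of other events within spatial distance d and time
    lag t of a typical event, for each (d, t) on the grid."""
    n = len(events)
    table = {}
    for d in distances:
        for lag in lags:
            count = sum(
                math.dist(a[:2], b[:2]) <= d and abs(a[2] - b[2]) <= lag
                for i, a in enumerate(events) for b in events[i + 1:])
            table[(d, lag)] = 2 * count / n  # each pair counts for both events
    return table

# Hypothetical events: a small cluster plus one distant, later event.
events = [(0, 0, 0), (50, 0, 2), (60, 10, 3), (500, 500, 100)]
k = space_time_k(events, distances=[100, 1000], lags=[7, 365])
print(k)
```

Evaluated over a fine grid of distances and lags, the table is exactly the plot described above: where it decays back to the background level tells you the spatial and temporal reach of near-repeat effects.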

Used to compare before and after stop-and-frisk events: Wooditch, A., & Weisburd, D. (2016). Using Space-Time Analysis to Evaluate Criminal Justice Programs: An Application to Stop-Question-Frisk Practices. Journal of Quantitative Criminology, 32(2), 191–213. doi:10.1007/s10940-015-9259-4

Heterogeneity vs. state dependence

Burglaries are the most common crime studied, presumably because the theory is clear: burglars like returning to areas they’re familiar with. But this is easily confounded with spatial heterogeneity: some places are better to burgle than others, regardless of whether they were recently burgled. This seems connected to the state dependence vs. heterogeneity problem, Heckman, J. J. (1991). Identifying the hand of past: Distinguishing state dependence from heterogeneity. The American Economic Review, 81(2), 75–79.


Self-exciting point process models

See also Self-exciting point processes.

It’d be useful to combine hotspot models and near-repeat effects. As Gorr has pointed out, hotspots can be either chronic (like the methods above try to find) or temporary, caused by, say, a new burglar hitting several houses in an area. Gorr, W. L., & Lee, Y. (2015). Early Warning System for Temporary Crime Hot Spots. Journal of Quantitative Criminology, 31(1), 25–47. doi:10.1007/s10940-014-9223-8

Mohler and colleagues have a series of papers on self-exciting models for crime, which allow both chronic hotspots and self-exciting temporary clusters:

Their methods have been adapted by others. (See also the Epidemic/endemic models section of Self-exciting point processes for application to epidemiology.)
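The generic ingredient these models share is a conditional intensity with a chronic background rate plus a decaying bump after every event. A minimal sketch with an exponential triggering kernel (the parameter values are hypothetical, and this is the generic Hawkes form, not any paper’s fitted model):

```python
import math

def intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a self-exciting (Hawkes) process:
    a chronic background rate mu plus an exponentially decaying bump after
    each past event. Each event contributes alpha expected follow-ups."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)

# Hypothetical: background 0.1 events/day; each event triggers ~0.4
# follow-ups with a 5-day mean decay time.
history = [10.0, 12.0, 13.0]
for t in (9.0, 13.5, 40.0):
    print(t, round(intensity(t, history, mu=0.1, alpha=0.4, beta=1 / 5), 3))
```

Before the burst the intensity sits at the chronic rate, spikes just after the clustered events, and relaxes back: a temporary hotspot in Gorr’s sense, layered on a chronic one.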

There are also modeling approaches that aren’t self-exciting:

Other prediction methods


Crime is, naturally, affected by the weather.

Predictive policing and the law

A series of papers on how predictive policing interacts with the Fourth Amendment:

First, it’s surprising to see that courts have already recognized an implied Fourth Amendment exception for “high-crime areas”, where presence can contribute to the reasonable suspicion needed for a stop and search: Ferguson, A. G. (2011). Crime Mapping and the Fourth Amendment: Redrawing “High-Crime Areas”. Hastings Law Journal, 63(1), 179–232.

Next, more on the concerns caused by data and predictive policing being used to justify searches: