Cluster analysis is a technique of exploratory spatial data analysis that identifies spatial clusters, or areas of high or low risk that are surrounded by areas of non-random similar or dissimilar risk (Murray, McGuffog, Western, and Mullins 2001; Anselin, Cohen, Cook, Gorr, and Tita 2001; Besag and Newell 1991). Spatial cluster analysis can be employed in any field where the identification of points or areas with statistically significant high or low rates is paramount to understanding the location and characteristics of a phenomenon. The use of spatial cluster analysis is widespread, including in the fields of archaeology, ecology, economics, and genetics, and it is a useful investigative technique whenever aetiology is expected to vary due to geographic attributes (Kulldorff, Tango, and Park 2003; Marshall 1991). This extends to investigations of crime in general, and drug offences in particular, as these are known to have a spatial dimension such that some areas exhibit high drug offence rates while other areas exhibit low drug offence rates (Ratcliffe and Breen 2011; Robinson and Rengert 2006; Rengert, Chakravorty, Bole, and Henderson 2000; Chakravorty 1995).
The applications of spatial clustering in the crime context are numerous from both a practical and academic perspective. In practice, the high economic and societal costs of crime make it imperative that crime prevention and enforcement be efficient and effective (Sharpe 2000). Certainly, one way to address this is to make the role of police more place-specific (Lawton, Taylor, and Luongo 2005) through an improved understanding of the geographic distribution of high crime rates, allowing for enforcement to be targeted to crime hotspots or high risk crime places, and prevention efforts to be strategically tailored to neighbourhood-scale characteristics of crime hotspots (Grubesic and Murray 2001; Brantingham, Brantingham, and Taylor 2005; Braga 2001; Braga 2005). For example, large clusters may be used to inform broad scale policing and crime prevention such as police patrols, as well as identify socio-economic or environmental characteristics that may influence cluster location. In contrast, small clusters may be better suited to resource-intensive policing and crime prevention initiatives that are not feasible at a larger scale.
Academically, local cluster analysis is an apt starting point for systematic inquiry. It aids in identifying the presence of geographic patterns (for example, spatial autocorrelation--the degree to which observations at one location are similar or dissimilar to observations nearby [Burra, Jerrett, Burnett, and Anderson 2002]) that can provide insight for hypothesis generation and form the basis for unique statistical tests such as spatial regression models. In addition, clusters can be investigated as to how they work within existing theoretical frameworks of environmental criminology as well as in relation to neighbourhood-scale mechanisms thought to influence crime, such as collective efficacy or institutional resources (Townsley 2009). McCord and Ratcliffe (2007), for example, explore the interaction between social disorganization and routine activity theories and the location of drug offences in Philadelphia, Pennsylvania. In this case, the authors calculated a location quotient to investigate if drug arrests were clustered around criminogenic locations; however, local spatial cluster detection methods could also be employed to visualize the location of statistically significant high drug offence clusters and identify, for instance, land use variables to include in confirmatory analysis (McCord and Ratcliffe 2007).
There are two shortcomings in how spatial cluster detection techniques were used in previous studies of crime that this study seeks to address. First, despite the numerous advantages of employing spatial cluster detection in crime analysis, no local cluster detection method has been proven preferable to others in this context. Because there are many different methods and thus many possible resulting clusters, understanding which method most suitably models the phenomena under study and best informs practical applications is important. Second, in studies of criminal geography, it is not uncommon to suggest the presence of a hotspot or cluster without employing statistical spatial cluster detection methods. For example, in his overview of crime in Toronto, Charron (2009) frequently refers to clusters (e.g., "Most of the smaller shopping centres represent secondary clusters of property crime"), but determines these on visual observation alone and cannot, therefore, infer the significance of the clusters. It is entirely possible that visually observed clusters are a product of map or data characteristics (e.g., scale, legend, use of colour, or measurement/non-measurement of offence count or rate) and not of the disproportionate distribution of crime. In light of these two shortcomings, this article has two research objectives: first, to identify the locations of drug offence rate hotspots in Toronto; and second, to highlight the advantages and limitations of four local cluster detection methods in their application to studies of crime and practical efforts including policing.
First, this article will provide a brief introduction to cluster analysis, with a specific focus on four local cluster analysis methods: (1) spatial scan statistic based on Euclidean radius (SSS), (2) spatial scan statistic based on non-Euclidean contiguity (SSS-contiguity), (3) flexibly shaped scan statistic (FSS), and (4) local Moran's I (LMI). Next will be a brief discussion of the data, including an overview of Toronto and the prevalence of drug offences in the city. Results of cluster analysis will follow, and finally, a discussion that explores the results of the four cluster analysis methods and potential applications of these methods to inform future research, crime prevention, and police operations.
Local cluster analysis approaches
There are two broad classes of spatial cluster detection: global and local. Global methods measure the average tendency of the data to depart from the null hypothesis of spatial randomness over the entire study region but do not indicate the specific location or significance of individual clusters (Chakravorty 1995; Burra et al. 2002). Local methods, on the other hand, identify individual clusters (hot spots) as they process subsets of global data and identify neighbouring areas that exhibit disproportionately high or low risk relative to a null hypothesis of spatial randomness (Anselin 1995; Anselin et al. 2001; Kulldorff et al. 2003). In general, local methods are more advantageous than global methods because they identify the specific location of clusters and measure significance against the null hypothesis for all detected clusters. Notable local cluster detection methods beyond the ones examined here include Getis and Ord's Gi* statistic (Getis and Ord 1992), which is similar to the local Moran's I, but instead of comparing neighbouring values with the overall mean, the Gi* statistic analyses the sum of neighbouring values.
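The contrast between the local Moran's I and the Gi* statistic can be illustrated with a minimal sketch. It assumes row-standardized binary weights for the local Moran's I and uses the unstandardized ratio form of Gi*; the function names, the neighbour-list structure, and the toy values are illustrative assumptions and are not drawn from this study. Significance testing (for example, by permutation) is omitted.

```python
def local_morans_i(values, neighbours):
    """Local Moran's I per area, using row-standardized binary weights.

    values     : list of area values (e.g., offence rates); hypothetical input
    neighbours : neighbours[i] is a list of indices of the neighbours of area i
    Assumes every area has at least one neighbour and values are not all equal.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the overall mean
    m2 = sum(d * d for d in z) / n          # second moment of the deviations
    result = []
    for i in range(n):
        # Spatial lag: average deviation among the neighbours of area i
        lag = sum(z[j] for j in neighbours[i]) / len(neighbours[i])
        result.append(z[i] / m2 * lag)
    return result


def gi_star(values, neighbours):
    """Unstandardized Gi*: share of the study-wide total found in area i
    together with its neighbours (area i itself is included)."""
    total = sum(values)
    return [
        (values[i] + sum(values[j] for j in neighbours[i])) / total
        for i in range(len(values))
    ]


# Toy example: five areas in a row, each adjacent to the next
rates = [1, 1, 1, 9, 8]
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
lmi = local_morans_i(rates, chain)
gi = gi_star(rates, chain)
```

In this toy example, a positive local Moran's I marks an area whose value resembles those of its neighbours (high-high or low-low), and a negative value marks dissimilarity, whereas Gi* is large only where the area and its neighbours together hold a large share of the total.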
The spatial scan statistic (SSS) (Kulldorff 1997) is one of the most widely used statistical methods for local cluster detection in epidemiological studies (Song and Kulldorff 2003; Fang, Yan, Liang, de Vlas, Feng, Han, Zhao, Xu, Bian, Yang, Gong, Richardus, and Cao 2006) and crime research (Ceccato 2005). The SSS imposes a circular scan window of a given radius centred on an area centroid. The radius increases to a specified limit, usually 50% of the population at risk. For each scan window, as the window both increases in size and moves across area centroids, a likelihood ratio, which is primarily a function of the number of cases and population at risk within and outside the window, is calculated. The most likely cluster comprises the areas contained within the scan window that possess the greatest likelihood ratio, and secondary clusters are ranked according to likelihood ratio. In this study, SSS analysis used a discrete Poisson model, where the expected number of cases is assumed to be proportional to area population at risk. The maximum circular scan window size was set to 50% of the population at risk.
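The scanning procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the software or settings used in this study: it searches for high-rate clusters only, grows each window by adding the next-nearest centroid, uses the log likelihood ratio of the discrete Poisson model, and omits the Monte Carlo step normally used to assess significance. The function names and the toy data are hypothetical.

```python
import math


def poisson_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio for a high-rate window under a Poisson model."""
    if cases_in <= expected_in:
        return 0.0                      # not an excess of cases: ignore
    cases_out = total_cases - cases_in
    expected_out = total_cases - expected_in
    llr = cases_in * math.log(cases_in / expected_in)
    if cases_out > 0:
        llr += cases_out * math.log(cases_out / expected_out)
    return llr


def most_likely_cluster(centroids, cases, population, max_pop_fraction=0.5):
    """Return (llr, member_indices) for the window with the highest ratio.

    centroids  : list of (x, y) area centroids
    cases      : list of case counts per area
    population : list of population at risk per area
    """
    total_cases = sum(cases)
    total_pop = sum(population)
    best_llr, best_members = 0.0, None
    for centre in centroids:
        # Areas ordered by Euclidean distance from the window centre
        order = sorted(range(len(centroids)),
                       key=lambda j: math.dist(centre, centroids[j]))
        c, p, members = 0, 0, []
        for j in order:
            # Stop growing once the window would exceed the population limit
            if p + population[j] > max_pop_fraction * total_pop:
                break
            c += cases[j]
            p += population[j]
            members.append(j)
            # Expected cases are proportional to population at risk
            expected = total_cases * p / total_pop
            llr = poisson_llr(c, expected, total_cases)
            if llr > best_llr:
                best_llr, best_members = llr, list(members)
    return best_llr, best_members


# Toy example: six equally populated areas in a row, two with excess cases
toy_centroids = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
toy_cases = [2, 2, 20, 18, 2, 2]
toy_population = [100] * 6
llr, members = most_likely_cluster(toy_centroids, toy_cases, toy_population)
```

In the toy data, the window that contains only the two high-count areas yields the largest ratio; adding a low-count neighbour or dropping one of the high-count areas lowers it.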
The SSS can be modified to include a non-Euclidean neighbour contiguity file (SSS-contiguity). In this case, the scan window centres on the target area centroid and radius increases to include the neighbours specified in the contiguity matrix. This variation on the SSS limits the maximum scan window size and thus, the size of detected clusters. As with the SSS method, the most likely cluster is the group of areas with the highest likelihood ratio. Most commonly, contiguity is specified through queen relationships (e.g. Cohen and Tita 1999), which include all neighbours that share a vertex. The construction of the contiguity matrix is paramount to the accuracy and relevance of cluster detection, since it is a...