Using Geographic Information Systems (GIS)

Read this article. When you read the section on Centroid (geometric and population-weighted), think about the location of your local supermarket. Where is it in relation to customers, suppliers, or other partners?

Data collection

Data acquisition

One of the greatest challenges facing GIS users is acquiring detailed data sources that contain locational and attribute information on the built environment. Spatial data can be acquired using primary or secondary data collection methods. Primary data are often collected using two common methods: 1) "psychometric", based on surveys of individuals who report on characteristics of the environmental feature of interest; and/or 2) "ecometric", through direct or "systematic social" observations undertaken by fieldwork auditors who visit neighbourhoods to make observations or to complete an audit tool. More recently, tools that enable the direct integration of collected spatial data into GIS have been developed, including Global Positioning Systems (GPS) and remote sensing (data captured remotely using satellites to identify green space, topography, etc.). Secondary spatial data are collected by external sources and include administrative data (e.g. from a census), commercial data (e.g. from market research companies), internet resources (e.g. company websites or Google Street View), and phone directories (e.g. Yellow Pages). Commercial data are increasingly being acquired by researchers as a key data source for identifying features of the built environment. Compared to primary data, commercial and other secondary data sources may be relatively cost-effective to obtain and can usually be sourced for specific study areas or across a large geographical area (e.g. nationwide). Where secondary data are utilised, it is important to record the steps taken in this process (in the form of metadata) so that future users can accurately interpret and use these data and other researchers can replicate the process. A key drawback of secondary data sources is that they are often not designed for the analytical purposes for which they are being used and therefore may not entirely meet the needs of the researcher.
Therefore, to ensure the accuracy of secondary data, validation against primary data is often preferable. Discordance between data collected in the field (primary data) and secondary data is mainly due to three possible errors: 1) facilities included in the commercial database are not found in the field; 2) facilities are included in the commercial database but are not considered to be the same service type when identified in the field; 3) facilities found in the field are not in the commercial database. Specific results on the accuracy of secondary data sources have previously been reported for physical activity facilities and the food environment. To summarise, findings suggest that most sources of secondary data contain enough error to potentially introduce bias into analyses. Both primary and secondary data often require manual geocoding to convert the data into a GIS-compatible format.
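The three types of discordance described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each data source has already been reduced to a mapping from a shared facility identifier to a service type; the function name, identifiers, and records are hypothetical and do not come from any particular study or database.

```python
def classify_discordance(commercial, field):
    """Compare a commercial listing with a field audit.

    Both arguments are dicts mapping a facility identifier to a
    service type (hypothetical structure, assumed for this sketch).
    """
    # Error 1: listed in the commercial database but not found in the field.
    not_found_in_field = sorted(k for k in commercial if k not in field)
    # Error 2: present in both sources, but the service type differs.
    type_mismatch = sorted(
        k for k in commercial if k in field and commercial[k] != field[k]
    )
    # Error 3: found in the field but absent from the commercial database.
    missing_from_database = sorted(k for k in field if k not in commercial)
    return {
        "not_found_in_field": not_found_in_field,
        "type_mismatch": type_mismatch,
        "missing_from_database": missing_from_database,
    }


# Made-up example records.
commercial = {"F1": "supermarket", "F2": "gym", "F3": "fast food"}
field = {"F1": "supermarket", "F3": "cafe", "F4": "park"}

result = classify_discordance(commercial, field)
print(result)
```

In practice the two sources rarely share a clean identifier, so records would first have to be linked by name, address, or location, and that linkage step can introduce errors of its own.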


Geocoding

Geocoding is the process of matching raw address information (e.g. the household addresses of study participants or the addresses of neighbourhood resources such as supermarkets) with a digital spatial dataset that includes all addresses within the area of interest mapped to latitude and longitude coordinates. Geocoding usually follows data acquisition from primary or secondary sources. Geocoding is prone to a number of errors that can bias estimates of the associations between the built environment and health. The first source of error relates to the match rate, which is the percentage of addresses that are successfully geocoded. Higher match rates are achieved when the raw address file is accurate and the digital dataset is comprehensive and regularly updated. Low match rates may occur because of incomplete address information and errors such as incorrect street suffixes or misspelled street names, suburbs, and postal area information. Second, even when high match rates are achieved, addresses may be geocoded to the incorrect location. This error may arise because of inaccuracies in the raw address and digital spatial files or because of the program settings (i.e. the criteria used to define a match, such as sensitivity to the spelling of street names).
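As a rough illustration of how the match rate is calculated, the sketch below geocodes a handful of raw addresses against a tiny reference table. Everything in it is an assumption made for the example: the reference addresses and coordinates are invented, the suffix table is deliberately short, and exact matching after simple normalisation stands in for the far more elaborate matching rules used by real geocoding software.

```python
import re

# Hypothetical suffix table; real address standards are far longer, and
# "st" can also mean "saint", which this sketch ignores.
SUFFIXES = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise(address):
    """Lower-case, drop punctuation, and expand common street suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def geocode(addresses, reference):
    """Return (matched, unmatched, match_rate_percent)."""
    matched, unmatched = [], []
    for raw in addresses:
        key = normalise(raw)
        if key in reference:
            matched.append((raw, reference[key]))
        else:
            unmatched.append(raw)
    rate = 100.0 * len(matched) / len(addresses) if addresses else 0.0
    return matched, unmatched, rate


# Invented reference dataset: normalised address -> (latitude, longitude).
reference = {
    "12 high street": (-37.8100, 144.9600),
    "3 park road": (-37.8200, 144.9700),
}
raw_addresses = ["12 High St.", "3 Park Rd", "7 Hgih Street", "99 Nowhere Ave"]

matched, unmatched, rate = geocode(raw_addresses, reference)
print(f"Match rate: {rate:.1f}%")
print("Unmatched:", unmatched)
```

Here the misspelled street name and the address missing from the reference table both go unmatched. Loosening the matching criteria (for example, tolerating spelling differences) would raise the match rate but also increase the risk of the second error type, where an address is matched to the wrong location.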


Global Positioning System (GPS)

A Global Positioning System (GPS) device uses a satellite system to pinpoint a location on the earth as a latitude and longitude coordinate. In environment and health work, it is a valuable tool for field auditors because it facilitates accurate and precise primary data acquisition on the location of features within the built environment, such as food stores, parks or outdoor advertising. GPS devices also enable investigators to track the mobility patterns of individuals through the environment to develop measures of their travel routes and activity spaces. These technologies have recently been coupled with devices such as accelerometers (which provide objective measures of physical activity) so that the precise location where the physical activity occurs is also captured. Given the high cost of the equipment, these data are often costly to collect, especially when seeking sufficient numbers to power epidemiological analyses. Further, GPS technologies are at the developmental stage and challenges remain, including signal loss, slow location detection, limited device precision, limited battery power, and study participants forgetting to switch on the device. These factors may affect the completeness and accuracy of the GPS data. However, to aid new users, data collection and cleaning protocols that reduce the severity of these potential issues have been developed.
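To give a sense of what a simple cleaning step might look like, the following sketch flags two of the problems mentioned above in a GPS track: long gaps between fixes (possible signal loss or a device that was switched off) and implausibly fast jumps (possible positional error). It is not one of the published protocols; the thresholds, function names, and sample track are all assumptions chosen for illustration, and distances use the standard haversine formula on a spherical earth.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def flag_track(fixes, max_gap_s=60, max_speed_ms=50):
    """Flag suspicious fixes in a time-sorted list of (seconds, lat, lon).

    Thresholds are illustrative assumptions, not recommended values.
    Returns a list of (index, reason) tuples.
    """
    flags = []
    for i in range(1, len(fixes)):
        t0, lat0, lon0 = fixes[i - 1]
        t1, lat1, lon1 = fixes[i]
        dt = t1 - t0
        if dt > max_gap_s:
            flags.append((i, "gap"))  # possible signal loss
        elif dt > 0 and haversine_m(lat0, lon0, lat1, lon1) / dt > max_speed_ms:
            flags.append((i, "speed"))  # implausible jump in position
    return flags


# Invented track: a normal step, a sudden jump, then a long gap.
track = [
    (0, -37.8100, 144.9600),
    (10, -37.8101, 144.9601),
    (20, -37.9000, 144.9601),
    (200, -37.9001, 144.9602),
]
flags = flag_track(track)
print(flags)
```

Flagged fixes would normally be reviewed or handled according to the study's own protocol rather than deleted automatically, since a gap may reflect genuine time spent indoors rather than faulty data.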