Institute of Ecosystem Studies

2009 BES Annual Meeting Presentation and Poster Abstracts

A comparison of three models describing the spatial distribution of lead in urban residential soils of Baltimore, Maryland
Schwarz, Kirsten
Co-Authors: Kirsten Schwarz, Steward T.A. Pickett, Richard G. Lathrop, Kathleen C. Weathers, Richard V. Pouyat and Mary L. Cadenasso

Abstract: Lead contamination of Baltimore's residential soil is widespread, highly variable, and a potential public health concern. With the inception of Maryland's plan to eliminate childhood lead poisoning by 2010, attention has been focused on old lead-based paint sources. Lead-based paint is an important factor in childhood lead exposure; however, it is not the only source of lead in the environment. Soil contaminated with lead from past use of leaded gasoline, deteriorating lead-based paint, and industrial sources is also an important source. Intensive sampling of 61 residential properties in Baltimore City revealed that 53% had soil lead levels that exceeded the United States Environmental Protection Agency reportable limit of 400 ppm. We used these data as the input to a predictive model to describe the spatial distribution of lead in urban residential soils. Here we present three different modeling approaches within a geographic information systems (GIS) environment: a traditional general linear model (GLM) and two machine learning techniques, Classification and Regression Trees (CART) and Random Forests (RF). The GLM revealed that housing age, distance to road, distance to building, and two interactions (distance to road with housing age, and distance to building with distance to road) together explained 38% of the variation in the data. The CART model confirmed the importance of these variables, with housing age, distance to building, and distance to major road networks determining the terminal nodes of the tree. Using the same three predictor variables, the RF model explained 42% of the variation in the data. An independent dataset was used to evaluate the accuracy of the models. The overall accuracy, which is a measure of agreement between the model and the independent dataset, was 89.66% for the GLM, 82.76% for the CART model, and 72.41% for the RF model.
This research highlights the usefulness of empirical models to predict the spatial distribution of lead in urban residential soils. Empirically based GIS models have the potential to assist public health officials and city agencies in focusing efforts on contaminated soil removal and remediation.
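
The abstract describes overall accuracy as a measure of agreement between model predictions and an independent dataset. A minimal sketch of that calculation follows. It assumes accuracy was scored on discrete classes (here, whether a sample is above or below the 400 ppm limit), which the abstract does not state. The function name and the sample labels are hypothetical and are not the study's data.

```python
def overall_accuracy(predicted, observed):
    """Return the fraction of samples whose predicted class matches the observed class."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one sample")
    # Count positions where the model and the independent data agree.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical class labels relative to the 400 ppm limit (illustration only).
observed = ["above", "below", "above", "below", "below"]
predicted = ["above", "below", "below", "below", "below"]

print(f"{overall_accuracy(predicted, observed):.2%}")  # prints 80.00%
```

Overall accuracy does not distinguish between kinds of error, so in a setting like this one it would usually be read alongside a confusion matrix that separates missed contaminated sites from false alarms.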