Application of multivariate statistics and geostatistical techniques to identify the spatial variability of heavy metals in groundwater resources
|Caspian Journal of Environmental Sciences|
|Article 2, Volume 13, Issue 4, Winter 2015, Pages 333-347|
|Article type: Research Paper|
|The performance of geostatistical and spatial interpolation techniques for estimating the spatial variability of heavy metals and mapping the water quality of groundwater resources in Ramiyan district (Golestan province, Iran) was investigated. Twenty-four spring/well water samples were collected, and the concentrations of heavy metals (Ni, Co, Pb, Cd, Zn and Cu) were determined using Differential Pulse Polarography. Multivariate and geostatistical methods were applied to differentiate the influences of natural processes and human activities on heavy metal pollution of groundwater across the study area. The results of Cluster Analysis and Factor Analysis show that Ni and Co are grouped in factor F1, Pb and Cd in F2, and Zn and Cu in F3. The probability of elevated levels for the three factors was predicted using the most appropriate variogram model, and the performance of the methods was evaluated using the Mean Absolute Error, Mean Bias Error and Root Mean Square Error. The spatial structure results show that the variograms and cross-validation of the six variables can be modeled with three methods, namely Radial Basis Function, Inverse Distance Weighted and Ordinary Kriging. The results also indicate that the Radial Basis Function method was the best, as it had the highest precision and the lowest error. A Geographic Information System can fully display the spatial patterns of heavy metal concentrations in the groundwater resources of the study area.|
|Keywords: Groundwater resources; Heavy metals contamination; Geostatistical; Multivariate statistics; Interpolation; Spatial mapping|
F. Khanduzi, A. Parizanganeh*, A. Zamani
Department of Environmental Sciences, Faculty of Science, Environmental Science Research Laboratory, University of Zanjan, Zanjan, Iran
* Corresponding author’s E-mail: firstname.lastname@example.org
(Received: Feb. 29, 2015; Accepted: July 22, 2015)
Water is a basic requirement for all life on earth, and growth in population and urbanization necessitates the expansion of the agricultural and industrial sectors, increasing the demand for fresh water. When surface water is not available, the alternative is to depend on groundwater (GW) (Subramani et al., 2012). A variety of natural and human factors affect the quality and use of water resources, and heavy metals are among the major pollutants of these resources (Marcovecchio et al., 2007). Many human activities, such as agriculture, mining and the combustion of fossil fuels, release heavy metals into the environment. As their concentrations increase and the capacity of soils to retain them decreases, heavy metals leach into the soil solution and GW and then accumulate in living tissues, including those of humans, through the food chain (Mantovi et al., 2003; Lei et al., 2008); they are also sensitive indicators for monitoring changes in the aquatic environment. In environmental monitoring, such as groundwater quality investigations, the collected data may carry significant uncertainty, including complex variations in the observed values of measurable characteristics of the investigated medium or pollution sources in time and space (Yeh et al., 2006). Geostatistics is a spatial statistical technique used in environmental monitoring to analyze and map the distributions of pollutant concentrations and their spatial and temporal variations, and it has been widely used to analyze data collected from groundwater resources (Yu et al., 2003; Yeh et al., 2006; Nas & Berktay, 2006; Khodapanah & Sulaiman, 2009; Uyan & Cay, 2010; Amin et al., 2010; Belkhiri et al., 2011; Sarukkalige, 2012). Furthermore, multivariate statistical techniques help interpret complex data matrices, giving a better understanding of the water quality of the studied systems.
These methods allow the identification of possible factors/sources that influence water systems and offer a valuable tool for reliable water management (Shrestha & Kazama, 2007; Iscen et al., 2008; Ogunribido & Kehinde–Philips, 2011; Li et al., 2012; Bajpayee et al., 2012). Multivariate geostatistical methods combine the advantages of geostatistical techniques and multivariate analysis, incorporating spatial or temporal correlations and multivariate relationships to detect and map the various sources of spatial variation at different scales (Smyth & Istok, 1989; Einax & Soldt, 1999; Yeh et al., 2006; Zheng et al., 2008; Lin et al., 2009). Coal mining, agricultural activities and the development of industrial parks in Ramiyan, Golestan Province (Iran), call for an evaluation of the contamination resulting from these activities. The lack of a systematic investigation of possible heavy metal contamination in Ramiyan makes an assessment of the quality of its groundwater sources urgent.
The aquifer is the main source of drinking and irrigation water for local residents. Samples from 24 wells/springs were collected and analyzed by a voltammetric method to determine heavy metal concentrations. The presence and concentrations of the heavy metals were determined, and the results were compared with the maximum contaminant levels (MCLs) specified by WHO and the Institute of Standards and Industrial Research of Iran (ISIRI). This study aims to investigate the contents of Cu, Ni, Zn, Cd, Pb and Co in the groundwater resources of Ramiyan, including the analysis of their spatial distribution, and to unveil their possible sources by integrating multivariate statistical and geostatistical methods.
MATERIAL AND METHODS
Golestan Province is located southeast of the Caspian Sea in northern Iran. The study area is Ramiyan district, with an area of 780.73 km², situated between 54° 45′ and 55° 15′ E longitude and 36° 48′ and 37° 12′ N latitude. The main activity in this area is agriculture, and the main crops are wheat, oilseeds, rice and garden products (Mosaedi & Gharib, 2008). Owing to the presence of coal mines, industrial and mining activities have also developed across the study area.
Samples for the assessment of groundwater pollution by heavy metals were collected from twenty-four stations (wells/springs) in the study area (Fig. 1, Table 1). Sampling was carried out in summer 2012, and three replicate samples were collected from each station for analysis. Glassware and vessels were soaked in 10% (v/v) nitric acid for 24 h and then washed with distilled, de-ionized water. Samples were collected in polypropylene containers and labeled, and a few drops of ultrapure HNO3 were added immediately to bring the pH below 2, preventing loss of metals and bacterial and fungal growth. The samples were then stored in a refrigerator.
Multivariate and geostatistical analysis
Multivariate analysis provides techniques such as Principal Component Analysis (PCA), Factor Analysis (FA) and Cluster Analysis (CA) for classifying the inter-relationships of measured variables (Zamani et al., 2012). Cluster Analysis was performed on the data using Ward's method with the squared Euclidean distance as the dissimilarity measure. Multivariate geostatistical methods combine the advantages of geostatistical techniques and multivariate analysis, incorporating spatial or temporal correlations and multivariate relationships in order to map the various sources of spatial variation at different scales (Facchinelli et al., 2001). Geostatistics is a collection of techniques for solving estimation problems involving spatial variables. It includes a variety of tools, such as interpolation, integration and differentiation of hydrogeologic parameters, to produce a prediction surface and other derived characteristics from measurements at known locations (Sahoo & Jha, 2014).
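The clustering step can be illustrated with a small, self-contained sketch. It assumes that each metal is treated as one variable whose concentrations across the stations are z-scored before clustering, and it implements Ward's agglomeration through the Lance-Williams update on squared Euclidean distances. The function names (`zscores`, `ward_merges`, `top_two_groups`) and the toy profiles are hypothetical; this is not the software used in the study.

```python
import statistics


def zscores(values):
    """Standardize one variable (one metal across stations) to mean 0, sd 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_merges(vectors):
    """Agglomerate vectors with Ward's method via the Lance-Williams update.

    Returns a list of (members_a, members_b, merge_distance) tuples, where the
    members are lists of indices into `vectors`; the last tuple is the final merge.
    """
    clusters = {i: [i] for i in range(len(vectors))}
    # Pairwise squared Euclidean distances keyed by (smaller_id, larger_id).
    dist = {
        (i, j): squared_euclidean(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    }
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        members_a, members_b = clusters.pop(a), clusters.pop(b)
        na, nb = len(members_a), len(members_b)
        updated = {}
        for k, members_k in clusters.items():
            nk = len(members_k)
            d_ak = dist[(min(a, k), max(a, k))]
            d_bk = dist[(min(b, k), max(b, k))]
            # Lance-Williams coefficients for Ward's method on squared distances.
            updated[k] = ((na + nk) * d_ak + (nb + nk) * d_bk - nk * d_ab) / (na + nb + nk)
        # Drop distances involving the two merged clusters, add the new cluster.
        dist = {key: d for key, d in dist.items() if a not in key and b not in key}
        for k, d in updated.items():
            dist[(k, next_id)] = d  # k < next_id, so the key stays ordered
        clusters[next_id] = members_a + members_b
        merges.append((members_a, members_b, d_ab))
        next_id += 1
    return merges


def top_two_groups(profiles):
    """Split variables (name -> values across stations) into the two top-level clusters."""
    names = list(profiles)
    vectors = [zscores(profiles[name]) for name in names]
    members_a, members_b, _ = ward_merges(vectors)[-1]
    return [{names[i] for i in members_a}, {names[i] for i in members_b}]


if __name__ == "__main__":
    # Hypothetical concentrations at five stations, not the study's data.
    toy = {
        "Ni": [1.0, 2.0, 3.0, 4.0, 5.0],
        "Co": [1.1, 2.0, 3.2, 3.9, 5.1],
        "Pb": [5.0, 4.0, 3.0, 2.0, 1.0],
        "Cd": [4.9, 4.2, 2.8, 2.1, 1.0],
    }
    print(top_two_groups(toy))
```

With these toy profiles, the co-varying pairs (Ni with Co, Pb with Cd) end up in separate top-level groups, which mirrors the kind of two-group split reported for the dendrogram; the real grouping depends on the measured data.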
The first step in geostatistical estimation is to provide a model from which the semivariogram value can be computed for any possible sampling interval. The most commonly used models are the spherical, exponential, Gaussian and pure nugget effect models (Isaaks & Srivastava, 1989). The semivariogram plays a fundamental role in the analysis of geostatistical data by the kriging method. Prior to kriging, a valid semivariogram model has to be selected and its parameters estimated (Pang et al., 2009). The experimental semivariogram is calculated as follows:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(x_i) - Z(x_i + h) \right]^2 \qquad (1)$$
where γ(h) denotes the semivariogram, h is the spatial interval, designated as the lag, N(h) is the number of observed data pairs separated by the lag h, and Z(x_i) and Z(x_i + h) are the measured values at locations x_i and x_i + h, respectively. Valid models commonly fitted to experimental semivariograms include the spherical, Gaussian and exponential functions. These are characterized by a sill, which represents the variance accounted for by the model, and a range, which signifies the extent of spatial correlation. The value of the semivariogram as the lag h approaches zero is referred to as the nugget effect. These geostatistical parameters indicate the spatial variation and correlation of regionalized variables at a given scale (Yang et al., 2009).
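As an illustration of Eq. (1), the following Python sketch computes an experimental semivariogram from scattered sampling points. Grouping point pairs into lag classes of width `lag_width` is an assumption made for the example (the text does not state the lag tolerance used), and the function name is hypothetical.

```python
import math
from collections import defaultdict


def experimental_semivariogram(coords, values, lag_width, max_lag):
    """Estimate gamma(h) with Eq. (1), grouping point pairs into lag classes.

    coords: list of (x, y) tuples; values: measured Z(x_i) at those points.
    A pair separated by distance d falls in lag class k = round(d / lag_width)
    and is reported at the nominal lag k * lag_width.
    """
    sq_diff_sum = defaultdict(float)  # sum of [Z(x_i) - Z(x_i + h)]^2 per class
    pair_count = defaultdict(int)     # N(h) per class
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is counted once
            d = math.dist(coords[i], coords[j])
            if d > max_lag:
                continue
            k = round(d / lag_width)
            if k == 0:
                continue  # skip pairs that fall in the zero-lag class
            sq_diff_sum[k] += (values[i] - values[j]) ** 2
            pair_count[k] += 1
    return {k * lag_width: sq_diff_sum[k] / (2 * pair_count[k]) for k in sorted(pair_count)}
```

For a linear trend sampled at unit spacing, the sketch returns γ(h) = h²/2 at each lag, which is a convenient sanity check before fitting spherical, exponential or Gaussian models to real data.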
Fig. 1. Location map of Ramiyan and the sampling points.
The kriging method has been used as an estimation tool in the sustainable management of groundwater. It is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values at unknown locations (Sahoo & Jha, 2014). Kriging is an exact interpolator that yields the best linear unbiased estimate, that is, the linear unbiased estimate with the minimum estimation error variance (Einax & Soldt, 1999; Ahmadi & Sedghamiz, 2008).
To estimate the value at an unsampled location, the following linear combination of the known values is computed:
$$Z^*(x_0) = \sum_{i=1}^{n} \lambda_i Z(x_i) \qquad (2)$$
where Z*(x_0) denotes the estimate of the unknown value at x_0 and λ_i are the weights of the n known neighboring points Z(x_i).
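The weights λ_i of the linear estimator described above can be obtained by solving the ordinary kriging system, which adds a Lagrange multiplier so that the weights sum to one (the unbiasedness condition). The sketch below writes that system in semivariogram form and assumes an exponential model with a practical-range convention; the function names and parameter values are hypothetical, and a small Gaussian-elimination solver is included to keep it dependency-free.

```python
import math


def exponential_gamma(h, nugget, partial_sill, practical_range):
    """Exponential semivariogram (assumed form); gamma(0) = 0 by definition."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-3.0 * h / practical_range))


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (the kriging matrix has a zero diagonal)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(coords, values, target, gamma):
    """Return (Z*(x0), weights) from the ordinary kriging system.

    Solves  sum_j lambda_j * gamma(|x_i - x_j|) + mu = gamma(|x_i - x0|)  for each i,
    together with  sum_j lambda_j = 1.
    """
    n = len(values)
    a = [[gamma(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve_linear(a, b)
    weights = solution[:n]  # solution[n] is the Lagrange multiplier mu
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights
```

At the centre of a square of four samples, symmetry gives each point a weight of 0.25, and at a sampled location the estimate reproduces the measured value, which illustrates why kriging is called an exact interpolator.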
Kriging is an estimation method based on a weighted moving average and is known as the best linear unbiased estimator. Spherical, circular, Gaussian and exponential models are available for ordinary kriging (Nas, 2009), and the method is described in detail by Goovaerts (1997). Because it uses statistical models, kriging allows a variety of map outputs, including predictions, prediction standard errors, probability and quantile maps. Among the various forms of kriging, ordinary kriging has been widely used as a reliable estimation method (Nas, 2009). In interpolation with the Inverse Distance Weighted (IDW) method, each measured point is assigned a weight that is a function of the inverse of its distance to the point being estimated, so that closer points have more influence on the estimate (Eslami et al., 2013).
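IDW interpolation, including the power parameter discussed in the next paragraph, fits in a few lines. In this sketch, which assumes Euclidean distances and uses every sample rather than a search neighbourhood, each weight is the distance raised to minus the power, normalized so that the weights sum to one. The function name `idw` is hypothetical.

```python
import math


def idw(coords, values, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    Weights are w_i = d_i**-power / sum_j d_j**-power, so they sum to 1 and
    nearer points receive larger weights; a larger `power` makes the nearest
    points dominate more strongly.
    """
    inverse = []
    for c, z in zip(coords, values):
        d = math.dist(c, target)
        if d == 0.0:
            return z  # exact interpolator: return the measured value
        inverse.append(d ** -power)
    total = sum(inverse)
    return sum(w * z for w, z in zip(inverse, values)) / total
```

For two samples at x = 0 (value 0) and x = 1 (value 10), the estimate at x = 0.25 is 2.5 with power 1 but 1.0 with power 2, showing how a higher power concentrates the weight on the nearest point. The study reports powers of 2 and 3 for Co and Cd.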
These weights are controlled by a power parameter p: as the power increases, the influence of more distant points diminishes, whereas a lower power distributes the weights more uniformly among neighboring points. Because only distance counts in this method, points at equal distances receive equal weights (Balakrishnan et al., 2011). The weight of each data point is determined from its distance as follows:
$$w_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}} \qquad (3)$$
where w_i designates the weight of point i, d_i is the distance between point i and the unknown point, p is the power parameter and n is the number of data points (Karandish & Shahnazari, 2014).
Kriging in geostatistics is similar to inverse distance weighting, except that the weights are based not only on the distance between the measured sampling points and the prediction location but also on the overall spatial arrangement of the sampling points. The basic assumption in kriging is that sampling points close to each other are more similar than those farther apart. Kriging is regarded as an optimal spatial interpolation method and is a type of weighted moving average (Gorai & Kumar, 2013).
The Radial Basis Function (RBF) methods are a family of exact interpolation techniques in which the surface must pass through each measured sample value. Each basis function has a different shape and yields a slightly different interpolation surface (Kazemi Poshtmasari et al., 2012). RBF methods can predict values above the maximum or below the minimum of the measured values. All RBF methods have a parameter that controls the smoothness of the resulting surface. Their estimates are based on a mathematical function that minimizes the overall surface curvature, generating quite smooth surfaces; the differences among the methods are slight, so the generated surfaces are similar. A function f that minimizes the following expression is an example of the RBF technique, more specifically of the spline method (Karydas et al., 2009):
$$\rho\, J(f) + \frac{1}{n} \sum_{i=1}^{n} \left[ z(x_i) - f(x_i) \right]^2 \qquad (4)$$
where z(x_i) = f(x_i) + ε_i is the measured value of an attribute at point x_i, ε_i is the associated random error and ρ is a weighting parameter. The first term, J(f), represents the smoothness of the function f and the second term represents its proximity to the data (Karydas et al., 2009).
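Exact RBF interpolation can be sketched as solving one linear system for the basis-function coefficients and then summing the basis functions at the target. The sketch below uses the inverse multiquadric kernel, which the study found best for Ni, but it assumes a pure kernel expansion without the low-order polynomial term that some implementations add, and the `shape` value is a hypothetical choice standing in for the smoothness parameter mentioned above.

```python
import math


def inverse_multiquadric(r, shape):
    """Basis function phi(r) = 1 / sqrt(r**2 + shape**2)."""
    return 1.0 / math.sqrt(r * r + shape * shape)


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rbf_fit(coords, values, shape=1.0):
    """Coefficients w such that sum_j w_j * phi(|x_i - x_j|) = z_i at every sample."""
    phi = [[inverse_multiquadric(math.dist(p, q), shape) for q in coords] for p in coords]
    return solve_linear(phi, values)


def rbf_predict(coords, coefficients, target, shape=1.0):
    """Evaluate the fitted interpolation surface at `target`."""
    return sum(w * inverse_multiquadric(math.dist(c, target), shape)
               for c, w in zip(coords, coefficients))
```

Because the coefficients are chosen so that the surface reproduces every sample, evaluating `rbf_predict` at a sampling point returns its measured value; between samples the surface may overshoot the observed range, as noted in the text.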
The adequacy and validity of the developed semivariogram models were tested by a technique called cross-validation. Cross-validation consists of removing one datum at a time from the data set and re-estimating this value from the remaining data using different variogram models. The interpolated and actual values are compared, and the model that yields the most accurate predictions is retained (Burrough & McDonnell, 1998; Karimi Nezhad et al., 2012). In this paper, the interpolation methods were compared by cross-validation using the Mean Bias Error (MBE), Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) statistics. The closer MAE and MBE are to zero, the better the method reproduces the observed values. Finally, the RMSE was used to evaluate model performance in the cross-validation mode. Each of these measures is "dimensioned", that is, it expresses an average interpolation error in the units of the variable of interest. The smallest RMSE indicates the most accurate predictions. This approach has been adopted by many researchers (Twomey & Smith, 1996; Willmott & Matsuura, 2006; Kazemi Poshtmasari et al., 2012; Karandish & Shahnazari, 2014). These parameters are calculated according to Eqs. (5) to (7):
$$\mathrm{MBE} = \frac{1}{N} \sum_{i=1}^{N} \left[ Z^*(x_i) - Z(x_i) \right] \qquad (5)$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| Z^*(x_i) - Z(x_i) \right| \qquad (6)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ Z^*(x_i) - Z(x_i) \right]^2} \qquad (7)$$
where Z(x_i) is the observed value at point x_i, Z*(x_i) is the predicted value at point x_i and N denotes the number of samples.
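The leave-one-out procedure and the three error statistics can be sketched as follows. The sign convention for MBE (predicted minus observed) is an assumption, and `interpolate` is a hypothetical callback standing in for any of the estimators discussed (IDW, kriging or RBF) with its parameters held fixed.

```python
import math


def mbe(observed, predicted):
    """Mean Bias Error: mean of Z*(x_i) - Z(x_i); positive means overestimation."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


def mae(observed, predicted):
    """Mean Absolute Error: mean of |Z*(x_i) - Z(x_i)|."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Square Error: sqrt of the mean of (Z*(x_i) - Z(x_i))**2."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def leave_one_out(coords, values, interpolate):
    """Remove each sample in turn, re-estimate it from the rest, and score the method.

    `interpolate(train_coords, train_values, target)` can be any estimator with
    its parameters fixed (e.g. an IDW power or a fitted variogram model).
    """
    predicted = []
    for i in range(len(values)):
        train_coords = coords[:i] + coords[i + 1:]
        train_values = values[:i] + values[i + 1:]
        predicted.append(interpolate(train_coords, train_values, coords[i]))
    return {
        "MBE": mbe(values, predicted),
        "MAE": mae(values, predicted),
        "RMSE": rmse(values, predicted),
    }
```

Running `leave_one_out` once per candidate method on the same samples gives directly comparable scores; under the criteria in the text, the preferred method has MBE and MAE closest to zero and the smallest RMSE.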
RESULTS AND DISCUSSION
The extent of heavy metal contamination
The results of the analysis of the target metals (Co, Ni, Zn, Cu, Cd and Pb) in samples from the 24 wells/springs are given in Table (2). Co, Ni, Pb and Cd were detected in all samples, and Zn and Cu in 96% and 88% of the samples, respectively. The concentrations of the investigated metals (µg/L) were below their MCLs, ranging from 5.69 to 92.44 for Zn, 1.23 to 7.06 for Pb, 0.14 to 8.40 for Cu, 0.01 to 0.99 for Cd, 1.23 to 21.79 for Ni and 0.49 to 7.79 for Co. The geographical locations of the sampling stations and the average concentrations of metals at each station are shown in Table (1).
Classification survey of heavy metals by the Cluster Analysis Method
Two main groups of elements were identified by Cluster Analysis: one group includes Ni and Co, and the other comprises Pb, Cd, Zn and Cu (Fig. 2).
Principal component analysis and factor analysis
The major objective of Factor Analysis (FA) is to reduce the contribution of less significant variables in order to further simplify the data structure obtained from PCA. This is achieved by rotating the axes defined by PCA and constructing new variables, also called varifactors (Shrestha & Kazama, 2007). Prior to such analysis, the raw data are commonly standardized to avoid misclassification arising from the different orders of magnitude and ranges of variation of the analytical parameters (Tabachnick & Fidell, 2007). PCA reduces the dimensionality of the data by forming linear combinations of the original variables, generating new latent variables that are orthogonal and uncorrelated with each other (Nkansah et al., 2010). Based on the eigenvalues in Table (3), three factors were extracted from the data set, together accounting for 82.07% of the total variance. The common factors were extracted by the maximum-likelihood method with varimax rotation.
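The percentage of variance explained by each component, the quantity behind the 82.07% figure, comes from the eigenvalues of the correlation matrix of the standardized variables. The sketch below computes those eigenvalues with the cyclic Jacobi rotation method to stay dependency-free; it stops at the unrotated principal components and does not implement the maximum-likelihood extraction or varimax rotation described in the text. Function names and the toy data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation between variables; `columns` maps name -> values."""
    names = list(columns)
    centered = []
    for name in names:
        vals = columns[name]
        mean = sum(vals) / len(vals)
        centered.append([v - mean for v in vals])

    def corr(a, b):
        return sum(x * y for x, y in zip(a, b)) / math.sqrt(
            sum(x * x for x in a) * sum(y * y for y in b))

    return names, [[corr(a, b) for b in centered] for a in centered]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def explained_variance(columns):
    """Eigenvalues of the correlation matrix and the % of total variance each explains."""
    _, r = correlation_matrix(columns)
    eig = jacobi_eigenvalues(r)
    p = len(eig)  # the trace of a correlation matrix equals the number of variables
    return eig, [100.0 * e / p for e in eig]
```

Summing the leading percentages returned by `explained_variance` gives the cumulative variance accounted for by the retained components; with the study's six metals, the three retained factors would be those whose cumulative share is reported in Table (3).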
Nickel and cobalt, loaded on the first factor, are elements typically emitted by electronics plants. The second factor includes cadmium and lead, which are emitted by agricultural activities and metallurgical plants.
The third factor is loaded with zinc and copper, which are released from batteries, pigments and fungicides. The grouping of the heavy metals is explored by plotting the first three principal components generated from these parameters (Fig. 3).
Table 1. GPS location and concentration of heavy metals in sampling stations.
Table 2. Summary of statistics of heavy metal contents in water samples (µg/L).
Table 3. Rotated component matrix of three-factor model.
†Extraction method: Principal component analysis. Rotation method: Varimax with Kaiser normalization.
Fig. 2. Dendrogram of heavy metal concentrations in groundwater samples.
Fig. 3. Component plot in rotated space for heavy metals (factor loadings, factor 1 vs. factor 2 vs. factor 3; rotation: varimax normalized; extraction: principal component).
Spatial structure analysis
Geostatistical analysis assumes that the metal concentrations at the sampling stations are normally distributed. The randomness and normality assumptions were checked with the Kolmogorov-Smirnov (K-S) test. Where the data depart from normality, homogeneity and a normal distribution can be approached by transforming the data mathematically, for example into logarithmic form, which reduces the differences between values.
It is often observed that environmental variables are lognormally distributed (McGrath et al., 2004), so data transformation is needed to normalize such data sets. K-S tests of the six heavy metals in the 24 samples showed that only Cu and Zn followed a normal distribution (p > 0.05) before transformation. A logarithmic transformation was therefore applied to normalize the data (Table 4).
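The normality check can be sketched with the standard library: compute the K-S statistic D against a normal distribution fitted by the sample mean and standard deviation, and convert it to an approximate p-value. The sketch assumes the asymptotic Kolmogorov series with Stephens' small-sample correction; because the mean and standard deviation are estimated from the same data, this p-value is conservative (the Lilliefors variant corrects for that). Function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def kolmogorov_pvalue(d, n):
    """Asymptotic K-S p-value with Stephens' small-sample correction."""
    root_n = math.sqrt(n)
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.2:
        return 1.0  # the Kolmogorov CDF is numerically zero for such small lambda
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(1.0, max(0.0, total))


def ks_normal(sample):
    """K-S statistic D against a normal fitted by the sample mean and sd, plus p-value."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # Largest gap between the empirical step function and the fitted CDF.
        d = max(d, f - (i - 1) / n, i / n - f)
    return d, kolmogorov_pvalue(d, n)
```

Applying `ks_normal` to the raw concentrations and again to `[math.log(v) for v in sample]` shows whether the logarithmic transformation brings a metal closer to normality (smaller D, larger p), which is the decision summarized in Table 4. Zero concentrations would need special handling before taking logarithms.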
After logarithmic transformation of the original data, a normal distribution was obtained, so the subsequent calculations were performed on the logarithms of the data. After normalization, semivariogram parameters were computed for each theoretical model.
Each variogram was then evaluated using the ratio of nugget variance to sill, which is regarded as a criterion for classifying the spatial dependence of groundwater quality parameters. A ratio below 25% indicates strong spatial dependence, a ratio between 25% and 75% indicates moderate spatial dependence, and a ratio above 75% indicates weak spatial dependence (Taghizadeh et al., 2008).
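The classification rule above is simple enough to state as code. This hypothetical helper assumes that `sill` is the total sill (nugget plus partial sill); some software reports the partial sill instead, in which case the denominator must be adjusted.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget/sill ratio (in percent).

    Thresholds follow the text: below 25 % strong, 25 to 75 % moderate,
    above 75 % weak. `sill` is assumed to be the total sill (nugget + partial sill).
    """
    ratio = 100.0 * nugget / sill
    if ratio < 25.0:
        label = "strong"
    elif ratio <= 75.0:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label
```

For example, a nugget of 0.8 on a total sill of 1.0 gives a ratio of 80% and is classed as weak spatial dependence, the category the study reports for all factors.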
The most appropriate theoretical model was selected on the basis of the highest R² and the lowest residual sum of squares (RSS) (Table 5).
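Model selection by R² and RSS can be sketched as scoring each candidate semivariogram model against the experimental points. The sketch assumes the model parameters (nugget c0, partial sill c, range a) have already been fitted, and it uses the common practical-range convention (factor 3) for the exponential and Gaussian models; other software may parameterize the range differently. For a fixed set of experimental points, the lowest RSS and the highest R² pick the same model, since R² = 1 - RSS/TSS. All names are hypothetical.

```python
import math


def spherical(h, c0, c, a):
    if h == 0:
        return 0.0
    if h >= a:
        return c0 + c
    r = h / a
    return c0 + c * (1.5 * r - 0.5 * r ** 3)


def exponential(h, c0, c, a):
    # Practical-range convention: about 95 % of the partial sill is reached at h = a.
    return 0.0 if h == 0 else c0 + c * (1.0 - math.exp(-3.0 * h / a))


def gaussian(h, c0, c, a):
    return 0.0 if h == 0 else c0 + c * (1.0 - math.exp(-3.0 * (h / a) ** 2))


def goodness_of_fit(experimental, model, params):
    """Return (RSS, R^2) of a model against experimental {lag: gamma} points."""
    lags = list(experimental)
    observed = [experimental[h] for h in lags]
    fitted = [model(h, *params) for h in lags]
    rss = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    mean = sum(observed) / len(observed)
    tss = sum((o - mean) ** 2 for o in observed)
    return rss, 1.0 - rss / tss


def select_model(experimental, candidates):
    """candidates: name -> (model_function, (c0, c, a)) with parameters already fitted."""
    scored = {name: goodness_of_fit(experimental, f, p) for name, (f, p) in candidates.items()}
    best = min(scored, key=lambda name: (scored[name][0], -scored[name][1]))
    return best, scored
```

The experimental points passed to `select_model` would typically come from an experimental semivariogram computed on the log-transformed concentrations.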
Table 4. Normal distribution behaviors of heavy metal concentration.
Table 5. Summary of the most appropriate models for different heavy metals of GW.
The attributes of the semivariograms for each factor are summarized in Table (5). The first and second factors are best fitted by the exponential model, whereas the third factor is best fitted by the Gaussian model. The R² values show that the semivariogram models describe the spatial structure of groundwater heavy metals well. The nugget/sill ratio can be regarded as the criterion for classifying the spatial dependence of data sets (Liu et al., 2009). The ratio of nugget to sill (RNS) expresses the extent of spatial autocorrelation of environmental factors, such as the groundwater heavy metal concentrations in this study. A low RNS indicates strong spatial autocorrelation of heavy metal concentrations in groundwater sources, whereas a high RNS indicates that random effects play an important role in the spatial heterogeneity of heavy metals (Zheng et al., 2008). The RNS values of the six heavy metals indicate weak spatial correlation for all factors. Cross-validation makes it possible to determine which model provides the best predictions (Adhikary et al., 2012).
Table 6. Geostatistical analyses of heavy metals in groundwater (Ramiyan area).
The applicability of the different semivariogram models was tested by cross-validation, and the best model was selected (Table 6). In this study, ordinary kriging (OK), IDW and RBF were used to estimate the concentrations of six heavy metals, and the methods were compared using the MAE, MBE and RMSE statistics. The Radial Basis Function method (inverse multiquadric model) was found to be the most suitable for mapping Ni, whereas ordinary kriging for Pb (exponential model) and for Zn and Cu (Gaussian model), and the Inverse Distance Weighted method for Co (power 2) and Cd (power 3), provided much better estimates of concentrations than the other methods (Table 6).
Drinking water quality maps of heavy metal concentrations were then drawn from the values at the sampled locations, showing where the water is almost clean and where it is to some extent at risk (Fig. 4).
Fig. 4. Filled contour maps of heavy metals (Co, Ni, Cd, Pb, Cu and Zn) in sampled groundwater.
Because of the complexity and large variability of environmental data sets, the application of geostatistical and multivariate statistical methods is recommended.
The main objective of this study was to determine the best estimators for mapping heavy metals in the groundwater resources of Ramiyan district. Multivariate statistical and geostatistical methods were applied to six heavy metals, and three principal components were identified that represent the variability of heavy metals in groundwater sources. The spatial distributions of the six heavy metals showed that parent materials and anthropogenic factors played important roles in the heavy metal concentrations of GW in Ramiyan, and the effects of these two factors varied among the metals. Cluster Analysis (CA) and Factor Analysis (FA) of the heavy metals showed that Ni and Co were grouped in factor F1, Pb and Cd in F2, and Zn and Cu in F3. The probability of elevated levels of the studied heavy metals in groundwater was predicted using the best-fit semivariogram model. The performance of the methods was evaluated using the Mean Absolute Error (MAE), Mean Bias Error (MBE) and Root Mean Square Error (RMSE). The Radial Basis Function (RBF), Inverse Distance Weighted (IDW) and Ordinary Kriging (OK) methods were the best estimators for mapping Ni; Co and Cd; and Pb, Zn and Cu, respectively. A Geographic Information System (GIS) can fully display the spatial patterns and relationships among landscape indices and heavy metal concentrations in the groundwater of the study area. Multivariate statistical techniques help interpret complex data matrices and give a better understanding of water quality. Although the concentrations of the investigated metals in the collected samples were below the maximum contaminant levels set by WHO and ISIRI, the sources of heavy metal contamination should be investigated, especially at hot spots within the study area.
The authors express their sincere gratitude to the Rural Water and Wastewater Company (Golestan Province, Iran) for partial financial support (Grant Number 4987) and gratefully acknowledge Younes Khosravi’s contribution to this work.