Project Page

Effects of Anthropogenic Bomas on Savanna Vegetation, Insect Communities, and Herbivore Activity

Summary

The Boma project investigates the role of human-made livestock kraals (bomas) in shaping the savanna landscape. These enclosures were traditionally created by cutting down acacia trees, usually on slightly sloping terrain, and arranging them so as to create a circular fence that protects livestock from predators. The dung deposited while the boma is in use increases soil nutrient levels, and the effects are noticeable even decades after the sites have been abandoned. In particular, the grass growing in glades left by bomas is likely more palatable to herbivores than that of the surrounding savanna. This possibly results in different feeding behaviors inside and outside these nutrient hotspots, and thus may affect the movement of animals across a landscape made up of patches of extra-palatable grasses within typical savanna. This study extends previous work on the topic by proposing a rigorous quantitative analysis at three levels (plants, insects, and herbivore behavior), using state-of-the-art computational techniques to automate the data analysis process.

On-Field Data Collection

Four relatively old (30+ years) glades were identified, and data were collected over three weeks in January 2012 at the Mpala Ranch on the Laikipia Plateau, Kenya. Two of these sites are located on red soil, while the other two are on black cotton soil.

To collect vegetation data, samples were taken along three radial transects, each 100 m long, spanning the interior, the transition zone, and the outside of each glade. The recorded vegetation data include:

Insects were sampled along the transects to determine their abundance by morphological class. A herbivore exclosure straddles each glade site on the red soil, providing a no-herbivory baseline that allows us to quantify the impact of herbivores on vegetation. The sites are highlighted in the map below.



[Map: Mpala Glade Sites]

Analysis of Vegetation Data: Using Machine Learning to Determine the Boundaries of Boma Effects

In previous work on savanna glade hotspots, the edges of bomas were determined by analyzing the change in grass species composition. In our work, we identify these boundaries using all the collected variables except the grass species. The purpose is to find a sort of “boma signature”, a function of the vegetative features, in the form of a descriptive model. The benefit of this approach is twofold:

  1. it provides a description of how the vegetation responds to the high flux of nutrients over an extended period of time, and
  2. the model can be used in a predictive fashion to detect the edge of other bomas.

As a first step, k-means clustering was used in order to determine if the values of the collected variables could be naturally divided into two groups corresponding to points inside and outside a boma. Even after some preliminary dimensionality reduction, the resulting clusters did not appear to define a region closer to the center of the glade that could be interpreted as the inside of the boma.
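As a rough illustration of this first step, the sketch below runs a plain two-cluster k-means on standardized features and counts how often the cluster label changes along a transect. It is not the project's code: the synthetic data, the two feature columns, and the helper name kmeans_two_groups are assumptions made for the example, and no dimensionality reduction is applied.

```python
import numpy as np

def kmeans_two_groups(X, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with k=2 on standardized features."""
    rng = np.random.default_rng(seed)
    # Standardize so that no single variable dominates the distance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Initialize the two centroids from two distinct random samples.
    centers = Z[rng.choice(len(Z), size=2, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centers = np.array([
            Z[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic transect (made-up values): a weak gradient plus noise.
rng = np.random.default_rng(1)
distance = np.linspace(0.0, 100.0, 50)  # metres from the glade center
X = np.column_stack([
    5.0 - 0.03 * distance + rng.normal(0.0, 1.0, distance.size),
    rng.normal(0.0, 1.0, distance.size),
])

labels = kmeans_two_groups(X)
# Number of times the cluster label changes between neighbouring samples.
switches = int(np.sum(labels[1:] != labels[:-1]))
print(f"label changes along the transect: {switches}")
```

A single label change would mean the two clusters split the transect into one inner and one outer region; many changes mean the clusters are not spatially contiguous, which is the kind of outcome described above.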

Given this result, we adopted supervised classification techniques to investigate how the vegetation features we collected vary between the inside and the outside of a boma. Intuitively, the question we posed was “Assuming that we know the edge of a boma, what are the characteristics of the vegetation that can be used to distinguish between points inside and outside such a boundary?” We used Linear Discriminant Analysis (LDA) and Random Forests (RF) for this purpose. The data must be labeled (“IN” or “OUT”) before being fed to the algorithms. We considered three possible ways to define the edge of the old boma:

  1. Observing aerial images, set the edge of the boma along the transects where the glade ends and the acacia trees begin;
  2. Identify a clear discontinuity in species composition along the transect. For about half of the transects, a clear discontinuity could not be observed. In such cases, the edge was set using method (1).
  3. Run LDA for each possible position of the edge along the transects, excluding points too close to the ends. The edge was set to the point at which the accuracy of the LDA model was highest. This third method can be viewed as a brute-force iterative application of a supervised classification method to answer an unsupervised classification question, i.e. “Where is the edge of the boma?”
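The third method can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: it implements a two-class Fisher discriminant directly in NumPy with equal priors, scores each candidate edge by training accuracy for brevity (the accuracies reported on this page were cross-validated), and runs on synthetic data. The names lda_accuracy and best_edge and the margin parameter are hypothetical.

```python
import numpy as np

def lda_accuracy(X, y):
    """Training accuracy of a two-class Fisher discriminant (equal priors assumed)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # Discriminant direction; pinv guards against a singular scatter matrix.
    w = np.linalg.pinv(Sw) @ (mu1 - mu0)
    # Classify by the side of the projected midpoint a sample falls on.
    threshold = w @ (mu0 + mu1) / 2.0
    pred = (X @ w > threshold).astype(int)
    return float(np.mean(pred == y))

def best_edge(distance, X, margin=5):
    """Try every candidate edge position and keep the most accurate split."""
    order = np.argsort(distance)
    distance, X = distance[order], X[order]
    best_pos, best_acc = None, -1.0
    # Skip candidates too close to either end of the transect.
    for i in range(margin, len(distance) - margin):
        y = (np.arange(len(distance)) >= i).astype(int)  # 0 = IN, 1 = OUT
        acc = lda_accuracy(X, y)
        if acc > best_acc:
            best_pos, best_acc = float(distance[i]), acc
    return best_pos, best_acc

# Synthetic transect (made-up values) with a step in the first feature at 40 m.
rng = np.random.default_rng(0)
distance = np.linspace(0.0, 100.0, 51)
inside = distance < 40.0
X = np.column_stack([
    np.where(inside, 5.0, 2.0) + rng.normal(0.0, 0.3, distance.size),
    rng.normal(0.0, 1.0, distance.size),
])

edge, acc = best_edge(distance, X)
print(f"estimated edge at {edge:.0f} m, accuracy {acc:.2f}")
```

Here the returned edge is the position of the first sample labeled OUT. On real transects the accuracy curve over candidate positions may be flat near its maximum, so inspecting the whole curve can be more informative than the single best position.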

Preliminary Results

The three methods above generate edge points along the transects that are close in the majority of cases. In the subsequent analyses, we use the third type of edge.

LDA and RF were applied to each individual boma site. Additionally, models were built for data aggregated on the basis of soil type and herbivore presence. Moreover, the algorithms were applied to the entirety of the collected data. The accuracies obtained by the resulting models, computed as explained in the previous section, are reported in the table below.

The accuracy of the RF models is in some cases substantially lower than that of the corresponding LDA models. Note that the accuracy was computed differently for the two algorithms: 10-fold cross-validation for LDA and the out-of-bag estimate for RF.
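To make the difference between the two accuracy estimates concrete, here is a small sketch using scikit-learn. The choice of library, the synthetic data, and the parameter values (such as n_estimators=200) are assumptions for illustration; the page does not state which software or settings were used.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic labeled samples (made-up values): 0 = IN, 1 = OUT.
rng = np.random.default_rng(0)
n = 120
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 4))
X[:, 0] += 1.5 * y  # only the first feature carries signal

# LDA: mean accuracy over 10 cross-validation folds.
lda_acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=10).mean()

# RF: out-of-bag accuracy, where each sample is scored only by the
# trees whose bootstrap sample did not contain it.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
rf_acc = rf.oob_score_

print(f"LDA 10-fold CV accuracy: {lda_acc:.2f}")
print(f"RF out-of-bag accuracy:  {rf_acc:.2f}")
```

Both numbers estimate accuracy on unseen data, but they are obtained by different resampling schemes, so small gaps between them should be read with that in mind.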



Variable Importance

The importance of each variable is shown in comparative plots in the figure below.

From the variable importance plots, we observe that the relative importance of predictors in the models generated by the two algorithms agrees in most cases. When a large number of data points is available, as in the global case, the relative weight of the variables is essentially the same. This result is in itself quite remarkable, since the two methods are profoundly different in nature: LDA generates a parametric (linear) model in a deterministic way, while RF is a nonparametric method whose power is based on the random choices that differentiate the classification trees belonging to the forest.

It is clear from the plots that, for the sites on red soil, the primary predictor is the below-ground biomass (BGB). The minus sign indicates that the BGB decreases going from the center of the boma to the outside. For the bomas on black cotton soil, the most discriminating variable seems to be the above-ground biomass (AGB), although the LDA and RF models do not agree on the best predictor for site B2: RF suggests a clear dominance of BGB, while AGB is the most important variable for the LDA model. We notice that for the LDA model there is not a large difference in predictive power between the first three variables, indicating that a linear combination of them with approximately equal coefficients provides the most discriminant direction in the multi-dimensional space. The results for the samples collected in the exclosures are more difficult to interpret. It might be that the number of samples is not sufficient for the analysis to be statistically significant, and the high abundance of forbs hampered the data collection process. However, it is notable that this is the only case in which the root-to-shoot ratio appears to be important in the classification.



“No Sharp Edges”

The clustering analysis discussed above is compatible with the hypothesis that there is no sharp edge in the collected vegetation variables. The histogram of the positions of the misclassified samples in the figure below seems to corroborate this hypothesis.

For both algorithms, the misclassified samples are concentrated near the boundary of the boma. If the two classes (inside vs. outside) were clearly separated in terms of predictors, the misclassified samples would be expected to be more uniformly distributed along the transects. The increased density around the edge indicates that the classifiers “struggle” to find the correct class for those samples. Again, note that the two methods are extremely different in nature, yet they show much the same behavior with respect to the location of the misclassified samples.
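A histogram of this kind can be produced along the following lines. This is only a sketch on synthetic data: the helper name misclassified_histogram, the 10 m bin width, the edge placed at 50 m, and the stand-in threshold classifier are all assumptions made for the example.

```python
import numpy as np

def misclassified_histogram(distance, y_true, y_pred, bin_width=10.0, length=100.0):
    """Count misclassified samples in distance bins along a transect."""
    wrong = distance[y_true != y_pred]
    bins = np.arange(0.0, length + bin_width, bin_width)
    counts, edges = np.histogram(wrong, bins=bins)
    return counts, edges

# Synthetic transect (made-up values): the signal changes gradually
# across an assumed edge at 50 m, so errors pile up near the edge.
rng = np.random.default_rng(0)
distance = np.linspace(0.0, 100.0, 201)
y_true = (distance >= 50.0).astype(int)           # 0 = IN, 1 = OUT
score = (distance - 50.0) / 10.0 + rng.normal(0.0, 1.0, distance.size)
y_pred = (score > 0.0).astype(int)                # stand-in classifier

counts, edges = misclassified_histogram(distance, y_true, y_pred)
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.0f}-{right:3.0f} m: {c}")
```

With a gradual transition, most of the counts fall in the bins adjacent to the edge, which is the pattern described above for the field data.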

Presentation Slides