Section: Diversity analysis
Comparing diversity between communities
All diversity measures share the same problem: they depend on the sampling effort, i.e. the energy and time a researcher spends trying to discover all species present in a community. The higher the sampling effort, the more species are likely to be discovered. Sampling effort can be expressed as the area (or volume) surveyed, the number of individuals recorded, or the number of sampling units (e.g. vegetation plots). For a diversity comparison between communities to be fair, the sampling effort behind their data should be comparable. If the sampling effort is not the same, one solution is to focus on Shannon or Simpson diversity instead of species richness. Species richness is the measure most sensitive to rare species, since each species has equal weight when richness is calculated. Rare species (i.e. those represented by few individuals, small biomass or low cover) are the ones most easily overlooked, and detecting them requires disproportionately greater sampling effort than detecting common species. Shannon and Simpson diversity shift the focus to common and dominant species, which are more likely to be detected.
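The different sensitivity of the three measures to undetected rare species can be illustrated with a minimal Python sketch. The abundance vectors are hypothetical, and the function names are chosen here for illustration only; the Simpson measure is taken in its Gini-Simpson form (1 minus the sum of squared proportions), which is an assumption about which variant is meant.

```python
import math

def richness(counts):
    """Number of species represented by at least one individual."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), natural logarithm."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def gini_simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Hypothetical community: two dominant species and three species
# represented by a single individual each.
full = [50, 40, 1, 1, 1]
# The same community after a lower sampling effort that missed the rare species.
partial = [50, 40, 0, 0, 0]

def relative_drop(index):
    """Relative decrease of an index when the rare species go undetected."""
    return (index(full) - index(partial)) / index(full)

for name, index in [("richness", richness),
                    ("Shannon", shannon),
                    ("Gini-Simpson", gini_simpson)]:
    print(f"{name}: full={index(full):.3f} "
          f"partial={index(partial):.3f} "
          f"relative drop={relative_drop(index):.1%}")
```

In this made-up example, losing the three rare species reduces richness the most, Shannon less, and Gini-Simpson the least, which is the pattern described above.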
An alternative solution is to standardize the sampled communities to the same sampling effort. In some cases, this needs to be done already in the field (e.g. sampling vegetation by plots of the same size), but sometimes we can standardize the data during the analysis. Below, we will introduce rarefaction and extrapolation curves, which can standardize data with uneven numbers of individuals (or sampling units) or different levels of completeness.
Abundance data vs replicated incidence data
The aim of a diversity comparison is usually to compare the true diversity of the communities. However, it is usually impossible (or impractical) to survey a community in its entirety, and we need to rely on less complete data, in which only a limited number of individuals or sampling plots were recorded. For the purpose of further analyses, two alternative types of community data can be recognized: abundance data, containing the number of individuals of each species, and replicated incidence data, containing a set of samples, in each of which the presences and absences of species were recorded. In reality, a third type of data may also be available, with multiple samples each containing counts of individuals per species; such data can be transformed either to abundance data (by pooling individuals from all samples together) or to replicated incidence data (by converting the abundances of species in each sample into presences only).
The figure below shows the difference between abundance data and replicated incidence data sampled in the same community. The true community has five species (A, B, C, D and E), and we survey it within a limited area (black rectangle). In the case of abundance data, we record a certain number of individuals and identify each of them to species; as a result, we get a vector containing the number of individuals of each species. In the case of replicated incidence data, we record a set of samples (e.g. vegetation plots), and for each sample, we record which species are present; as a result, we obtain a set of samples, which can be pooled into a vector containing the frequency of each species, i.e. the number of samples in which it occurs. (Note that in the example below, for replicated incidence data, we would need more than four samples to obtain more precise estimates of the species abundance distribution.)
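The two transformations of sample-level count data described above can be sketched in Python as follows. The matrix of counts is hypothetical (four samples, species A to E), and the function names are illustrative rather than taken from any package.

```python
# Hypothetical data: rows are samples (e.g. plots), columns are species A-E,
# values are counts of individuals.
species = ["A", "B", "C", "D", "E"]
samples = [
    [3, 0, 1, 0, 0],
    [5, 2, 0, 0, 0],
    [2, 0, 0, 1, 0],
    [4, 1, 0, 0, 0],
]

def to_abundance(matrix):
    """Pool individuals from all samples: one total count per species."""
    return [sum(column) for column in zip(*matrix)]

def to_incidence(matrix):
    """Convert counts in each sample to presence (1) or absence (0)."""
    return [[1 if count > 0 else 0 for count in row] for row in matrix]

def incidence_frequencies(incidence):
    """Number of samples in which each species was recorded."""
    return [sum(column) for column in zip(*incidence)]

abundance = to_abundance(samples)              # abundance data
incidence = to_incidence(samples)              # replicated incidence data
frequency = incidence_frequencies(incidence)   # pooled incidence frequencies

print(dict(zip(species, abundance)))
print(dict(zip(species, frequency)))
```

Species E is present in the hypothetical community but was not recorded in any sample, so both its abundance and its frequency are zero; this is the kind of undetected species that makes observed richness depend on sampling effort.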
Accumulation and rarefaction curves
In the case of abundance data, the more individuals we observe, the more species we are likely to record. This can be visualized with an accumulation curve (figure below, from Gotelli & Colwell 2001). When we record the first individual, we have one species. With every new individual, the number of species either increases by one or stays the same (if the new individual belongs to a species we have already recorded). The jagged curve represents one scenario of randomly drawing individual after individual from our sampled pool. The smooth rarefaction curve, in contrast, represents the mean of repeated re-sampling of all pooled individuals. It can be constructed either by repeated subsampling from the pool of individuals for each number of individuals on the x-axis, or it can be calculated using an algebraic equation. Similar accumulation and rarefaction curves can also be constructed for replicated incidence data (instead of individuals, the x-axis then represents the number of samples, each containing presences of one or more species).
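Both constructions of the curves for abundance data can be sketched in Python. The abundance vector is hypothetical and the function names are illustrative. For the algebraic version, the sketch assumes the classical individual-based formula, in which the expected number of species in a random subsample of n individuals out of N is the sum over species of 1 - C(N - N_i, n) / C(N, n), where N_i is the abundance of species i; the text above does not say which equation is meant, so treat this as one common choice.

```python
import math
import random

def accumulation_curve(counts, rng):
    """Number of species after each individual, for one random ordering."""
    # Build the pool of individuals, each labelled by its species index.
    pool = [sp for sp, c in enumerate(counts) for _ in range(c)]
    rng.shuffle(pool)
    seen, curve = set(), []
    for sp in pool:
        seen.add(sp)
        curve.append(len(seen))
    return curve

def rarefaction_curve(counts):
    """Expected number of species in a subsample of n individuals, n = 1..N."""
    total = sum(counts)
    curve = []
    for n in range(1, total + 1):
        # C(total - c, n) / C(total, n) is the probability that a species
        # with c individuals is missing from the subsample; math.comb
        # returns 0 when n exceeds total - c.
        expected = sum(1 - math.comb(total - c, n) / math.comb(total, n)
                       for c in counts if c > 0)
        curve.append(expected)
    return curve

counts = [10, 6, 3, 1, 1]   # hypothetical abundances of five species
rng = random.Random(1)      # fixed seed so the sketch is reproducible

jagged = accumulation_curve(counts, rng)   # one jagged accumulation curve
smooth = rarefaction_curve(counts)         # algebraic rarefaction curve

# Rarefaction by repeated re-sampling: the mean of many accumulation curves.
runs = [accumulation_curve(counts, rng) for _ in range(2000)]
mean_curve = [sum(run[i] for run in runs) / len(runs)
              for i in range(len(smooth))]

for n in (1, 5, 10, 21):
    print(n, jagged[n - 1], round(mean_curve[n - 1], 2), round(smooth[n - 1], 2))
```

The averaged curve should approach the algebraic one as the number of random orderings grows. To compare two communities with unequal numbers of individuals, their rarefaction curves can be read at the same value of n.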