Ordination (from Latin ordinatio, putting things into order, or German die Ordnung, order) is a multivariate analysis that searches for continuous patterns in multivariate data, usually data on the species composition of community samples (a sample × species matrix). We can imagine such multivariate data as samples located in a multidimensional hyperspace, where each dimension is defined by the abundance of one species (an example for a community of two samples with three species is shown in Fig. 1).
The main assumption of ordination is that the analyzed data are redundant, i.e. they contain more variables than are necessary to describe the underlying information, so we can reduce the number of these variables (and dimensions) without losing too much information. For example, in species composition data, some of the species are often ecologically similar (e.g. species that prefer wet rather than dry habitats), meaning that the dataset contains several redundant variables (species) telling the same story. Or, to explain the redundancy in another way, from the occurrence (or absence) of one species we can often predict the occurrence (or absence) of several other species (e.g. if the sample includes species of wet habitats, we may expect that species preferring dry habitats will not be present, while other wet-loving species may occur).
Since multidimensional space is not easy to display, describe or even just imagine, it is worth reducing it to a few main dimensions while preserving as much information as possible. This also means that if the individual variables are completely independent of each other (e.g. each species has entirely different preferences), then ordination is unlikely to find a reasonable reduction of the multidimensional space, since each dimension (species) carries its own meaning.
What an ordination method does can be formulated in two alternative ways:
Table 1 summarizes individual ordination methods.
Ordination methods in Table 1 can be divided according to two criteria: (1) whether their algorithm also includes environmental variables alongside the species composition data (unconstrained methods do not, constrained methods do), and (2) what type of species composition data is used for the analysis: raw data (sample-species matrix of species composition), pre-transformed raw data (e.g. using the Hellinger transformation), or a distance matrix (sample-sample symmetric matrix of distances between samples).
In unconstrained ordination, the ordination axes are not constrained by environmental factors. The method aims to uncover the main gradients (directions of change) in species composition data, and returns unconstrained ordination axes, which correspond to the directions of greatest variability within the dataset. Optionally, these gradients can be interpreted post hoc (after the analysis) using environmental variables, if these are available. Environmental variables do not enter the ordination algorithm. Unconstrained ordination is primarily an exploratory analytical method, used to explore patterns in multivariate data; it generates hypotheses, but does not test them.
In constrained ordination, the ordination axes are constrained by environmental factors. The method relates the species composition directly to the environmental variables and extracts the variance in species composition which is directly related to the environment. Environmental variables enter the algorithm directly, and the constrained ordination axes correspond to the directions of variability in the data which are explained by these environmental variables. The method is usually used as a confirmatory analysis, i.e. it is able to test hypotheses about the effect of environmental factors on species composition (unlike unconstrained ordination, which is exploratory). It decomposes the total variance in species composition data into a fraction explained by environmental variables (related to the constrained ordination axes) and a fraction not explained by environmental variables (related to the unconstrained ordination axes). It offers several interesting opportunities when it comes to explanatory variables: forward selection (the selection of important environmental variables by excluding those which are not relevant for species composition), the Monte Carlo permutation test (a test of significance of the variance explained by environmental factors) and variance partitioning (partitioning of the variance explained by different groups of environmental variables).
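The variance decomposition and the permutation test can be illustrated with a minimal sketch in plain Python. It assumes the simplest possible case, a single environmental variable and a linear relationship (each species is regressed on that variable, and the variances of the fitted values are summed), so it is not a full constrained ordination. The data, the function names and the number of permutations are hypothetical and chosen only for illustration.

```python
import random

def variance(values):
    """Sample variance with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def explained_fraction(species_matrix, env):
    """Fraction of total species variance explained by one env variable.

    species_matrix: list of samples, each a list of species abundances.
    env: one environmental value per sample (assumed, single predictor).
    """
    n = len(env)
    env_mean = sum(env) / n
    env_ss = sum((e - env_mean) ** 2 for e in env)
    total = 0.0
    fitted = 0.0
    for j in range(len(species_matrix[0])):
        column = [row[j] for row in species_matrix]
        col_mean = sum(column) / n
        # Least-squares slope of this species on the environmental variable.
        slope = sum((e - env_mean) * (y - col_mean)
                    for e, y in zip(env, column)) / env_ss
        fitted_values = [col_mean + slope * (e - env_mean) for e in env]
        total += variance(column)
        fitted += variance(fitted_values)
    return fitted / total

def permutation_test(species_matrix, env, n_perm=999, seed=1):
    """Monte Carlo permutation test of the explained fraction."""
    rng = random.Random(seed)
    observed = explained_fraction(species_matrix, env)
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks the link between samples and their environment.
        rng.shuffle(shuffled)
        if explained_fraction(species_matrix, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: 8 samples along a moisture gradient, 3 species.
moisture = [1, 2, 3, 4, 5, 6, 7, 8]
species = [[1, 9, 4], [2, 8, 5], [4, 7, 3], [4, 5, 5],
           [6, 5, 4], [7, 3, 3], [8, 2, 5], [9, 1, 4]]

observed, p_value = permutation_test(species, moisture)
print("explained fraction:", round(observed, 3), "p-value:", p_value)
```

The explained fraction corresponds to the constrained part of the variance and its complement to the unconstrained part. The p-value is the proportion of permutations in which the shuffled variable explains at least as much variance as the observed one.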
Methods based on the analysis of raw sample-species matrices with abundance or presence/absence data. Within these methods, two categories are traditionally recognized, linear and unimodal, which differ in the assumed shape of the species response along the environmental gradient.
This category, transformation-based ordination, includes linear raw-data-based ordination methods (PCA, RDA) applied to sample × species data transformed by the Hellinger transformation (or one of several other transformations). The Euclidean distance (implicit in PCA/RDA; Fig. 3), when applied to Hellinger-transformed species composition data, results in the Hellinger distance, which is more suitable for ecological data because, unlike the Euclidean distance, it is asymmetric (it ignores double zeros). Legendre & Gallagher (2001) consider this a preferable way to analyse heterogeneous data (otherwise not suitable for linear methods) using linear ordinations. In addition to the Hellinger transformation, another suitable transformation is the chord transformation; other possible (but less suitable) transformations are the species profile transformation and the chi-square distance and chi-square metric transformations.
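The relationship between the Hellinger transformation and the Hellinger distance can be sketched in a few lines of plain Python. The example assumes the usual definition of the transformation (square root of relative abundances within each sample) and uses three hypothetical samples chosen to show the double-zero problem; the function names are illustrative only.

```python
import math

def hellinger_transform(sample):
    """Square root of relative abundances (sample total must be > 0)."""
    total = sum(sample)
    return [math.sqrt(a / total) for a in sample]

def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hellinger_distance(s1, s2):
    """Euclidean distance computed on Hellinger-transformed samples."""
    return euclidean(hellinger_transform(s1), hellinger_transform(s2))

# Hypothetical abundances of three species in three samples.
s1 = [0, 1, 1]   # species 2 and 3, low abundance
s2 = [1, 0, 0]   # only species 1: shares no species with s1
s3 = [0, 4, 4]   # same species as s1, higher abundance

# On raw data, s1 is closer to s2 (no shared species) than to s3.
print(euclidean(s1, s2), euclidean(s1, s3))

# After the Hellinger transformation, s1 and s3 coincide (same relative
# composition), while s1 and s2 are about sqrt(2) apart.
print(hellinger_distance(s1, s3), hellinger_distance(s1, s2))
```

In this sketch, the raw Euclidean distance treats the shared absence of species 1 in s1 and s3 no differently from any other match, so differences in total abundance dominate; the Hellinger distance depends only on relative composition.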
Methods using a matrix of distances between samples, measured by distance coefficients, and projecting these distances into ordination diagrams with two or more dimensions. Distance-based RDA (db-RDA) is a combination of PCoA, applied to raw data using a selected distance measure, and RDA, applied to the eigenvectors resulting from PCoA. It offers an alternative to RDA (based on Euclidean distances) and tb-RDA (based on Hellinger distances if the data are transformed by the Hellinger transformation), with the freedom to choose a distance measure suitable for the investigated data.
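As a minimal sketch of the input that such methods work with, the following plain Python code builds a symmetric sample-sample distance matrix from a sample-species matrix. The Bray-Curtis coefficient is used here only as an assumed example of a "selected distance measure"; the data and function names are hypothetical, and the sketch stops before PCoA or db-RDA itself.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples of abundances."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def distance_matrix(samples, distance):
    """Symmetric sample-sample matrix for any distance function."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(samples[i], samples[j])
            # Fill both triangles so that the matrix is symmetric.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix

# Hypothetical abundances of three species in three samples.
samples = [[0, 1, 1], [1, 0, 0], [0, 4, 4]]

for row in distance_matrix(samples, bray_curtis):
    print(row)
```

Because `distance_matrix` takes the distance function as an argument, any other coefficient can be substituted without changing the rest of the code, which mirrors the freedom of choice that distance-based methods offer.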
If we sampled only a short section of the environmental gradient (the short red line segment in the left panel of Fig. 4), we may assume that the species response, although fundamentally unimodal, can be modeled as linear (yellow line segment). In the case of a long gradient (right panel of Fig. 4), modeling the species response as linear would be wrong.
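This argument can be checked numerically with a small sketch in plain Python. It assumes a Gaussian species response curve (one common way to represent a unimodal response; the optimum, tolerance and gradient limits are hypothetical) and compares how well a straight line fits the response on a short and on a long section of the gradient.

```python
import math

def gaussian_response(x, optimum=5.0, tolerance=1.0, maximum=100.0):
    """Assumed unimodal (Gaussian) species response along a gradient."""
    return maximum * math.exp(-((x - optimum) ** 2) / (2 * tolerance ** 2))

def linear_r_squared(xs, ys):
    """R-squared of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def gradient(start, end, steps=21):
    """Evenly spaced positions along the environmental gradient."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]

short_gradient = gradient(3.0, 4.0)    # one flank of the response curve
long_gradient = gradient(0.0, 10.0)    # spans the whole response curve

r2_short = linear_r_squared(
    short_gradient, [gaussian_response(x) for x in short_gradient])
r2_long = linear_r_squared(
    long_gradient, [gaussian_response(x) for x in long_gradient])

print("R2 on short gradient:", round(r2_short, 3))
print("R2 on long gradient:", round(r2_long, 3))
```

On the short section the line describes the response well, while on the long section, which covers both the rising and the falling flank of the curve, the linear fit explains almost nothing.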
To decide whether to apply a linear or a unimodal ordination method to your data, you can use the rule of thumb introduced by Lepš & Šmilauer (2003): first, calculate DCA (detrended by segments) on your data, and check the length of the first DCA axis (which is scaled in units of standard deviation, S.D.). A first DCA axis longer than 4 S.D. indicates a heterogeneous dataset for which unimodal methods should be used, while a length below 3 S.D. indicates a homogeneous dataset for which linear methods are suitable (see Fig. 5). In the gray zone between 3 and 4 S.D., both linear and unimodal methods are acceptable. Note that while linear methods should not be used for heterogeneous data, unimodal methods can be used for homogeneous data; in that case, however, linear methods are more powerful and should be preferred. Alternatively, if your data are heterogeneous but you still want to use linear ordination methods (PCA, RDA), apply them to Hellinger-transformed species composition data to calculate an ordination based on Hellinger distances (as recommended e.g. by Legendre & Gallagher (2001)).
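The rule of thumb can be written down as a small helper in plain Python. This is only a sketch: the function name and return strings are hypothetical, and the length of the first DCA axis is assumed to have been calculated beforehand by whatever software was used for the DCA.

```python
def suggest_ordination(dca_axis1_length):
    """Suggest a method family from the first DCA axis length (in S.D.).

    Thresholds follow the rule of thumb of Leps & Smilauer (2003).
    """
    if dca_axis1_length > 4.0:
        # Heterogeneous data: linear methods on raw data are not suitable.
        return "unimodal (or linear on Hellinger-transformed data)"
    if dca_axis1_length < 3.0:
        # Homogeneous data: linear methods are more powerful.
        return "linear"
    # Gray zone between 3 and 4 S.D.
    return "linear or unimodal"

# Hypothetical axis lengths for three datasets.
for length in (2.1, 3.5, 5.2):
    print(length, "->", suggest_ordination(length))
```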