Correspondence analysis (CA, previously also known as reciprocal averaging and by several other names) is a unimodal unconstrained ordination method. In the space of all ordination axes, it preserves chi-square distances among samples. The chi-square distance does not suffer from the double-zero problem, but some authors blame it for being too strongly influenced by rare species (which is perhaps not true, see below). The data must be non-negative abundances or presence-absence values. Correspondence analysis often produces a strong arch artefact in ordination diagrams, caused by a non-linear relationship between the first and higher axes. The arch can be removed by detrending, which is the basis of detrended correspondence analysis (DCA). The distribution of samples along the first (D)CA axis is used as the basis of the TWINSPAN classification algorithm.
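To make the chi-square distance mentioned above more concrete, here is a minimal sketch in plain Python. It assumes the usual textbook form of the distance: the two samples are compared through their relative species abundances (profiles), and each squared difference is divided by the species total, which is why rare species receive a larger weight. The abundance table and the function name chi_square_distance are hypothetical and serve only as an illustration.

```python
from math import sqrt

# Hypothetical abundance table (not from the text): rows = samples, columns = species.
table = [
    [1, 2, 0],
    [2, 4, 0],  # same species proportions as the first sample, doubled abundances
    [0, 1, 5],
]

def chi_square_distance(table, i, k):
    """Chi-square distance between samples i and k of an abundance table.

    Assumes every sample and every species has a non-zero total.
    """
    grand_total = sum(sum(row) for row in table)
    species_totals = [sum(col) for col in zip(*table)]
    total_i, total_k = sum(table[i]), sum(table[k])
    acc = 0.0
    for y_i, y_k, y_j in zip(table[i], table[k], species_totals):
        # Squared difference of relative abundances, weighted by 1 / species total.
        acc += (y_i / total_i - y_k / total_k) ** 2 / y_j
    return sqrt(grand_total * acc)

d01 = chi_square_distance(table, 0, 1)  # identical profiles
d02 = chi_square_distance(table, 0, 2)  # different profiles
print(d01, d02)
```

Because the distance is computed on profiles, the first two samples in this sketch are at zero distance from each other even though their absolute abundances differ.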
Although today's software uses matrix algebra to calculate CA, the original algorithm is based on reciprocal averaging of column and row scores: it starts from random values and, by iterative row- and column-averaging, converges to a unique solution, which represents the sample and species scores.
It has the following five calculation steps:
After calculating the sample and species scores for the first axis, one can continue to the second and higher axes, while maintaining linear independence from all previously calculated axes.
The following table (modified Table 4-5 from Šmilauer & Lepš 2014) shows a simple example of how to calculate sample and species scores:
Calculation steps:
1. Initial sample scores (0, 4, and 10)
2. Species scores (weighted averages of the sample scores):
u.WA1 (Cirsium) = (0*0 + 0*4 + 3*10)/(0 + 0 + 3) = 10
u.WA1 (Glechoma) = (5*0 + 2*4 + 1*10)/(5 + 2 + 1) = 2.25
u.WA1 (Rubus) = (6*0 + 2*4 + 0*10)/(6 + 2 + 0) = 1
u.WA1 (Urtica) = (8*0 + 1*4 + 0*10)/(8 + 1 + 0) = 0.444
3. Sample scores (weighted averages of the species scores):
x.WA1 (Sample 1) = (0*10 + 5*2.25 + 6*1 + 8*0.444)/(0 + 5 + 6 + 8) = 1.095
x.WA1 (Sample 2) = (0*10 + 2*2.25 + 2*1 + 1*0.444)/(0 + 2 + 2 + 1) = 1.389
x.WA1 (Sample 3) = (3*10 + 1*2.25 + 0*1 + 0*0.444)/(3 + 1 + 0 + 0) = 8.063
4. Rescale the sample scores to the original range (0-10 here).
5. Repeat from step 2 until the values converge.
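The first pass through steps 1 to 4 can be sketched in a few lines of plain Python. The abundances are the ones that appear in the calculations above. The function names and the linear (min-max) form of the rescaling in step 4 are assumptions made for this illustration, not something prescribed by the original table.

```python
# Abundances used in the worked example: species -> values in samples 1, 2, 3.
abund = {
    "Cirsium":  [0, 0, 3],
    "Glechoma": [5, 2, 1],
    "Rubus":    [6, 2, 0],
    "Urtica":   [8, 1, 0],
}

def weighted_average(weights, scores):
    """Average of `scores` weighted by `weights` (here: abundances)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def species_scores(abund, sample_scores):
    """Step 2: each species score is the weighted average of the sample scores."""
    return {sp: weighted_average(row, sample_scores) for sp, row in abund.items()}

def sample_scores_from(abund, sp_scores):
    """Step 3: each sample score is the weighted average of the species scores."""
    species = list(abund)
    n_samples = len(abund[species[0]])
    scores = []
    for j in range(n_samples):
        column = [abund[sp][j] for sp in species]
        scores.append(weighted_average(column, [sp_scores[sp] for sp in species]))
    return scores

def rescale(scores, lo=0.0, hi=10.0):
    """Step 4 (assumed linear min-max rescaling to the range lo-hi)."""
    mn, mx = min(scores), max(scores)
    return [lo + (hi - lo) * ((s - mn) / (mx - mn)) for s in scores]

initial = [0, 4, 10]                  # step 1
u = species_scores(abund, initial)    # step 2
x = sample_scores_from(abund, u)      # step 3
x_rescaled = rescale(x)               # step 4

print(u)
print(x)
print(x_rescaled)
```

Step 5 would feed x_rescaled back into species_scores and repeat until the scores stop changing.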
An important property of this algorithm is that it does not depend on the arbitrary choice of initial scores, as can be seen in Fig. 1. In the example table above, the initial scores were preselected so that convergence is faster; with random initial values the algorithm still converges, but it takes more iterations.
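This independence from the starting values can be checked with a small experiment. The sketch below, in plain Python, runs the averaging loop on the abundances from the worked example with three different sets of initial sample scores. The function names, the tolerance, and the min-max rescaling are assumptions of this illustration. Because the direction of an ordination axis is arbitrary, the sketch treats two solutions as identical when one is the mirror image of the other.

```python
# Abundances from the worked example: rows = species, columns = samples 1-3.
TABLE = [
    [0, 0, 3],  # Cirsium
    [5, 2, 1],  # Glechoma
    [6, 2, 0],  # Rubus
    [8, 1, 0],  # Urtica
]

def rescale(scores, lo=0.0, hi=10.0):
    """Linear rescaling to the range lo-hi (assumed form of the rescaling step)."""
    mn, mx = min(scores), max(scores)
    return [lo + (hi - lo) * ((s - mn) / (mx - mn)) for s in scores]

def reciprocal_averaging(table, start, tol=1e-10, max_iter=1000):
    """Iterate species and sample averaging until the sample scores stabilise."""
    x = rescale(start)
    for _ in range(max_iter):
        # Species scores: weighted averages of the current sample scores.
        u = [sum(a * s for a, s in zip(row, x)) / sum(row) for row in table]
        # Sample scores: weighted averages of the species scores, then rescaled.
        new_x = rescale(
            [sum(a * s for a, s in zip(col, u)) / sum(col) for col in zip(*table)]
        )
        if max(abs(p - q) for p, q in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x

def same_up_to_reflection(a, b, hi=10.0, tol=1e-6):
    """True if b equals a, or equals a mirrored within the range 0-hi."""
    direct = all(abs(p - q) < tol for p, q in zip(a, b))
    mirrored = all(abs(p - (hi - q)) < tol for p, q in zip(a, b))
    return direct or mirrored

run_a = reciprocal_averaging(TABLE, [0, 4, 10])  # preselected start from the table
run_b = reciprocal_averaging(TABLE, [3, 9, 1])   # arbitrary start
run_c = reciprocal_averaging(TABLE, [7, 7, 2])   # another arbitrary start

print(run_a, run_b, run_c)
print(same_up_to_reflection(run_a, run_b), same_up_to_reflection(run_a, run_c))
```

A start in which all samples have the same score would break the rescaling step (division by zero), so the sketch assumes at least two different initial values.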
The CA algorithm has, however, two unpleasant properties: it produces a more or less pronounced arch artefact, and it compresses the samples at the ends of the first axis relative to the middle (see the example in Fig. 2).
A detrended version of correspondence analysis (DCA) attempts to remove the arch effect from the ordination (Fig. 3). The method was (and still is) very popular, especially among vegetation ecologists, because it often gives a rather meaningful distribution of samples in ordination diagrams. Additionally, it has one interesting property: the length of the first axis (in SD units) reflects the heterogeneity of the dataset and can be used to decide whether the data should be analysed by linear (axis shorter than 3 SD) or unimodal (axis longer than 4 SD) ordination methods (details here). However, detrending (by segments) resembles using a hammer on the data: the arch is hammered flat by cutting the first axis into segments and moving the sample points up and down along the second axis (you may see rescaling from CA to DCA here). For this and other reasons, the method is criticized and not recommended by some researchers (see e.g. Legendre & Legendre 1998, Borcard et al. 2011, or Jari Oksanen), while it is defended by others (e.g. ter Braak & Šmilauer 2015).
In CA, both objects and species are represented by points in the ordination diagram (compare to PCA, where species/descriptors are vectors and sites are points). Similarly to PCA, two types of scaling are available (Borcard et al. 2011):