Theory, R functions & Examples
Section: Ordination analysis
Let's use data from river valley again and calculate tb-PCA on them, i.e. PCA on Hellinger standardised raw species composition data. Note that because the original data represent percentage cover, we will log-transform them first, before applying the Hellinger standardisation:
vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt') library (vegan) PCA <- rda (X = decostand (log1p (vltava.spe), method = 'hellinger'))
envfit calculates multiple regression of environmental variable with ordination axes (environmental variable is used as dependent and selected ordination axes as explanatory variables). Significance is tested by permutation test. Vectors (for continual variables) and centroids (for categorical variables) can be projected onto ordination diagram using
ef <- envfit (PCA ~ ASPSSW + SOILDPT + pH, data = vltava.env, perm = 999) ef
***VECTORS PC1 PC2 r2 Pr(>r) ASPSSW -0.87822 0.47826 0.4769 0.001 *** SOILDPT 0.94893 0.31549 0.2891 0.001 *** pH 0.38093 0.92460 0.3885 0.001 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Permutation: free Number of permutations: 999
The variable which is the most strongly related to the first two ordination axes (judged by the value of r2 in the
envfit table) is the folded aspect (ASPSSW), followed by soil pH and soil depth. Soil pH is strongly related to the second ordination axis, while soil depth and folded aspect to the first one (according to the coefficients in PC1 and PC2 columns).
The result table contains the following columns:
envfit executed three permutation tests, all three with highly significant results. When the number of tested variables increases, correction for multiple testing is desirable. In the case above, we may extract the significance values from
ef object and apply Bonferroni correction using the function
ef.adj <- ef pvals.adj <- p.adjust (ef$vectors$pvals, method = 'bonferroni') ef.adj$vectors$pvals <- pvals.adj ef.adj
In the code above, I first copied the original
ef object created by function
envfit into new,
ef.adj object. Then I extracted the significance values from
ef$vectors$pvals) and applied function
p.adjust with Bonferroni's method of adjustment. By these adjusted values I replaced the original P-values in the copy of original results
ef.adj. Newly printed results show adjusted P-values:
***VECTORS PC1 PC2 r2 Pr(>r) ASPSSW -0.87822 0.47826 0.4769 0.003 ** SOILDPT 0.94893 0.31549 0.2891 0.003 ** pH 0.38093 0.92460 0.3885 0.003 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Permutation: free Number of permutations: 999
Alternatively, you can use custom-build function
p.adjust.envfit (definition here), giving exactly the same results as above.
First, source the definition of this custom made function from this website:
... and just use it (the results should be the adjusted results above):
ef.adj <- p.adjust.envfit (ef) ef.adj
To draw the variables onto the ordination diagram, use the function
plot on the object returned by the function
ef.adj above, both being of the class
.envfit does not need to be typed) is a low-level graphical function, which adds the vectors (for quantitative variables) or centroids (for qualitative variables) of the environmental variables into already existing ordination diagram:
ordiplot (PCA, display = 'sites') plot (ef)
In the case of PCA, the coordinates of the arrow heads on the ordination axes represent the values of Pearson's correlation coefficient between the variable and given ordination axis (in the case of other unconstrained ordination - CA, DCA or NMDS - the situation is not that straightforward, since the multiple regression is weighted). To confirm this, let's calculate the correlation of each environmental variable with each ordination axis.
To get scores of samples (or species) on ordination axes, we need function
scores (applicable on object created by
vegan ordination functions:
scores.pca <- scores (PCA, display = 'sites', choices = 1:2) scores.pca
PC1 PC2 1 -0.36770382 0.178399044 2 -0.46395458 0.209974936 3 -0.26899925 0.127517062 4 -0.32793000 0.254319805 5 0.03691218 0.536838689 6 0.22455684 0.622463000 ...
display specifies that we want scores of
species, and argument
choices points to ordination axes we want to extract - the first two).
Correlation between each environmental variable and each axis scores can be calculated as follows:
PC1 PC2 ASPSSW -0.6065085 0.3302958 SOILDPT 0.5101798 0.1696164 pH 0.2374247 0.5762849
We get exactly the same values if we take results of the function
envfit and multiply each value in the column PC1 and PC22) by it's appropriate sqrt (r2):
arrow_heads <- ef$vectors$arrows # extracts matrix of coordinates of arrow heads from ef r2 <- ef$vectors$r # extracts vector of r2 for each env. variable arrow_heads * sqrt (r2)
PC1 PC2 ASPSSW -0.6065085 0.3302958 SOILDPT 0.5101798 0.1696164 pH 0.2374247 0.5762849 attr(,"decostand")  "normalize"
Note that these values are identical to Pearson's correlation coefficients in the table above. But this is true only for PCA or tb-PCA, in which ordination scores are standardized to unit variance. In case of other unconstrained ordinations (CA, DCA), standard deviations of individual axes differ, and regression coefficients in multiple regression cannot be directly converted into Pearson's correlation coefficients. Additionally, unimodal methods (DCA, CA) are using weighted multiple regression, with weights of samples reflecting their importance in analysis3).
This can be done using function
vegan. It draws the surface of fitted environmental variable into ordination diagram using GAM models. This is fancy option in case that you don't expect linear relationship between ordination axis and environmental variable, or you don't want to relate the environmental variables directly to ordination axes (the latter may seem as relevant argument in case of NMDS, which is basically a method free of ordination axes4)).
To add e.g. pH into PCA ordination diagram using data from Vltava as in the example above, use the function
ordisurf. This function is both high- and low-level graphical function, meaning that it can either created the whole new ordination diagram and add the surface in it, or it can only add the surface into already existing ordination diagram (use argument
add = TRUE):
ordisurf (PCA, vltava.env[, 'pH'], main = 'pH + SOILDPT') ordisurf (PCA, vltava.env[, 'SOILDPT'], add = T, col = 'green') legend ('topleft', col = c('red', 'green'), lty = 1, legend = c('pH', 'SOILDPT'))
In this script, the first
ordisurf draws new ordination diagram and adds pH values in it (including the title using argument
main). The second application of
ordisurf adds the surface of soil depth variable, using different color (argument
col). We may also add the
legend to ease the interpretation of the surface colors.
envfitcalls this 'direction cosines which are the coordinates of the heads of unit length vectors', see
envfit, use the function
str (ef)in this case. The object is a list of lists, containing all relevant information. The printed version which is printed into command line when we type
efis the product of the function
print.envfit, which is automatically called when the
envfitobject is evaluated.