Introduction
Theory, R functions & Examples
Section: Ordination analysis
This is direct continuation of Example 1 in tb-RDA section. We used Vltava river valley dataset to calculate tb-RDA (on log-transformed and then Hellinger transformed species composition data) with two measured variables (pH and SOILDPT) as explanatory. They explained 8.9% of the variance, and we may ask: if variables explain this amount of variance (which may seem rather low, less than 10%), is it enough to report this result?
To decide, we need to do two more things. One is to test whether the analysis is significant, meaning that this amount of variance is high enough compared to the variance generated (in average) by two random variables (this is the topic for Example 2 below). The other thing is to compare the value of explained variation (R2 = 8.9%) with the variation which would be explained by two best variables, i.e. variables which represent the strongest gradient along which the species composition is changing. As you can see in the theoretical part, we can create such variable for our dataset by applying first unconstrained ordination on our data and extract the first few unconstrained ordination axes (i.e. sample scores along them). These axes represent the main directions along which species composition changes the most rapidly, which is exactly what we need. Then we can use these axes as explanatory in the constrained version of the same ordination (here tb-RDA) to see how much variance they can explain. The number of axes I should take depends on the number of real variables I want to compare the variance with (in this case two).
Let's get data in and calculate tb-RDA on two explanatory variables (this is essentially repeating steps in Example 1 from tb-RDA:
vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt') spe <- vltava.spe env <- vltava.env[, c('pH', 'SOILDPT')] library (vegan) spe.log <- log1p (spe) # species data are in percentage scale which is strongly rightskewed, better to transform them spe.hell <- decostand (spe.log, 'hell') # we are planning to do tb-RDA, this is Hellinger pre-transformation tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env) # calculate tb-RDA with two explanatory variables
I can use the function RsquareAd
from vegan
to extract the explained variance directly from the ordination object:
R2.obs <- RsquareAdj (tbRDA)$r.squared R2.obs
[1] 0.08868879
Result is 0.089 = 8.9%. Note that the function RsquareAdj
returns a list with two components, r.squared
and adj.r.squared
, the first referring to the R2 we are interested now, and the second to adjusted R2, i.e. R2 corrected for the number of samples and number of explanatory variables (we will use this later).
Now we need to calculate the unconstrained version of tb-RDA on the same data and extract the first two ordination axes. The unconstrained version of tb-RDA is tb-PCA (i.e. PCA calculated on pre-transform data, where the pre-transformation should be the same in both analyses):
tbPCA <- rda (spe.hell) tbPCA
Call: rda(X = spe.hell) Inertia Rank Total 0.7048 Unconstrained 0.7048 96 Inertia is variance Eigenvalues for unconstrained axes: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 0.09197 0.06075 0.04684 0.03537 0.02650 0.02361 0.02093 0.02035 (Showed only 8 of all 96 unconstrained eigenvalues)
For comparison, let's also include the summary of tb-RDA method:
tbRDA
Call: rda(formula = spe.hell ~ pH + SOILDPT, data = env) Inertia Proportion Rank Total 0.70476 1.00000 Constrained 0.06250 0.08869 2 Unconstrained 0.64226 0.91131 94 Inertia is variance Eigenvalues for constrained axes: RDA1 RDA2 0.04023 0.02227 Eigenvalues for unconstrained axes: PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 0.07321 0.04857 0.04074 0.03144 0.02604 0.02152 0.01917 0.01715 (Showed only 8 of all 94 unconstrained eigenvalues)
You can see that total inertia (variance of the whole dataset) is identical in both analyses (0.70476). This is the variance we aim to explain.
Extract the two PCA axes from tbPCA
object and use them as explanatory variables in tbRDA on the same data to see how much variance they explain:
PCA12 <- scores (tbPCA, display = 'sites', choices = 1:2) tbRDA_PCA12 <- rda (spe.hell ~ PCA12) RsquareAdj (tbRDA_PCA12)$r.squared
[1] 0.2167075
Two first tb-PCA axes, if used as explanatory in the tb-RDA on the same dataset, explain 21.7% of the variance. Note that this is the same number we would get by simply checking the amount of variance represented by the first two ordination axes in tb-PCA in the summary above: eigenvalue of PCA1 = 0.09197, eigenvalue of PCA1 = 0.06075, total variance (inertia) = 0.70476, recalculated into percentage: (0.09197+0.06075)/0.70476 = 21.7%. In fact, we don't need to really to the whole procedure (extract tb-PCA axes and use them as explanatory in tb-RDA), we can simply check the variance of n axes in unconstrained ordination (n = number of explanatory variables) and compare it with real variance explained by environmental variables.
The comparison here is: 8.9% explained by real variables (pH and SOILDPT) vs 21.7% which would be explained by two best, not correlated variables (if we had them). We see that measured variables explain something over 40% of variation they could (8.9/21.7 = 0.41), which is not bad. Remember important difference: PCA axes are (from definition) not correlated, while our real variables often will be (as in this case: cor.test (~ pH + SOILDPT, data = env)
shows that Pearson's correlation coefficient between pH and SOILDPT is r = 0.273, and this correlation is significant at P = 0.0069).
The next step is to test whether the variation explained by our variables is significant - this is a topic for Example 2 below.
This example directly follows the Example 1 in RDA & tb-RDA section and Example 1 in Explained variance section, consider checking them first. We used Vltava river valley dataset, and two field-measured environmental variables, soil pH and soil depth (pH
and SOILDPT
), to explain variance in species composition. We found that they can explain 8.9% of the overall variance; when compared to the variance which can be maximally explained by two explanatory variables (21.7% explained by two tb-PCA axes, see here) this sounds not bad (it is more than 40% of variance we can maximally explain in this dataset with two variables). But is the result significant? By significant, I mean: is the variance considerably higher than would be the average variance explained by two random variables not related to species composition? This is the task for the Monte Carlo permutation test (check the Theory part to see how it works).
First, get the data and calculate tb-RDA on them (this is esentially repeating beginning of Example 1 above:
vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-env.txt') spe <- vltava.spe env <- vltava.env[, c('pH', 'SOILDPT')] library (vegan) spe.log <- log1p (spe) spe.hell <- decostand (spe.log, 'hell') tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env) R2.obs <- RsquareAdj (tbRDA)$r.squared R2.obs
[1] 0.08868879
In the next step, calculate variance explained by randomized env. variables
env.rand <- env[sample (1:97),] # the function "sample" will reshuffle the rows with environmental variabels tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand) RsquareAdj (tbRDA.rand)$r.squared
This value represents the variance explained by two random explanatory variables. (Note that we randomize the rows of the environmental matrix, and not each variable independently; in this way, if environmental variables are correlated with each other, this correlation will be preserved.) My result was 0.01854349, but this value will change in each run. We need to do enough repetitions (permutations) to get an idea about the distribution of this values (null model). We can use the “for” loop for it, or, as in this case, function “replicate”, with two arguments: n
= number of replicates, and expr
= expression to be replicated (if more lines of script are involved, this expression needs to be enclosed in curly brackets {}):
n.perm <- 99 # set the number of permutations R2.rand <- replicate (n = n.perm, expr = { env.rand <- env[sample (1:97),] tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand) RsquareAdj (tbRDA.rand)$r.squared })
The vector R2.rand
contains 99 values of variance explained by random variables. In the next step, we will merge them with the observed R2 (R2.obs
), since this is considered to be also the part of null distribution (and this is also the reason why the numbers of permutation are usually ending with 9):
R2 <- c (R2.rand, R2.obs)
Draw the histogram of values and highlight the observed R2 there:
hist (R2, nclass = 100) # ; argument "nclass" separates the bars into 100 categories (compare with hist without this argument) abline (v = R2.obs, col = 'red') # red line to indicate where in the histogram is the observed value
Already now we can see that our observed value (0.089, red vertical line) is far higher than all values generated by randomized env. variables (range (R2.rand)
shows that these values are between 0.014 and 0.033, but again, this range will change in another run of the analysis, and will likely increase if the number of permutations increases, since there will be a higher chance to get values more different from the mean).
To calculate the significance, we need to know what is the number of cases in which the variance explained by random explanatory variables was higher or equal to the observed one (explained by real variables). Remember that R.obs is the part of our null distribution (we added it there), so we always obtain at least one value in such comparison (the R.obs itself).
P <- sum (R2 >= R2.obs)/(n.perm + 1) # 0.01
You can see that our observed R2 (R2.obs) is far higher than any R2 generated by a random variable, so the calculation of P-value contains only one value equal to or higher than observed R2 itself. The resulting P-value is 0.01, which is the lowest P-value we can get with the Monte Carlo permutation test based on 99 permutations:
1/(99 + 1) # 0.01
If we are happy with rejecting the null hypothesis at the alpha level of 5% (P < 0.05), then this could be sufficient. But if we need to get a lower P-value (e.g. because we are going to correct them for multiple testing, as in the case of forward selection later), we need to increase the number of permutations. As we saw in the theoretical part, the relationship between the number of permutations and the minimal P-value we can obtain is Pmin = 1/(number_of_permutations + 1). If you hope to be able to reject the null hypothesis at P < 0.001, you need at least 1/0.001-1 = 999 permutations (in that case the lowest P-value will be 0.001) or better more (e.g. 1499 permutations, with the lowest P-value 1/(1499+1) = 0.00067).
In vegan
, the test of the significance for constrained ordination is done by function anova
(this may be a bit confusing name, since it is not really calculating ANOVA)
anova (tbRDA, permutations = 99)
Permutation test for rda under reduced model Permutation: free Number of permutations: 99 Model: rda(formula = spe.hell ~ pH + SOILDPT, data = env) Df Variance F Pr(>F) Model 2 0.06250 4.574 0.01 ** Residual 94 0.64226 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The P-value is the same (P = 0.01) as in our manual Monte-Carlo permutation test above. Still, 99 permutations is really low number; consider using at least the default 999 permutations to get P-values close to 0.001.
In this example, as the test statistic in the Monte Carlo permutation test we used explained variation (R2). This is OK if we do not have any covariables in the analysis; a more general approach (the one which is used in anova
function of vegan) is to use so called pseudo-F value (see Theory part for more details).
The dataset Carpathian spring fen meadows consists of three data matrices - species matrix with bryophytes, species matrix with vascular plants and environmental matrix with chemical variables and slope. We ignore bryophytes here, and focus on relationship between vascular plants (vasc
) and chemical variables (chem
).
# Carpathian wetlands - import data vasc <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vasc_plants.txt', row.names = 1) chem <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/chemistry.txt', row.names = 1)
For the purpose of tb-RDA, we first pre-transform the species data using Hellinger's transformation (tb-RDA), and remove slope
from the matrix of evironmental variabels (slope of the site is not a chemical variable):
library (vegan) # if you haven't use it up to now vasc.hell <- decostand (vasc, 'hell') # pre-transform species data by Hellinger transformation chem <- chem[, -15] # remove slope rda.vasc <- rda (vasc.hell ~ ., chem)
To see the output, we can either directly print the rda.vasc
object, or apply the function summary
on it to get more extended information (it prints also all the scores, which may get quite long, so wrap it into header
function to shorten those tables into only first six rows):
head (summary (rda.vasc))
Call: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Partitioning of variance: Inertia Proportion Total 0.5674 1.0000 Constrained 0.2061 0.3632 Unconstrained 0.3613 0.6368 Eigenvalues, and their contribution to the variance Importance of components: RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 RDA12 RDA13 RDA14 PC1 PC2 PC3 Eigenvalue 0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 0.03260 0.03204 0.02084 Proportion Explained 0.1724 0.03815 0.02645 0.01899 0.01868 0.016340 0.014450 0.01084 0.010410 0.010140 0.007770 0.007150 0.005960 0.005500 0.05746 0.05647 0.03672 Cumulative Proportion 0.1724 0.21051 0.23696 0.25595 0.27463 0.290970 0.305420 0.31626 0.326680 0.336820 0.344590 0.351730 0.357700 0.363200 0.42066 0.47713 0.51386 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20 Eigenvalue 0.01711 0.01638 0.01426 0.01370 0.01295 0.01166 0.01106 0.01057 0.009716 0.008781 0.008481 0.008224 0.007753 0.00759 0.007313 0.007163 0.006564 Proportion Explained 0.03015 0.02887 0.02513 0.02415 0.02282 0.02055 0.01950 0.01862 0.017120 0.015480 0.014950 0.014490 0.013660 0.01338 0.012890 0.012620 0.011570 Cumulative Proportion 0.54401 0.57287 0.59800 0.62215 0.64497 0.66552 0.68502 0.70364 0.720760 0.736240 0.751180 0.765680 0.779340 0.79272 0.805610 0.818230 0.829800 PC21 PC22 PC23 PC24 PC25 PC26 PC27 PC28 PC29 PC30 PC31 PC32 PC33 PC34 PC35 PC36 Eigenvalue 0.006371 0.005808 0.005689 0.00547 0.005307 0.004855 0.00475 0.004443 0.00403 0.003576 0.003554 0.003397 0.003139 0.002931 0.002827 0.00276 Proportion Explained 0.011230 0.010240 0.010030 0.00964 0.009350 0.008560 0.00837 0.007830 0.00710 0.006300 0.006260 0.005990 0.005530 0.005160 0.004980 0.00486 Cumulative Proportion 0.841030 0.851260 0.861290 0.87093 0.880280 0.888840 0.89721 0.905040 0.91215 0.918450 0.924710 0.930700 0.936230 0.941400 0.946380 0.95124 PC37 PC38 PC39 PC40 PC41 PC42 PC43 PC44 PC45 PC46 PC47 PC48 PC49 PC50 PC51 PC52 Eigenvalue 0.002552 0.002361 0.002284 0.002046 0.001903 0.001807 0.001691 0.00157 0.001559 0.001434 0.001328 0.001304 0.00110 0.00103 0.0009236 0.0008693 Proportion Explained 0.004500 0.004160 0.004030 0.003610 0.003350 0.003180 0.002980 0.00277 0.002750 0.002530 0.002340 0.002300 0.00194 0.00182 0.0016300 0.0015300 Cumulative Proportion 0.955740 0.959900 0.963930 0.967530 0.970890 0.974070 0.977050 0.97982 0.982570 0.985090 0.987430 0.989730 0.99167 0.99349 0.9951100 0.9966500 PC53 PC54 PC55 Eigenvalue 0.0007434 0.0006739 0.0004857 Proportion Explained 0.0013100 0.0011900 0.0008600 Cumulative Proportion 0.9979600 0.9991400 1.0000000 Accumulated constrained eigenvalues Importance of components: RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 RDA12 RDA13 RDA14 Eigenvalue 0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 Proportion Explained 0.4746 0.10504 0.07282 0.05228 0.05144 0.044990 0.039790 0.02984 0.028670 0.027930 0.021380 0.019680 0.016420 0.015150 Cumulative Proportion 0.4746 0.57961 0.65243 0.70471 0.75615 0.801140 0.840920 0.87077 0.899440 0.927370 0.948750 0.968430 0.984850 1.000000 Scaling 2 for species and site scores * Species are scaled proportional to eigenvalues * Sites are unscaled: weighted dispersion equal on all dimensions * General scaling constant of scores: 2.501418 Species scores RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 Acer_sp. -0.0154645 -0.0007631 -0.0075229 -0.008245 0.001732 0.007874 AchiMill 0.0003371 0.0002172 0.0099623 -0.023240 -0.010564 0.017989 AgroCani 0.2989206 -0.0534899 -0.0003443 0.019328 -0.002615 -0.012205 AgroCapi 0.0439209 0.0196134 0.0014274 -0.007778 -0.018888 0.021651 AgroStol -0.1074882 -0.0278701 0.0070877 -0.050120 -0.016408 -0.003469 AjugRept -0.0607449 0.0359534 -0.0898934 -0.005017 -0.001849 -0.002417 .... Site scores (weighted sums of species scores) RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 S1 -0.13618 0.34012 -0.648298 0.3345 -0.08732 0.29732 S2 -0.02351 0.50122 0.483985 -0.4592 -0.34310 0.01176 S3 -0.36203 -0.29731 0.187990 -0.2157 0.37425 0.24345 S4 -0.45847 -0.75706 0.028869 0.1514 -0.12375 0.24845 S5 -0.34538 0.01186 -0.543172 0.0689 -0.09529 -0.01918 S6 -0.38319 -0.12405 0.008091 0.3563 0.37374 0.10958 .... Site constraints (linear combinations of constraining variables) RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 S1 -0.29599 0.51952 -0.676450 0.3432 -0.11594 0.041619 S2 0.13970 0.17128 0.057787 -0.2379 -0.43640 0.078915 S3 -0.37947 -0.26887 -0.021117 0.1957 0.09659 0.112267 S4 -0.42693 -0.41047 0.057856 0.2340 0.15766 0.017807 S5 -0.32293 0.13820 -0.465399 0.3952 0.13787 -0.020922 S6 -0.09608 -0.05028 0.002395 0.1068 0.12036 0.002839 .... Biplot scores for constraining variables RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 Ca -0.88402 -0.2568958 -0.10774 0.09646 0.02262 -0.022930 Mg -0.81588 -0.0021299 -0.22725 -0.33887 0.08347 -0.034787 Fe 0.12061 0.4707166 0.03481 0.12330 0.24032 -0.144807 K -0.32015 0.3987824 -0.07212 0.02156 -0.19731 -0.273450 Na -0.51577 0.4008411 0.03295 -0.06688 -0.13851 -0.227552 Si -0.29676 0.4289340 -0.67314 0.06990 -0.21501 -0.089828 SO4 -0.34191 -0.3049719 -0.25253 -0.19042 -0.24876 0.064491 PO4 0.21537 -0.0008569 0.21286 0.11162 0.12566 -0.044834 NO3 0.15926 -0.4121772 -0.09585 0.23946 -0.48853 0.245962 NH3 0.44013 -0.2471674 -0.22154 -0.02665 0.38679 0.488924 Cl -0.03537 -0.2772850 -0.08676 0.18063 0.31496 -0.002797 Corg 0.69131 -0.0833783 0.05578 0.28857 0.25577 -0.260926 pH -0.67211 0.0469155 0.35972 0.16870 -0.03229 0.552316 conduct -0.82805 0.0162761 0.06142 0.17723 0.14450 0.280628
There are 14 environmental variables, which means that we got 14 constrained ordination axes (named RDA1 to RDA14). Unconstrained axes are named as PC axes. The table Proportion Explained
with the partitioning of the variance between constrained axes (i.e. environmental variables) and unconstrained axes (i.e. variance not explained by environmental factors) shows that the proportion of the total variance, which is explained by all environmental factors, is 36.3% (0.3632
at the row Constrained
and column Proportion
). The first constrained axis explains 17.2%, while the second just 3.8% (Importance of components
> Proportion Explained
, columns RDA1
and RDA2
); it seems that those 14 variables represent mostly a single strong environmental gradient represented by the first constrained axis.
What we calculated here is a global model - it includes all available environmental variables. The next question is whether the global model is significant:
anova (rda.vasc)
Permutation test for rda under reduced model Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) Model 14 0.20608 2.2407 0.001 *** Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Indeed it is significant (P = 0.001, which is the lowest P-value we can obtain with the default number of permutations, 999). Alternatively, we may be interested in the significance of only the first constrained axis. This is achieved by adding the argument first = TRUE
:
anova (rda.vasc, first = TRUE)
Permutation test for rda under reduced model Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.887 0.001 *** Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Or, we may calculate the significance of each constrained axis independently. This can be done by adding argument by = “axis”
(by default, the argument by
is set to NULL
, see the helpfile ?anova.cca
). The calculation takes rather long (there are 14 constrained axes, each tested separately); you may speed it up using parallel calculation if your computer has more than one core (use the argument parallel
with the number of available cores, four in the case of my computer). Also, we will directly store the results into an object (test_axis
) so as we can call it later for further processing:
test_axis <- anova (rda.vasc, by = 'axis', parallel = 4)
Permutation test for rda under reduced model Forward tests for axes Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.8869 0.001 *** RDA2 1 0.02165 3.2950 0.008 ** RDA3 1 0.01501 2.2844 0.213 RDA4 1 0.01077 1.6399 0.930 RDA5 1 0.01060 1.6137 0.913 RDA6 1 0.00927 1.4113 0.975 RDA7 1 0.00820 1.2481 0.997 RDA8 1 0.00615 0.9362 1.000 RDA9 1 0.00591 0.8993 1.000 RDA10 1 0.00576 0.8761 1.000 RDA11 1 0.00441 0.6707 1.000 RDA12 1 0.00406 0.6173 1.000 RDA13 1 0.00338 0.5150 1.000 RDA14 1 0.00312 0.4754 1.000 Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Our impression from above was right - here, only first two axes are significant, second not so strongly as the first one. In fact, the test above is running 14 independent tests of significance, the number high enough to have a problem with multiple testing issue (more tests = more significant results). To correct for multiple testing, consider applying some correction (e.g. Bonferroni, or, as in this case, Holm's correction). Significances are stored in the component Pr(>F)
of test_axis
object (a list), and the function conducting correction is p.adjust
:
test_axis.adj <- test_axis test_axis.adj$`Pr(>F)` <- p.adjust (test_axis$`Pr(>F)`, method = 'holm') test_axis.adj
Permutation test for rda under reduced model Forward tests for axes Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.8869 0.014 * RDA2 1 0.02165 3.2950 0.052 . RDA3 1 0.01501 2.2844 1.000 RDA4 1 0.01077 1.6399 1.000 RDA5 1 0.01060 1.6137 1.000 RDA6 1 0.00927 1.4113 1.000 RDA7 1 0.00820 1.2481 1.000 RDA8 1 0.00615 0.9362 1.000 RDA9 1 0.00591 0.8993 1.000 RDA10 1 0.00576 0.8761 1.000 RDA11 1 0.00441 0.6707 1.000 RDA12 1 0.00406 0.6173 1.000 RDA13 1 0.00338 0.5150 1.000 RDA14 1 0.00312 0.4754 1.000 Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
After adjusting the P-values for multiple testing issue, the second axis became only marginally significant (note that your P-values may slightly differ, since this is permutation test, each time running with different set of permuted environmental variables).
Other testing option is to test each term (constraining variable) separately in the sequence of adding them into the model - this is done adding the argument by = “terms”
. It sequentially adds each variable one by one in order in which they enter the model formula, and for each calculates partial explained variation and its significance with previous variables as covariables (i.e. for the first variable it is marginal variation explained by this variable and its significance, for the second variable it is partial variation explained by the second variable after removing the variation of the first variable as covariable, etc.). Since variables in the dataset are ordered without exact meaning, this option is not useful here. More meaningful is the last option, by = “margin”
, testing the variance explained by each explanatory variable with all the others used as covariables (independently from their order in the model):
test_margin <- anova (rda.vasc, by = 'margin', parallel = 4)
Permutation test for rda under reduced model Marginal effects of terms Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) Ca 1 0.01460 2.2228 0.007 ** Mg 1 0.00973 1.4816 0.084 . Fe 1 0.00729 1.1096 0.258 K 1 0.00668 1.0162 0.375 Na 1 0.00565 0.8603 0.651 Si 1 0.01237 1.8828 0.026 * SO4 1 0.00739 1.1246 0.291 PO4 1 0.00467 0.7108 0.872 NO3 1 0.00908 1.3828 0.108 NH3 1 0.01066 1.6231 0.042 * Cl 1 0.00584 0.8887 0.617 Corg 1 0.00781 1.1881 0.217 pH 1 0.00850 1.2938 0.146 conduct 1 0.00959 1.4604 0.078 . Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
If we correct for multiple testing (not done here), we will see that only calcium (Ca
) remains marginally significant, i.e. it can explain some important extra variance which other variables cannot.
And if we want to know the variance each variable can explain independently from others (i.e. the variance each variable will explain if it is the only variable in the model, without any covariables)? What I do below is a bit complex R coding exercise and I won't explain it in detail, but what I did is to make a loop (using apply
) which will take each of the 14 variables one by one, calculate tb-RDA on it using vasc.hell
data, test it's significance, return all values in one summary table, correct them for the multiple-testing issue by Holm's correction, and order the table by decreasing value of variance explained by each variable:
test_each <- t(apply (chem, 2, FUN = function (x) as.matrix (anova (rda (vasc.hell ~ x))[1:4][1,]))) test_each <- as.data.frame (test_each) names (test_each) <- c("Df", "Variance", "F", "Pr(>F)") test_each.adj <- test_each test_each.adj$`Pr(>F)` <- p.adjust (test_each$`Pr(>F)`, method = 'holm') test_each.adj[order (test_each.adj$Variance, decreasing = TRUE),]
Df Variance F Pr(>F) Ca 1 0.078859882 10.9763368 0.014 conduct 1 0.069408621 9.4774885 0.014 Mg 1 0.068014761 9.2612406 0.014 Corg 1 0.050866878 6.6963607 0.014 pH 1 0.049697951 6.5277054 0.014 Na 1 0.032285586 4.1026474 0.014 NH3 1 0.026047765 3.2718455 0.016 Si 1 0.021008459 2.6145234 0.016 SO4 1 0.018808570 2.3313586 0.036 K 1 0.017124964 2.1161781 0.040 NO3 1 0.012759735 1.5643455 0.236 Fe 1 0.010942632 1.3371875 0.384 PO4 1 0.009863593 1.2029965 0.384 Cl 1 0.006910560 0.8383942 0.736
Some variables (at the bottom of the table) do not explain significant part of variance, some do (with calcium, Ca
, explaining the most). What I did is in fact the first step of forward selection, which aims to select the subset of variables that explain the most variance in the model (in the first step the variables are sorted by explained variance, the first of them is tested, and if significant, it is included as the first selected variables into the model). This will be the topic for Example 1 in Variable selection section.