# Analysis of community ecology data in R

David Zelený

### Site Tools

en:monte_carlo_examples

Section: Ordination analysis

## Monte Carlo permutation test in constrained ordination

### Example 1: Is the variance explained by environmental variables significant?

This example directly follows the Example 1 in RDA & tb-RDA section and Example 1 in Explained variance section, consider checking them first. We used Vltava river valley dataset, and two field-measured environmental variables, soil pH and soil depth (pH and SOILDPT), to explain variance in species composition. We found that they can explain 8.9% of the overall variance; when compared to the variance which can be maximally explained by two explanatory variables (21.7% explained by two tb-PCA axes, see here) this sounds not bad (it is more than 40% of variance we can maximally explain in this dataset with two variables). But is the result significant? By significant, I mean: is the variance considerably higher than would be the average variance explained by two random variables not related to species composition? This is the task for the Monte Carlo permutation test (check the Theory part to see how it works).

First, get the data and calculate tb-RDA on them (this is esentially repeating beginning of Example 1 above:

vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1)
spe <- vltava.spe
env <- vltava.env[, c('pH', 'SOILDPT')]
library (vegan)
spe.log <- log1p (spe)
spe.hell <- decostand (spe.log, 'hell')
tbRDA <- rda (spe.hell ~ pH + SOILDPT, data = env)
R2.obs <- RsquareAdj (tbRDA)$r.squared R2.obs [1] 0.08868879 In the next step, calculate variance explained by randomized env. variables env.rand <- env[sample (1:97),] # the function "sample" will reshuffle the rows with environmental variabels tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand) RsquareAdj (tbRDA.rand)$r.squared

This value represents the variance explained by two random explanatory variables. (Note that we randomize the rows of the environmental matrix, and not each variable independently; in this way, if environmental variables are correlated with each other, this correlation will be preserved.) My result was 0.01854349, but this value will change in each run. We need to do enough repetitions (permutations) to get an idea about the distribution of this values (null model). We can use the “for” loop for it, or, as in this case, function “replicate”, with two arguments: n = number of replicates, and expr = expression to be replicated (if more lines of script are involved, this expression needs to be enclosed in curly brackets {}):

n.perm <- 99  # set the number of permutations
R2.rand <- replicate (n = n.perm, expr = {
env.rand <- env[sample (1:97),]
tbRDA.rand <- rda (spe.hell ~ pH + SOILDPT, data = env.rand)
RsquareAdj (tbRDA.rand)$r.squared }) The vector R2.rand contains 99 values of variance explained by random variables. In the next step, we will merge them with the observed R2 (R2.obs), since this is considered to be also the part of null distribution (and this is also the reason why the numbers of permutation are usually ending with 9): R2 <- c (R2.rand, R2.obs) Draw the histogram of values and highlight the observed R2 there: hist (R2, nclass = 100) # ; argument "nclass" separates the bars into 100 categories (compare with hist without this argument) abline (v = R2.obs, col = 'red') # red line to indicate where in the histogram is the observed value Already now we can see that our observed value (0.089, red vertical line) is far higher than all values generated by randomized env. variables (range (R2.rand) shows that these values are between 0.014 and 0.033, but again, this range will change in another run of the analysis, and will likely increase if the number of permutations increases, since there will be a higher chance to get values more different from the mean). To calculate the significance, we need to know what is the number of cases in which the variance explained by random explanatory variables was higher or equal to the observed one (explained by real variables). Remember that R.obs is the part of our null distribution (we added it there), so we always obtain at least one value in such comparison (the R.obs itself). P <- sum (R2 >= R2.obs)/(n.perm + 1) # 0.01 You can see that our observed R2 (R2.obs) is far higher than any R2 generated by a random variable, so the calculation of P-value contains only one value equal to or higher than observed R2 itself. The resulting P-value is 0.01, which is the lowest P-value we can get with the Monte Carlo permutation test based on 99 permutations: 1/(99 + 1) # 0.01 If we are happy with rejecting the null hypothesis at the alpha level of 5% (P < 0.05), then this could be sufficient. But if we need to get a lower P-value (e.g. because we are going to correct them for multiple testing, as in the case of forward selection later), we need to increase the number of permutations. As we saw in the theoretical part, the relationship between the number of permutations and the minimal P-value we can obtain is Pmin = 1/(number_of_permutations + 1). If you hope to be able to reject the null hypothesis at P < 0.001, you need at least 1/0.001-1 = 999 permutations (in that case the lowest P-value will be 0.001) or better more (e.g. 1499 permutations, with the lowest P-value 1/(1499+1) = 0.00067). In vegan, the test of the significance for constrained ordination is done by function anova (this may be a bit confusing name, since it is not really calculating ANOVA) anova (tbRDA, permutations = 99) Permutation test for rda under reduced model Permutation: free Number of permutations: 99 Model: rda(formula = spe.hell ~ pH + SOILDPT, data = env) Df Variance F Pr(>F) Model 2 0.06250 4.574 0.01 ** Residual 94 0.64226 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 The P-value is the same (P = 0.01) as in our manual Monte-Carlo permutation test above. Still, 99 permutations is really low number; consider using at least the default 999 permutations to get P-values close to 0.001. In this example, as the test statistic in the Monte Carlo permutation test we used explained variation (R2). This is OK if we do not have any covariables in the analysis; a more general approach (the one which is used in anova function of vegan) is to use so called pseudo-F value (see Theory part for more details). ### Example 2: What can be tested? The relationship between species composition of plants in Carpathian wetlands and environment (tb-RDA) The dataset Carpathian spring fen meadows consists of three data matrices - species matrix with bryophytes, species matrix with vascular plants and environmental matrix with chemical variables and slope. We ignore bryophytes here, and focus on relationship between vascular plants (vasc) and chemical variables (chem). # Carpathian wetlands - import data vasc <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vasc_plants.txt', row.names = 1) chem <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/chemistry.txt', row.names = 1)  For the purpose of tb-RDA, we first pre-transform the species data using Hellinger's transformation (tb-RDA), and remove slope from the matrix of evironmental variabels (slope of the site is not a chemical variable): library (vegan) # if you haven't use it up to now vasc.hell <- decostand (vasc, 'hell') # pre-transform species data by Hellinger transformation chem <- chem[, -15] # remove slope rda.vasc <- rda (vasc.hell ~ ., chem) To see the output, we can either directly print the rda.vasc object, or apply the function summary on it to get more extended information (it prints also all the scores, which may get quite long, so wrap it into header function to shorten those tables into only first six rows): head (summary (rda.vasc)) Call: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Partitioning of variance: Inertia Proportion Total 0.5674 1.0000 Constrained 0.2061 0.3632 Unconstrained 0.3613 0.6368 Eigenvalues, and their contribution to the variance Importance of components: RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 RDA12 RDA13 RDA14 PC1 PC2 PC3 Eigenvalue 0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 0.03260 0.03204 0.02084 Proportion Explained 0.1724 0.03815 0.02645 0.01899 0.01868 0.016340 0.014450 0.01084 0.010410 0.010140 0.007770 0.007150 0.005960 0.005500 0.05746 0.05647 0.03672 Cumulative Proportion 0.1724 0.21051 0.23696 0.25595 0.27463 0.290970 0.305420 0.31626 0.326680 0.336820 0.344590 0.351730 0.357700 0.363200 0.42066 0.47713 0.51386 PC4 PC5 PC6 PC7 PC8 PC9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC18 PC19 PC20 Eigenvalue 0.01711 0.01638 0.01426 0.01370 0.01295 0.01166 0.01106 0.01057 0.009716 0.008781 0.008481 0.008224 0.007753 0.00759 0.007313 0.007163 0.006564 Proportion Explained 0.03015 0.02887 0.02513 0.02415 0.02282 0.02055 0.01950 0.01862 0.017120 0.015480 0.014950 0.014490 0.013660 0.01338 0.012890 0.012620 0.011570 Cumulative Proportion 0.54401 0.57287 0.59800 0.62215 0.64497 0.66552 0.68502 0.70364 0.720760 0.736240 0.751180 0.765680 0.779340 0.79272 0.805610 0.818230 0.829800 PC21 PC22 PC23 PC24 PC25 PC26 PC27 PC28 PC29 PC30 PC31 PC32 PC33 PC34 PC35 PC36 Eigenvalue 0.006371 0.005808 0.005689 0.00547 0.005307 0.004855 0.00475 0.004443 0.00403 0.003576 0.003554 0.003397 0.003139 0.002931 0.002827 0.00276 Proportion Explained 0.011230 0.010240 0.010030 0.00964 0.009350 0.008560 0.00837 0.007830 0.00710 0.006300 0.006260 0.005990 0.005530 0.005160 0.004980 0.00486 Cumulative Proportion 0.841030 0.851260 0.861290 0.87093 0.880280 0.888840 0.89721 0.905040 0.91215 0.918450 0.924710 0.930700 0.936230 0.941400 0.946380 0.95124 PC37 PC38 PC39 PC40 PC41 PC42 PC43 PC44 PC45 PC46 PC47 PC48 PC49 PC50 PC51 PC52 Eigenvalue 0.002552 0.002361 0.002284 0.002046 0.001903 0.001807 0.001691 0.00157 0.001559 0.001434 0.001328 0.001304 0.00110 0.00103 0.0009236 0.0008693 Proportion Explained 0.004500 0.004160 0.004030 0.003610 0.003350 0.003180 0.002980 0.00277 0.002750 0.002530 0.002340 0.002300 0.00194 0.00182 0.0016300 0.0015300 Cumulative Proportion 0.955740 0.959900 0.963930 0.967530 0.970890 0.974070 0.977050 0.97982 0.982570 0.985090 0.987430 0.989730 0.99167 0.99349 0.9951100 0.9966500 PC53 PC54 PC55 Eigenvalue 0.0007434 0.0006739 0.0004857 Proportion Explained 0.0013100 0.0011900 0.0008600 Cumulative Proportion 0.9979600 0.9991400 1.0000000 Accumulated constrained eigenvalues Importance of components: RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 RDA12 RDA13 RDA14 Eigenvalue 0.0978 0.02165 0.01501 0.01077 0.01060 0.009272 0.008199 0.00615 0.005908 0.005756 0.004406 0.004055 0.003383 0.003123 Proportion Explained 0.4746 0.10504 0.07282 0.05228 0.05144 0.044990 0.039790 0.02984 0.028670 0.027930 0.021380 0.019680 0.016420 0.015150 Cumulative Proportion 0.4746 0.57961 0.65243 0.70471 0.75615 0.801140 0.840920 0.87077 0.899440 0.927370 0.948750 0.968430 0.984850 1.000000 Scaling 2 for species and site scores * Species are scaled proportional to eigenvalues * Sites are unscaled: weighted dispersion equal on all dimensions * General scaling constant of scores: 2.501418 Species scores RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 Acer_sp. -0.0154645 -0.0007631 -0.0075229 -0.008245 0.001732 0.007874 AchiMill 0.0003371 0.0002172 0.0099623 -0.023240 -0.010564 0.017989 AgroCani 0.2989206 -0.0534899 -0.0003443 0.019328 -0.002615 -0.012205 AgroCapi 0.0439209 0.0196134 0.0014274 -0.007778 -0.018888 0.021651 AgroStol -0.1074882 -0.0278701 0.0070877 -0.050120 -0.016408 -0.003469 AjugRept -0.0607449 0.0359534 -0.0898934 -0.005017 -0.001849 -0.002417 .... Site scores (weighted sums of species scores) RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 S1 -0.13618 0.34012 -0.648298 0.3345 -0.08732 0.29732 S2 -0.02351 0.50122 0.483985 -0.4592 -0.34310 0.01176 S3 -0.36203 -0.29731 0.187990 -0.2157 0.37425 0.24345 S4 -0.45847 -0.75706 0.028869 0.1514 -0.12375 0.24845 S5 -0.34538 0.01186 -0.543172 0.0689 -0.09529 -0.01918 S6 -0.38319 -0.12405 0.008091 0.3563 0.37374 0.10958 .... Site constraints (linear combinations of constraining variables) RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 S1 -0.29599 0.51952 -0.676450 0.3432 -0.11594 0.041619 S2 0.13970 0.17128 0.057787 -0.2379 -0.43640 0.078915 S3 -0.37947 -0.26887 -0.021117 0.1957 0.09659 0.112267 S4 -0.42693 -0.41047 0.057856 0.2340 0.15766 0.017807 S5 -0.32293 0.13820 -0.465399 0.3952 0.13787 -0.020922 S6 -0.09608 -0.05028 0.002395 0.1068 0.12036 0.002839 .... Biplot scores for constraining variables RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 Ca -0.88402 -0.2568958 -0.10774 0.09646 0.02262 -0.022930 Mg -0.81588 -0.0021299 -0.22725 -0.33887 0.08347 -0.034787 Fe 0.12061 0.4707166 0.03481 0.12330 0.24032 -0.144807 K -0.32015 0.3987824 -0.07212 0.02156 -0.19731 -0.273450 Na -0.51577 0.4008411 0.03295 -0.06688 -0.13851 -0.227552 Si -0.29676 0.4289340 -0.67314 0.06990 -0.21501 -0.089828 SO4 -0.34191 -0.3049719 -0.25253 -0.19042 -0.24876 0.064491 PO4 0.21537 -0.0008569 0.21286 0.11162 0.12566 -0.044834 NO3 0.15926 -0.4121772 -0.09585 0.23946 -0.48853 0.245962 NH3 0.44013 -0.2471674 -0.22154 -0.02665 0.38679 0.488924 Cl -0.03537 -0.2772850 -0.08676 0.18063 0.31496 -0.002797 Corg 0.69131 -0.0833783 0.05578 0.28857 0.25577 -0.260926 pH -0.67211 0.0469155 0.35972 0.16870 -0.03229 0.552316 conduct -0.82805 0.0162761 0.06142 0.17723 0.14450 0.280628 There are 14 environmental variables, which means that we got 14 constrained ordination axes (named RDA1 to RDA14). Unconstrained axes are named as PC axes. The table Proportion Explained with the partitioning of the variance between constrained axes (i.e. environmental variables) and unconstrained axes (i.e. variance not explained by environmental factors) shows that the proportion of the total variance, which is explained by all environmental factors, is 36.3% (0.3632 at the row Constrained and column Proportion). The first constrained axis explains 17.2%, while the second just 3.8% (Importance of components > Proportion Explained, columns RDA1 and RDA2); it seems that those 14 variables represent mostly a single strong environmental gradient represented by the first constrained axis. What we calculated here is a global model - it includes all available environmental variables. The next question is whether the global model is significant: anova (rda.vasc) Permutation test for rda under reduced model Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) Model 14 0.20608 2.2407 0.001 *** Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Indeed it is significant (P = 0.001, which is the lowest P-value we can obtain with the default number of permutations, 999). Alternatively, we may be interested in the significance of only the first constrained axis. This is achieved by adding the argument first = TRUE: anova (rda.vasc, first = TRUE) Permutation test for rda under reduced model Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.887 0.001 *** Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Or, we may calculate the significance of each constrained axis independently. This can be done by adding argument by = “axis” (by default, the argument by is set to NULL, see the helpfile ?anova.cca). The calculation takes rather long (there are 14 constrained axes, each tested separately); you may speed it up using parallel calculation if your computer has more than one core (use the argument parallel with the number of available cores, four in the case of my computer). Also, we will directly store the results into an object (test_axis) so as we can call it later for further processing: test_axis <- anova (rda.vasc, by = 'axis', parallel = 4) Permutation test for rda under reduced model Forward tests for axes Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.8869 0.001 *** RDA2 1 0.02165 3.2950 0.008 ** RDA3 1 0.01501 2.2844 0.213 RDA4 1 0.01077 1.6399 0.930 RDA5 1 0.01060 1.6137 0.913 RDA6 1 0.00927 1.4113 0.975 RDA7 1 0.00820 1.2481 0.997 RDA8 1 0.00615 0.9362 1.000 RDA9 1 0.00591 0.8993 1.000 RDA10 1 0.00576 0.8761 1.000 RDA11 1 0.00441 0.6707 1.000 RDA12 1 0.00406 0.6173 1.000 RDA13 1 0.00338 0.5150 1.000 RDA14 1 0.00312 0.4754 1.000 Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Our impression from above was right - here, only first two axes are significant, second not so strongly as the first one. In fact, the test above is running 14 independent tests of significance, the number high enough to have a problem with multiple testing issue (more tests = more significant results). To correct for multiple testing, consider applying some correction (e.g. Bonferroni, or, as in this case, Holm's correction). Significances are stored in the component Pr(>F) of test_axis object (a list), and the function conducting correction is p.adjust: test_axis.adj <- test_axis test_axis.adj$Pr(>F) <- p.adjust (test_axis$Pr(>F), method = 'holm') test_axis.adj Permutation test for rda under reduced model Forward tests for axes Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) RDA1 1 0.09780 14.8869 0.014 * RDA2 1 0.02165 3.2950 0.052 . RDA3 1 0.01501 2.2844 1.000 RDA4 1 0.01077 1.6399 1.000 RDA5 1 0.01060 1.6137 1.000 RDA6 1 0.00927 1.4113 1.000 RDA7 1 0.00820 1.2481 1.000 RDA8 1 0.00615 0.9362 1.000 RDA9 1 0.00591 0.8993 1.000 RDA10 1 0.00576 0.8761 1.000 RDA11 1 0.00441 0.6707 1.000 RDA12 1 0.00406 0.6173 1.000 RDA13 1 0.00338 0.5150 1.000 RDA14 1 0.00312 0.4754 1.000 Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 After adjusting the P-values for multiple testing issue, the second axis became only marginally significant (note that your P-values may slightly differ, since this is permutation test, each time running with different set of permuted environmental variables). Other testing option is to test each term (constraining variable) separately in the sequence of adding them into the model - this is done adding the argument by = “terms”. It sequentially adds each variable one by one in order in which they enter the model formula, and for each calculates partial explained variation and its significance with previous variables as covariables (i.e. for the first variable it is marginal variation explained by this variable and its significance, for the second variable it is partial variation explained by the second variable after removing the variation of the first variable as covariable, etc.). Since variables in the dataset are ordered without exact meaning, this option is not useful here. More meaningful is the last option, by = “margin”, testing the variance explained by each explanatory variable with all the others used as covariables (independently from their order in the model): test_margin <- anova (rda.vasc, by = 'margin', parallel = 4) Permutation test for rda under reduced model Marginal effects of terms Permutation: free Number of permutations: 999 Model: rda(formula = vasc.hell ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct, data = chem) Df Variance F Pr(>F) Ca 1 0.01460 2.2228 0.007 ** Mg 1 0.00973 1.4816 0.084 . Fe 1 0.00729 1.1096 0.258 K 1 0.00668 1.0162 0.375 Na 1 0.00565 0.8603 0.651 Si 1 0.01237 1.8828 0.026 * SO4 1 0.00739 1.1246 0.291 PO4 1 0.00467 0.7108 0.872 NO3 1 0.00908 1.3828 0.108 NH3 1 0.01066 1.6231 0.042 * Cl 1 0.00584 0.8887 0.617 Corg 1 0.00781 1.1881 0.217 pH 1 0.00850 1.2938 0.146 conduct 1 0.00959 1.4604 0.078 . Residual 55 0.36133 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 If we correct for multiple testing (not done here), we will see that only calcium (Ca) remains marginally significant, i.e. it can explain some important extra variance which other variables cannot. And if we want to know the variance each variable can explain independently from others (i.e. the variance each variable will explain if it is the only variable in the model, without any covariables)? What I do below is a bit complex R coding exercise and I won't explain it in detail, but what I did is to make a loop (using apply) which will take each of the 14 variables one by one, calculate tb-RDA on it using vasc.hell data, test it's significance, return all values in one summary table, correct them for the multiple-testing issue by Holm's correction, and order the table by decreasing value of variance explained by each variable: test_each <- t(apply (chem, 2, FUN = function (x) as.matrix (anova (rda (vasc.hell ~ x))[1:4][1,]))) test_each <- as.data.frame (test_each) names (test_each) <- c("Df", "Variance", "F", "Pr(>F)") test_each.adj <- test_each test_each.adj$Pr(>F) <- p.adjust (test_each$Pr(>F), method = 'holm') test_each.adj[order (test_each.adj$Variance, decreasing = TRUE),]
        Df    Variance          F Pr(>F)
Ca       1 0.078859882 10.9763368  0.014
conduct  1 0.069408621  9.4774885  0.014
Mg       1 0.068014761  9.2612406  0.014
Corg     1 0.050866878  6.6963607  0.014
pH       1 0.049697951  6.5277054  0.014
Na       1 0.032285586  4.1026474  0.014
NH3      1 0.026047765  3.2718455  0.016
Si       1 0.021008459  2.6145234  0.016
SO4      1 0.018808570  2.3313586  0.036
K        1 0.017124964  2.1161781  0.040
NO3      1 0.012759735  1.5643455  0.236
Fe       1 0.010942632  1.3371875  0.384
PO4      1 0.009863593  1.2029965  0.384
Cl       1 0.006910560  0.8383942  0.736

Some variables (at the bottom of the table) do not explain significant part of variance, some do (with calcium, Ca, explaining the most). What I did is in fact the first step of forward selection, which aims to select the subset of variables that explain the most variance in the model (in the first step the variables are sorted by explained variance, the first of them is tested, and if significant, it is included as the first selected variables into the model). This will be the topic for Example 1 in Variable selection section.

### Example 3: How to use the function "how" from "vegan" to define permutation schemas?

Examples below show how to formulate arguments of how function in the library vegan to define alternative permutation schemas for sampling designs discussed in the section Theory. All examples are using 12 cases (samples). The function shuffleSet is generating three possible sets of permutations to illustrate how it works. Helping function gl can generate factor with given number of levels and replicates (to define blocks in randomized block design, and “whole-plots” in split-plot design).

#### Completely randomized design (12 samples)

h_random <- how (within = Within (type = 'free'))
shuffleSet (n = 12, nset = 3, control = h_random)
No. of Permutations: 3
No. of Samples: 12 (Randomised)

1  2  3 4  5  6  7 8 9 10 11 12
p1  2 12  1 6 10  4  7 5 8 11  3  9
p2 12  4 10 5  9  1  3 8 6  7  2 11
p3  1  9  5 2  4 12 11 8 3  7 10  6

#### Randomized block design (3 blocks, each with 4 treatments)

blck <- gl (n = 3, k = 4)  # n = number of levels (blocks), k = number of replications (individual treatments within blocks)
h_block <- how (within = Within (type = 'free'), blocks = blck)
shuffleSet (n = 12, nset = 3, control = h_block)
No. of Permutations: 3
No. of Samples: 12 (Nested in: blocks; Randomised)
Restricted by Blocks: blck (3 blocks)

1 2 3 4 5 6 7 8  9 10 11 12
p1 4 1 2 3 7 6 8 5  9 12 11 10
p2 2 4 3 1 8 5 7 6  9 12 10 11
p3 4 2 1 3 5 6 8 7 12  9 10 11

#### Hierarchical design (split-plot) with 4 split-plots nested within 3 whole-plots

##### Shuffle only split-plots ("samples"):
strt <- gl (n = 3, k = 4)  # define strata (similar to blocks)
h_split <- how (within = Within (type = 'free'), plots = Plots (strata = strt, type = 'none'))
shuffleSet (n = 12, nset = 3, control = h_split)
No. of Permutations: 3
No. of Samples: 12 (Nested in: plots; Randomised)
Restricted by Plots: strt (3 plots)

1 2 3 4 5 6 7 8  9 10 11 12
p1 2 3 1 4 7 8 6 5  9 10 11 12
p2 3 4 1 2 6 7 5 8 12 10 11  9
p3 4 1 3 2 6 7 5 8 11 10  9 12
##### Shuffle only whole-plots ("plots"):
h_whole <- how (within = Within (type = 'none'), plots = Plots (strata = strt, type = 'free'))
shuffleSet (n = 12, nset = 3, control = h_whole)
No. of Permutations: 3
No. of Samples: 12 (Nested in: plots; )
Restricted by Plots: strt (3 plots; Randomised)

1  2  3  4 5  6  7  8 9 10 11 12
p1 5  6  7  8 1  2  3  4 9 10 11 12
p2 1  2  3  4 9 10 11 12 5  6  7  8
p3 9 10 11 12 1  2  3  4 5  6  7  8
##### Shuffle both split- and whole-plots ("samples" and "plots"):
h_split_whole<- how (within = Within (type = 'free'), plots = Plots (strata = strt, type = 'free'))
shuffleSet (n = 12, nset = 3, control = h_split_whole)
No. of Permutations: 3
No. of Samples: 12 (Nested in: plots; Randomised)
Restricted by Plots: strt (3 plots; Randomised)

1  2 3  4  5 6  7  8  9 10 11 12
p1 10 11 9 12  3 1  2  4  7  6  8  5
p2  6  8 5  7  3 4  2  1 10 12  9 11
p3  5  7 8  6 11 9 10 12  3  2  4  1

#### Linear transect (series with 12 samples) with mirroring (circular permutation):

h_series <- how (within = Within (type = 'series', mirror = TRUE))
shuffleSet (n = 12, nset = 3, control = h_series)
No. of Permutations: 3
No. of Samples: 12 (Sequence; mirrored)

1  2  3  4  5  6  7  8 9 10 11 12
p1  5  4  3  2  1 12 11 10 9  8  7  6
p2 12 11 10  9  8  7  6  5 4  3  2  1
p3  3  2  1 12 11 10  9  8 7  6  5  4

#### Rectangular grid with 3 rows and 4 columns (toroidal shift with mirroring):

h_grid <- how (within = Within (type = 'grid', ncol = 4, nrow = 3, mirror = TRUE))
shuffleSet (n = 12, nset = 3, control = h_grid)
No. of Permutations: 3
No. of Samples: 12 (Spatial grid: 3r, 4c; mirrored)

1 2 3 4 5 6  7  8  9 10 11 12
p1 2 1 3 5 4 6  8  7  9 11 10 12
p2 5 4 6 8 7 9 11 10 12  2  1  3
p3 5 6 4 2 3 1 11 12 10  8  9  7