en:data:simul

Differences

This shows you the differences between two versions of the page.

en:data:simul [2014/12/20 23:00]
David Zelený [Scripts for creating simulated datasets]
en:data:simul [2018/04/01 10:58] (current)
David Zelený [Data for download]
Line 5: Line 5:
 {{:obrazky:species_response_curves.png?400|Simulated response of species along virtual gradient (only first 50 species are displayed)}}
  
-These simulated community datasets represent the model of community, which is fully based on the ecological niche theory. Unimodal species response curves are randomly distributed along one (or two, respectively) virtual ecological gradients, reflecting the probability of species occurrence in given part of the gradient (species response curve is based on Beta function). Each species is defined by its ecological optimum along the gradient, niche width, maximum probability of occurrence and few other parameters. In the next step, random positions along gradient are generated, and within each position ("plot") are collected individual species in the following way: first, random number is generated, corresponding to the number of individuals in given plot; than, each individual is randomly assigned to a species and probability of the assignment to given species is weighted by probability of this species occurrence in particular part of the gradient. One species could be hence assigned to more individuals per plot, if its probability of occurrence in given part of the gradient is high. In case of two virtual gradients, the probability of occurrence for particular species is given by multiplying the probabilities of given species along each of the gradient. For details, see the scripts below.
+These simulated community datasets represent a community model based entirely on ecological niche theory. Unimodal species response curves are randomly distributed along one or two virtual ecological gradients and give the probability of species occurrence in a given part of the gradient (the species response curve is based on the Beta function). Each species is defined by its ecological optimum along the gradient, niche width, maximum probability of occurrence and a few other parameters.
+In the next step, random positions along the gradient are generated, and at each position ("sample") individuals are collected in the following way: first, a random number is generated, corresponding to the number of individuals in the given sample; then, each individual is randomly assigned to a species, with the probability of assignment weighted by the probability of that species occurring in that part of the gradient. One species can hence be assigned to several individuals per sample if its probability of occurrence in that part of the gradient is high. In the case of two virtual gradients, the probability of occurrence of a species is obtained by multiplying its probabilities along each of the gradients. For details, see the scripts below.
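The sampling procedure described above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions, not the original simulation script: the function name ''beta.response'', the symmetric Beta-shaped curve, the parameter ranges and the range of individuals per sample (50 to 150) are all hypothetical choices made for this example.

```r
set.seed (42)                     # make the sketch reproducible

gradient.length <- 5000           # length of the virtual gradient (units)
n.species <- 300                  # number of simulated species
n.samples <- 500                  # number of samples

# Hypothetical species parameters (the ranges are illustrative assumptions)
optimum     <- runif (n.species, 0, gradient.length)
niche.width <- runif (n.species, 500, 2500)
max.prob    <- runif (n.species, 0.1, 1)

# Symmetric Beta-shaped response (an assumed simplification): probability is
# zero outside optimum +/- niche.width/2 and equals max.prob at the optimum
beta.response <- function (x, optimum, niche.width, max.prob, shape = 2) {
  u <- (x - optimum) / niche.width + 0.5          # rescale the niche to [0, 1]
  inside <- u > 0 & u < 1
  p <- numeric (length (u))
  p[inside] <- max.prob[inside] * (4 * u[inside] * (1 - u[inside]))^shape
  p
}

# Random sample positions along the gradient
position <- sort (runif (n.samples, 0, gradient.length))

spe <- matrix (0L, nrow = n.samples, ncol = n.species,
               dimnames = list (paste0 ('sample', seq_len (n.samples)),
                                paste0 ('spec', seq_len (n.species))))

for (i in seq_len (n.samples)) {
  # probability of occurrence of every species at this position; with two
  # gradients this would be the product of the probabilities along each one
  p <- beta.response (position[i], optimum, niche.width, max.prob)
  if (sum (p) == 0) next                          # no species can occur here
  n.ind <- sample (50:150, 1)                     # number of individuals in the sample
  ind <- sample.int (n.species, size = n.ind, replace = TRUE, prob = p)
  spe[i, ] <- tabulate (ind, nbins = n.species)   # individuals per species
}

spe.pa <- (spe > 0) * 1L                          # presence-absence version
```

Because individuals are drawn with replacement and weighted by ''p'', a species with a high probability of occurrence at a given position tends to receive several individuals in that sample, while rare species may receive none.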
  
 ===== Parameters of the files =====
Line 11: Line 11:
   *simul2 - 2 gradients of different length (5000 and 2000 units), 500 samples, 300 species
   *simul3 - 2 gradients of the same length (5000 units), 500 samples, 300 species
-  *simul.short - 2 gradients of different length, both rather short (1100 and 800 units), 70 samples, 300 species, samples are distributed evenly along the gradient (distance among samples is 100 units)
-  *simul.long - 2 gradients of different length, both rather long (5500 and 4000 units), 70 samples, 300 species, samples are distributed evenly along the gradient (distance among samples is 100 units)
+  *simul.short - 2 gradients of different length, both rather short (1100 and 800 units), 70 samples, 300 species, samples are distributed evenly along the gradient (distances between samples along each gradient are exactly 100 units)
+  *simul.long - 2 gradients of different length, both rather long (5500 and 4000 units), 70 samples, 300 species, samples are distributed evenly along the gradient (distances between samples along each gradient are exactly 100 units)
  
-Note: number of species (e.g. 300) is a parameter set up for simulated models - number of species in the resulting community matrix doesn't have to fit to number of simulated species, as some of the species with low abundance were not "sampled".
+Note: the number of species (e.g. 300) is a parameter set for the simulated models; the number of species in the resulting community matrix does not have to match the number of simulated species, because some of the less abundant species were not "sampled".
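A toy example may make this note concrete. The small matrix below is made up for illustration (it is not one of the downloadable datasets); it shows how a simulated species that was never "sampled" ends up as an all-zero column and is dropped from the final community matrix.

```r
# Toy community matrix: 3 samples, 3 simulated species (hypothetical values)
spe <- matrix (c (1, 0, 0,
                  0, 0, 1,
                  1, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list (paste0 ('sample', 1:3), paste0 ('spec', 1:3)))

# Species with at least one occurrence were "sampled"
sampled <- colSums (spe) > 0
spe.sampled <- spe[, sampled, drop = FALSE]

ncol (spe)          # 3 simulated species
ncol (spe.sampled)  # 2 species remain in the community matrix
```

The same logic would explain why a dataset simulated with 300 species can end up with fewer species columns.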
  
 +===== Environmental variables =====
 +|< 100% 150px ->|
 +^ Name of variable      ^ Description                                                                                            ^
 +| gradient              | position of the sample along the virtual gradient (for datasets with only one gradient)                |
 +| gradient1, gradient2  | position of each sample along the first and second virtual gradient (for datasets with two gradients)  |
 +| group                 | classification of samples by modified Twinspan into four groups                                        |
 +
 +===== Species attributes =====
 +|< 100% 150px ->|
 +| optimum                     | position of species optima along the virtual gradient (for datasets with only one virtual gradient)               |
 +| niche.width                 | width of the species niche in units of the virtual gradient (for datasets with only one virtual gradient)         |
 +| optimum1, optimum2          | position of species optima along the first and second virtual gradient (for datasets with two virtual gradients)  |
 +| niche.width1, niche.widht2  | width of the species niche along the first and second virtual gradient (for datasets with two virtual gradients)  |
  
  
 ===== Data for download =====
-Files with -spe contains presence-absence matrix of species data, files with -env contains position of simulated sample along virtual gradient (analogy to measured environmental variable), files with -specivalues contain information about position of species optima along the gradient and niche width (both in arbitrary gradient units).
+Files with ''-spe'' in the name contain the presence-absence matrix of species data, files with ''-env'' contain the position of each simulated sample along the virtual gradient (analogous to a measured environmental variable), and files with ''-specvalues'' contain the position of species optima along the gradient and the niche width (both in arbitrary gradient units).
 + 
 +|< 100% 150px 100px - >|
 +^ File name ^ File type ^ Description ^
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-spe.txt|simul1-spe.txt]] | tab-delimited txt format | Sample × species matrix (500 samples in rows, 296 species in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-env.txt|simul1-env.txt]] | tab-delimited txt format | Environmental variable matrix (samples in rows, variables in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-specvalues.txt|simul1-specvalues.txt]] | tab-delimited txt format | Species attribute matrix (species in rows, attributes in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-spe.txt|simul2-spe.txt]] | tab-delimited txt format | Sample × species matrix (500 samples in rows, 282 species in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-env.txt|simul2-env.txt]] | tab-delimited txt format | Environmental variable matrix (samples in rows, variables in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-specvalues.txt|simul2-specvalues.txt]] | tab-delimited txt format | Species attribute matrix (species in rows, attributes in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-spe.txt|simul3-spe.txt]] | tab-delimited txt format | Sample × species matrix (500 samples in rows, 279 species in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-env.txt|simul3-env.txt]] | tab-delimited txt format | Environmental variable matrix (samples in rows, variables in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-specvalues.txt|simul3-specvalues.txt]] | tab-delimited txt format | Species attribute matrix (species in rows, attributes in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.short-spe.txt|simul.short-spe.txt]] | tab-delimited txt format | Sample × species matrix (70 samples in rows, 300 species in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.short-env.txt|simul.short-env.txt]] | tab-delimited txt format | Environmental variable matrix (samples in rows, variables in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.long-spe.txt|simul.long-spe.txt]] | tab-delimited txt format | Sample × species matrix (70 samples in rows, 300 species in columns) |
 +| [[https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.long-env.txt|simul.long-env.txt]] | tab-delimited txt format | Environmental variable matrix (samples in rows, variables in columns) |
  
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul1-spe.csv|simul1-spe.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul1-env.csv|simul1-env.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul1-specvalues.csv|simul1-specvalues.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul2-spe.csv|simul2-spe.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul2-env.csv|simul2-env.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul2-specvalues.csv|simul2-specvalues.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul3-spe.csv|simul3-spe.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul3-env.csv|simul3-env.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul3-specvalues.csv|simul3-specvalues.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul.short-spe.csv|simul.short-spe.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul.short-env.csv|simul.short-env.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul.long-spe.csv|simul.long-spe.csv]]
-  *[[http://www.davidzeleny.net/anadat-r/data-download/simul.long-env.csv|simul.long-env.csv]]
 ===== Script for direct import of data to R =====
 <code rsplus>
-simul1.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul1-spe.txt', row.names = 1)
-simul1.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul1-env.txt', row.names = 1)
-simul1.specvalues <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul1-specvalues.txt', row.names = 1)
+simul1.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-spe.txt', row.names = 1)
+simul1.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-env.txt', row.names = 1)
+simul1.specvalues <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul1-specvalues.txt', row.names = 1)
 
-simul2.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul2-spe.txt', row.names = 1)
-simul2.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul2-env.txt', row.names = 1)
-simul2.specvalues <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul2-specvalues.txt', row.names = 1)
+simul2.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-spe.txt', row.names = 1)
+simul2.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-env.txt', row.names = 1)
+simul2.specvalues <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul2-specvalues.txt', row.names = 1)
 
-simul3.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul3-spe.txt', row.names = 1)
-simul3.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul3-env.txt', row.names = 1)
-simul3.specvalues <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul3-specvalues.txt', row.names = 1)
+simul3.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-spe.txt', row.names = 1)
+simul3.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-env.txt', row.names = 1)
+simul3.specvalues <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul3-specvalues.txt', row.names = 1)
 
-simul.short.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul.short-spe.txt', row.names = 1)
-simul.short.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul.short-env.txt', row.names = 1)
+simul.short.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.short-spe.txt', row.names = 1)
+simul.short.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.short-env.txt', row.names = 1)
 
-simul.long.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul.long-spe.txt', row.names = 1)
-simul.long.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/simul.long-env.txt', row.names = 1)
+simul.long.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.long-spe.txt', row.names = 1)
+simul.long.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/simul.long-env.txt', row.names = 1)
 
 </code>
Line 62: Line 78:
  
 **Notes:**
 +  * Simulated data along one and two ecological gradients can now be prepared faster using the functions ''simul.comm'' and ''simul.comm.2'', respectively, from the packages ''[[https://github.com/zdealveindy/theta|theta]]'' and ''[[https://github.com/zdealveindy/weimea|weimea]]''.
   * Function ''compas'' in the package ''[[http://commecol.r-forge.r-project.org/|CommEcol]]'', written by Adriano Sanches Melo, should be a direct implementation of Minchin's software COMPAS (Minchin 1987). The principles are similar (the Fridley et al. 2007 paper actually cites Minchin's paper using COMPAS), but ''compas'' allows generation of a community matrix in more than two dimensions and the addition of quantitative and qualitative noise.
   * Even more comprehensive is the package ''[[http://www.fromthebottomoftheheap.net/2014/07/31/simulating-species-abundance-data-with-the-coenocliner-package/|coenocliner]]'' developed by Gavin Simpson; apart from Minchin's model, it can simulate a number of other types of community data along a coenocline.
 ===== References =====
-  * Fridley, J.D., Vandermast, D.B., Kuppinger, D.M., Manthey, M., and Peet, R.K. (2007) Co-occurrence-based assessment of habitat generalists and specialists: a new approach for the measurement of niche width. //Journal of Ecology// 95: 707-722 [[http://plantecology.syr.edu/fridley/Fridley_ea2007_JEcol.pdf|pdf]] [[http://onlinelibrary.wiley.com/store/10.1111/j.1365-2745.2007.01236.x/asset/supinfo/JEC1236SA2.txt?v=1&s=3fda4b1028ceed2c5d157c056f83e3d8802a8dfe|Appendix S2]]
-  * Minchin, P.R. (1987) Simulation of multidimensional community patterns: towards a comprehensive model. //Vegetatio// 71: 145-156.
+  * Fridley J.D., Vandermast D.B., Kuppinger D.M., Manthey M. & Peet R.K. (2007) Co-occurrence-based assessment of habitat generalists and specialists: a new approach for the measurement of niche width. //Journal of Ecology// 95: 707-722 [[http://plantecology.syr.edu/fridley/Fridley_ea2007_JEcol.pdf|pdf]] [[http://onlinelibrary.wiley.com/store/10.1111/j.1365-2745.2007.01236.x/asset/supinfo/JEC1236SA2.txt?v=1&s=3fda4b1028ceed2c5d157c056f83e3d8802a8dfe|Appendix S2]]
+  * Minchin P.R. (1987) Simulation of multidimensional community patterns: towards a comprehensive model. //Vegetatio// 71: 145-156.
en/data/simul.1419087627.txt.gz · Last modified: 2017/10/11 20:36 (external edit)