Ecological resemblance

Theory

Ecological resemblance between two samples is the basic tool for handling multivariate ecological data. It refers to the similarity or dissimilarity between samples in terms of their species composition: two samples sharing the same species in the same abundances have the highest similarity (and the lowest dissimilarity), and the similarity decreases (and the dissimilarity increases) as their species composition diverges. All cluster and ordination methods operate on similarities or dissimilarities between samples. This holds even for methods such as correspondence analysis (CA), which is based on iterative weighted averaging of sample and species scores along ordination axes: the resulting distribution of samples in ordination space reflects their chi-square distances1).

Similarity, dissimilarity and distance

Intuitively, one thinks about similarity among samples: the more similar two samples are in terms of their species composition, the higher their similarity. The values of similarity indices range from 0 (the samples do not share any species) to 1 (the samples are identical). Classification and ordination techniques, however, do not use similarities but so-called distances, because they need to locate the samples in multidimensional space. Distances in fact correspond to ecological dissimilarities2). These can be derived from particular similarity indices (dissimilarity = 1 - similarity), or they can be specific distance measures, such as Euclidean distance, which have no counterpart among similarity indices. While all similarity indices can be converted into dissimilarities, not all dissimilarity (or distance) measures can be converted into similarities (Euclidean distance is one example).
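
The conversion described above can be illustrated with a minimal Python sketch. It assumes presence-absence data and uses the Sørensen index as the example similarity; the function names and the two samples are hypothetical and serve only as an illustration.

```python
import math


def sorensen_similarity(a, b):
    """Sørensen similarity from presence-absence data: 2 * shared / (n_a + n_b)."""
    present_a = [x > 0 for x in a]
    present_b = [y > 0 for y in b]
    shared = sum(1 for x, y in zip(present_a, present_b) if x and y)
    total = sum(present_a) + sum(present_b)
    if total == 0:
        raise ValueError("both samples are empty; the index is undefined")
    return 2 * shared / total


def euclidean_distance(a, b):
    """Euclidean distance between two samples; it has no fixed upper bound."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical presence-absence records of five species (one value per species)
sample_1 = [1, 1, 0, 1, 0]
sample_2 = [1, 0, 0, 1, 1]

similarity = sorensen_similarity(sample_1, sample_2)  # 2 shared species, 3 + 3 occurrences
dissimilarity = 1 - similarity                        # dissimilarity = 1 - similarity
distance = euclidean_distance(sample_1, sample_2)

print(f"Sørensen similarity:    {similarity:.3f}")
print(f"Sørensen dissimilarity: {dissimilarity:.3f}")
print(f"Euclidean distance:     {distance:.3f}")
```

For these two samples the Sørensen similarity is 0.667 and the dissimilarity 0.333, while the Euclidean distance is 1.414. Because Euclidean distance is not bounded by 1, subtracting it from 1 does not give a similarity on the 0 to 1 scale.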

There are many measures of similarity or distance (Legendre & Legendre 2012 list around 30 of them). The first decision you have to make is whether you aim to do an R-mode or a Q-mode analysis3), because each has a different set of measures. Further, if you focus on differences between samples (Q mode), the most relevant measures are those that ignore so-called double zeros. Briefly, a double zero is a species that does not occur in either of the two samples being compared. Imagine that you have a matrix of many samples and many species, and you want to calculate the similarity between two samples. You cut the two corresponding rows from the matrix; these rows contain species that are present in the first sample only, in the second sample only, in both, or in neither, and double zeros are the species missing from both samples. Some indices treat double zeros as evidence that the samples are similar, because both samples lack certain species (just as samples are similar because they share species). This is, however, ecologically hard to defend: the absence of a species from a sample can have several different interpretations and should not be counted as similarity4). Legendre & Legendre 2012 offer a kind of “key” for selecting an appropriate measure for the given data and problem (Tables 7.4-7.6). As a rule of thumb, Bray-Curtis, Sørensen or chord distance are better choices than Euclidean or chi-square distance.
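
The effect of double zeros can be sketched in Python as follows. The simple matching coefficient is used here as an example of an index that counts double zeros as agreement, and the Sørensen index as one that ignores them; this choice of indices, the function names and the data are assumptions made for illustration only.

```python
def simple_matching_similarity(a, b):
    """Share of species with the same status in both samples (double zeros count as matches)."""
    matches = sum(1 for x, y in zip(a, b) if (x > 0) == (y > 0))
    return matches / len(a)


def sorensen_similarity(a, b):
    """Sørensen similarity: 2 * shared / (n_a + n_b); double zeros are ignored."""
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    total = sum(1 for x in a if x > 0) + sum(1 for y in b if y > 0)
    if total == 0:
        raise ValueError("both samples are empty; the index is undefined")
    return 2 * shared / total


# Hypothetical presence-absence records of four species in a wet and a dry sample
wet = [1, 1, 1, 0]
dry = [0, 1, 0, 1]

# Append six hypothetical mesic species that are absent from both samples (double zeros)
wet_extended = wet + [0] * 6
dry_extended = dry + [0] * 6

sm_before = simple_matching_similarity(wet, dry)
sm_after = simple_matching_similarity(wet_extended, dry_extended)
so_before = sorensen_similarity(wet, dry)
so_after = sorensen_similarity(wet_extended, dry_extended)

print(f"Simple matching: {sm_before:.2f} -> {sm_after:.2f}")
print(f"Sørensen:        {so_before:.2f} -> {so_after:.2f}")
```

Adding species that are absent from both samples raises the simple matching similarity from 0.25 to 0.70, although the two samples have not become any more alike ecologically. The Sørensen similarity stays at 0.40.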

The naming of the different similarity and dissimilarity indices is somewhat inconsistent. To avoid confusion, here we will talk only about dissimilarity indices, i.e. distances.

Further reading

Further inspiration can be found in the CRAN Task View: Analysis of Ecological and Environmental Data, maintained by Gavin Simpson, in the section Dissimilarity coefficients.

1)
Similarly, PCA distributes samples according to their Euclidean distances.
2)
Although, strictly speaking, not all dissimilarities can be expressed as distances in Euclidean space.
3)
R mode focuses on differences among species, Q mode on differences among samples.
4)
For example, consider two samples, one from a wet habitat and the other from a dry one. Species of mesic habitats are missing from both samples, but for a different reason in each: the wet habitat is too wet for them, and the dry habitat is too dry.
en/similarity.txt · Last modified: 2017/02/22 15:07 by David Zelený