Misplaced Pages

Substitution model

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

In biology, a substitution model, also called a model of sequence evolution, is a Markov model that describes changes over evolutionary time. These models describe evolutionary changes in macromolecules, such as DNA sequences or protein sequences, that can be represented as a sequence of symbols (e.g., A, C, G, and T in the case of DNA or the 20 "standard" proteinogenic amino acids in the case of proteins). Substitution models are used to calculate the likelihood of phylogenetic trees using multiple sequence alignment data. Thus, substitution models are central to maximum likelihood estimation of phylogeny as well as Bayesian inference in phylogeny. Estimates of evolutionary distances (numbers of substitutions that have occurred since a pair of sequences diverged from a common ancestor) are typically calculated using substitution models (evolutionary distances are used as input for distance methods such as neighbor joining). Substitution models are also central to phylogenetic invariants because they are necessary to predict site pattern frequencies given a tree topology. Substitution models are also necessary to simulate sequence data for a group of organisms related by a specific tree.

110-415: Phylogenetic tree topologies are often the parameter of interest; thus, branch lengths and any other parameters describing the substitution process are often viewed as nuisance parameters . However, biologists are sometimes interested in the other aspects of the model. For example, branch lengths, especially when those branch lengths are combined with information from the fossil record and a model to estimate

220-607: $a$ through $e$, which are expressed relative to the fixed $f = r_{GT} = 1$ in this example) and three equilibrium base frequency parameters (as described above, only three $\pi_i$ values need to be specified because $\vec{\pi}$ must sum to 1). The alternative notation also makes it easier to understand

330-1276:
$$Q = \begin{pmatrix}
-(a\pi_C + b\pi_G + c\pi_T) & a\pi_C & b\pi_G & c\pi_T \\
a\pi_A & -(a\pi_A + d\pi_G + e\pi_T) & d\pi_G & e\pi_T \\
b\pi_A & d\pi_C & -(b\pi_A + d\pi_C + f\pi_T) & f\pi_T \\
c\pi_A & e\pi_C & f\pi_G & -(c\pi_A + e\pi_C + f\pi_G)
\end{pmatrix}$$
The $Q$ matrix

440-606: A museum drawer in Stuttgart ; they had been unearthed in a clay pit at Wiesloch –Frauenweiler, south of Heidelberg , Germany , and, because hummingbirds were assumed to have never occurred outside the Americas, were not recognized to be hummingbirds until Mayr took a closer look at them. Fossils of birds not clearly assignable to either hummingbirds or a related extinct family, the Jungornithidae, have been found at

550-409: A binary alphabet to score the following phenotypic traits "has feathers", "lays eggs", "has fur", "is warm-blooded", and "capable of powered flight". In this toy example hummingbirds would have sequence 11011 (most other birds would have the same string), ostriches would have the sequence 11010, cattle (and most other land mammals ) would have 00110, and bats would have 00111. The likelihood of
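The toy scoring above can be sketched in a few lines of Python. This is only an illustration: the trait order follows the text, while the names `TRAITS`, `TAXA`, and `encode` are hypothetical helpers introduced here.

```python
# Trait order follows the toy example in the text.
TRAITS = [
    "has feathers",
    "lays eggs",
    "has fur",
    "is warm-blooded",
    "capable of powered flight",
]

# Traits scored as present for each taxon (illustrative data matching the text).
TAXA = {
    "hummingbird": {"has feathers", "lays eggs", "is warm-blooded",
                    "capable of powered flight"},
    "ostrich": {"has feathers", "lays eggs", "is warm-blooded"},
    "cattle": {"has fur", "is warm-blooded"},
    "bat": {"has fur", "is warm-blooded", "capable of powered flight"},
}


def encode(present):
    """Return a binary string with '1' for each trait that is present."""
    return "".join("1" if trait in present else "0" for trait in TRAITS)


encoded = {taxon: encode(present) for taxon, present in TAXA.items()}
```

The resulting strings can be treated like any other aligned sequences over a two-state alphabet.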

660-468: A female at speeds around 23 m/s (83 km/h; 51 mph). The sexes differ in feather coloration, with males having distinct brilliance and ornamentation of head, neck, wing, and breast feathers. The most typical feather ornament in males is the gorget – a bib-like iridescent neck-feather patch that changes brilliance with the viewing angle to attract females and warn male competitors away from territory. Hummingbirds begin mating when they are

770-468: A fossil record may make it possible to determine the number of years between an ancestral species and a descendant species. Because some species evolve at faster rates than others, these two measures of branch length are not always in direct proportion. The expected number of substitutions per site per year is often indicated with the Greek letter mu (μ). A model is said to have a strict molecular clock if

880-421: A four nucleotide alphabet (A, C, G, and U). However, substitution models can be used for alphabets of any size; the alphabet is the 20 proteinogenic amino acids for proteins and the sense codons (i.e., the 61 codons that encode amino acids in the standard genetic code ) for aligned protein-coding gene sequences. In fact, substitution models can be developed for any biological characters that can be encoded using

990-470: A function of a number of parameters which are estimated for every data set analyzed, preferably using maximum likelihood . This has the advantage that the model can be adjusted to the particularities of a specific data set (e.g. different composition biases in DNA). Problems can arise when too many parameters are used, particularly if they can compensate for each other (this can lead to non-identifiability). Then it

1100-484: A given position, conditional on there being a base $i$ in that position at time 0. When the model is time reversible, this can be performed between any two sequences, even if one is not the ancestor of the other, if you know the total branch length between them. The asymptotic properties of $P_{ij}(t)$ are such that $P_{ij}(0) = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta function. That is, there

1210-645: A high metabolic rate dependent on foraging for sugars from flower nectar. Hummingbird legs are short with feet having three toes pointing forward and one backward – the hallux . The toes of hummingbirds are formed as claws with ridged inner surfaces to aid gripping onto flower stems or petals. Hummingbirds do not walk on the ground or hop like most birds, but rather shuffle laterally and use their feet to grip while perching, preening feathers, or nest-building (by females), and during fights to grab feathers of opponents. Hummingbirds apply their legs as pistons for generating thrust upon taking flight, although

1320-421: A model is time-reversible, which species was the ancestral species is irrelevant. Instead, the phylogenetic tree can be rooted using any of the species, re-rooted later based on new knowledge, or left unrooted. This is because there is no 'special' species, all species will eventually derive from one another with the same probability. A model is time reversible if and only if it satisfies the property (the notation

1430-451: A phylogenetic tree can then be calculated using those binary sequences and an appropriate substitution model. The existence of these morphological models makes it possible to analyze data matrices with fossil taxa, either using the morphological data alone or a combination of morphological and molecular data (with the latter scored as missing data for the fossil taxa). There is an obvious similarity between use of molecular or phenotypic data in

1540-437: A phylogenetic tree is expressed as the expected number of substitutions per site; if the evolutionary model indicates that each site within an ancestral sequence will typically experience x substitutions by the time it evolves to a particular descendant's sequence then the ancestor and descendant are considered to be separated by branch length x . Sometimes a branch length is measured in terms of geological years. For example,

1650-439: A rate matrix, $Q$, which describes the rate at which bases of one type change into bases of another type; element $Q_{ij}$ for $i \neq j$ is the rate at which base $i$ goes to base $j$. The diagonals of the $Q$ matrix are chosen so that the rows sum to zero: $Q_{ii} = -\sum_{j \neq i} Q_{ij}$. The equilibrium row vector $\pi$ must be annihilated by the rate matrix $Q$: $\pi Q = 0$. The transition matrix function

1760-526: A set of exchangeability parameters ($r_{ij}$) for any alphabet of $k$ character states. These values can then be used to populate the $Q$ matrix by setting the off-diagonal elements as shown above (the general notation would be $Q_{ij} = r_{ij}\pi_j$), setting

1870-442: A small number of flower species. Even in the most specialized hummingbird–plant mutualisms, the number of food plant lineages of the individual hummingbird species increases with time. The bee hummingbird ( Mellisuga helenae ) – the world's smallest bird – evolved to dwarfism likely because it had to compete with long-billed hummingbirds having an advantage for nectar foraging from specialized flowers, consequently leading

1980-457: A specific alphabet (e.g., amino acid sequences combined with information about the conformation of those amino acids in three-dimensional protein structures ). The majority of substitution models used for evolutionary research assume independence among sites (i.e., the probability of observing any specific site pattern is identical regardless of where the site pattern is in the sequence alignment). This simplifies likelihood calculations because it

2090-461: A specific multinomial distribution for site pattern frequencies. If we consider a multiple sequence alignment of four DNA sequences there are 256 possible site patterns so there are 255 degrees of freedom for the site pattern frequencies. However, it is possible to specify the expected site pattern frequencies using five degrees of freedom if using the Jukes-Cantor model of DNA evolution, which
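The count of site patterns mentioned above is easy to verify by enumeration. The sketch below is illustrative only; `site_patterns` is a hypothetical helper name, and the alphabet is assumed to be the four DNA nucleotides.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def site_patterns(n_taxa):
    """Enumerate every possible alignment column for n_taxa DNA sequences."""
    return ["".join(column) for column in product(NUCLEOTIDES, repeat=n_taxa)]


patterns = site_patterns(4)
n_patterns = len(patterns)          # 4 ** 4 possible columns
# Pattern frequencies must sum to 1, so one of them is not free.
free_frequencies = n_patterns - 1
```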

2200-469: A state similar to hibernation , and slow their metabolic rate to 1 ⁄ 15 of its normal rate. While most hummingbirds do not migrate , the rufous hummingbird has one of the longest migrations among birds, traveling twice per year between Alaska and Mexico , a distance of about 3,900 miles (6,300 km). Hummingbirds split from their sister group , the swifts and treeswifts , around 42 million years ago. The oldest known fossil hummingbird

2310-406: A year old. Sex occurs over 3–5 seconds when the male joins its cloaca with the female's, passing sperm to fertilize the female's eggs. Hummingbird females build a nest resembling a small cup about 1.5 inches (3.8 cm) in diameter, commonly attached to a tree branch using spider webs, lichens , moss, and loose strings of plant fibers (image). Typically, two pea -shaped white eggs (image) –

2420-782: Is Eurotrochilus , from the Rupelian Stage of Early Oligocene Europe. Hummingbirds are the smallest known and smallest living avian theropod dinosaurs . The iridescent colors and highly specialized feathers of many species (mainly in males) give some hummingbirds exotic common names, such as sun gem, fairy, woodstar, sapphire or sylph . Across the estimated 366 species, hummingbird weights range from as small as 2 grams (0.071 oz) to as large as 20 grams (0.71 oz). They have characteristic long, narrow beaks (bills) which may be straight (of varying lengths) or highly curved. The bee hummingbird – only 6 centimetres (2.4 in) long and weighing about 2 grams (0.071 oz) –

2530-417: Is a case where use can be made of a pivotal quantity . However, in other cases no such circumvention is known. Practical approaches to statistical analysis treat nuisance parameters somewhat differently in frequentist and Bayesian methodologies. A general approach in a frequentist analysis can be based on maximum likelihood-ratio tests . These provide both significance tests and confidence intervals for

2640-409: Is a function from the branch lengths (in some units of time, possibly in substitutions), to a matrix of conditional probabilities. It is denoted $P(t)$. The entry in the $i$-th row and the $j$-th column, $P_{ij}(t)$, is the probability, after time $t$, that there is a base $j$ at

2750-428: Is a simple substitution model that allows one to calculate the expected site pattern frequencies using only the tree topology and the branch lengths (given four taxa, an unrooted bifurcating tree has five branch lengths). Substitution models also make it possible to simulate sequence data using Monte Carlo methods. Simulated multiple sequence alignments can be used to assess the performance of phylogenetic methods and generate

2860-612: Is beneficial because one can use reduced alphabets for amino acids. For example, one can use $k = 6$ and recode the amino acids using the six categories proposed by Margaret Dayhoff. Reduced amino acid alphabets are viewed as a way to reduce the impact of compositional variation and saturation. Importantly, evolutionary patterns can vary among genomic regions and thus different genomic regions can fit with different substitution models. Indeed, ignoring heterogeneous evolutionary patterns along sequences can lead to biases in

2970-794: Is due to convergent evolution . The hummingbird moth has flying and feeding characteristics similar to those of a hummingbird. Hummingbirds may be mistaken for hummingbird hawk-moths , which are large, flying insects with hovering capabilities, and exist only in Eurasia. Hummingbirds are restricted to the Americas from south central Alaska to Tierra del Fuego , including the Caribbean. The majority of species occur in tropical and subtropical Central and South America, but several species also breed in temperate climates and some hillstars occur even in alpine Andean highlands at altitudes up to 5,200 m (17,100 ft). The greatest species richness

3080-497: Is enough data available to create empirical models with any number of parameters, including empirical codon models. Because of the problems mentioned above, the two approaches are often combined, by estimating most of the parameters once on large-scale data, while a few remaining parameters are then adjusted to the data set under consideration. The following sections give an overview of the different approaches taken for DNA, protein or codon-based models. The first models of DNA evolution

3190-456: Is explained below) $\pi_i Q_{ij} = \pi_j Q_{ji}$ or, equivalently, the detailed balance property, $\pi_i P_{ij}(t) = \pi_j P_{ji}(t)$, for every $i$, $j$, and $t$. Time-reversibility should not be confused with stationarity. A model is stationary if $Q$ does not change with time. The analysis below assumes a stationary model. Stationary, neutral, independent, finite sites models (assuming a constant rate of evolution) have two parameters, $\pi$, an equilibrium vector of base (or character) frequencies and
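The reversibility condition $\pi_i Q_{ij} = \pi_j Q_{ji}$ can be checked numerically for any candidate rate matrix. The following is a minimal sketch, not a reference implementation: `is_time_reversible` is a hypothetical helper, and the two example matrices (an F81-style matrix with $Q_{ij} = \pi_j$ and a one-directional cyclic chain) are assumptions chosen purely for illustration.

```python
def is_time_reversible(q, freqs, tol=1e-12):
    """Check pi_i * Q_ij == pi_j * Q_ji for every pair of distinct states."""
    k = len(freqs)
    return all(
        abs(freqs[i] * q[i][j] - freqs[j] * q[j][i]) <= tol
        for i in range(k)
        for j in range(i + 1, k)
    )


# Illustrative equilibrium frequencies (hypothetical values).
FREQS = [0.1, 0.2, 0.3, 0.4]

# F81-style matrix: off-diagonal rate equals the target frequency,
# diagonal chosen so that each row sums to zero.
F81_Q = [
    [FREQS[j] if i != j else FREQS[i] - 1.0 for j in range(4)]
    for i in range(4)
]

# Cyclic chain 0 -> 1 -> 2 -> 3 -> 0: stationary under uniform frequencies,
# but changes only ever flow in one direction, so it is not reversible.
CYCLIC_Q = [
    [-1.0 if i == j else (1.0 if j == (i + 1) % 4 else 0.0) for j in range(4)]
    for i in range(4)
]

f81_reversible = is_time_reversible(F81_Q, FREQS)
cyclic_reversible = is_time_reversible(CYCLIC_Q, [0.25] * 4)
```

The cyclic example illustrates the distinction drawn in the text: a model can be stationary without being time-reversible.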

3300-419: Is greatest, possible ancestors of extant hummingbirds may have lived in parts of Europe and what is southern Russia today. As of 2023, 366 hummingbird species have been identified. They have been traditionally divided into two subfamilies : the hermits (subfamily Phaethornithinae) and the typical hummingbirds (subfamily Trochilinae , all the others). Molecular phylogenetic studies have shown, though, that

3410-612: Is in humid tropical and subtropical forests of the northern Andes and adjacent foothills, but the number of species found in the Atlantic Forest , Central America or southern Mexico also far exceeds the number found in southern South America, the Caribbean islands, the United States, and Canada. While fewer than 25 different species of hummingbirds have been recorded from the United States and fewer than 10 from Canada and Chile each, Colombia alone has more than 160 and

3520-438: Is likely necessary to adjust the model to these circumstances. Nuisance parameter In statistics , a nuisance parameter is any parameter which is unspecified but which must be accounted for in the hypothesis testing of the parameters which are of interest. The classic example of a nuisance parameter comes from the normal distribution , a member of the location–scale family . For at least one normal distribution,

3630-404: Is no change in base composition between a sequence and itself. At the other extreme, $\lim_{t \rightarrow \infty} P_{ij}(t) = \pi_j$, or, in other words, as time goes to infinity the probability of finding base $j$ at a position given there

3740-510: Is normalized so $-\sum_{i=1}^{4} \pi_i Q_{ii} = 1$. This notation is easier to understand than the notation originally used by Tavaré, because all model parameters correspond either to "exchangeability" parameters ($a$ through $f$, which can also be written using

3850-409: Is not possible to estimate all entries of the substitution matrix from the current data set only. On the downside, the parameters estimated from the training data might be too generic and therefore have a poor fit to any particular dataset. A potential solution for that problem is to estimate some parameters from the data using maximum likelihood (or some other method). In studies of protein evolution

3960-516: Is often the case that the data set is too small to yield enough information to estimate all parameters accurately. Empirical models are created by estimating many parameters (typically all entries of the rate matrix as well as the character frequencies, see the GTR model above) from a large data set. These parameters are then fixed and will be reused for every data set. This has the advantage that those parameters can be estimated more accurately. Normally, it

4070-478: Is often unrealistic, especially across long periods of evolution. For example, even though rodents are genetically very similar to primates , they have undergone a much higher number of substitutions in the estimated time since divergence in some regions of the genome . This could be due to their shorter generation time , higher metabolic rate , increased population structuring, increased rate of speciation , or smaller body size . When studying ancient events like

4180-415: Is only necessary to calculate the probability of all site patterns that appear in the alignment then use those values to calculate the overall likelihood of the alignment (e.g., the probability of three "GGGG" site patterns given some model of DNA sequence evolution is simply the probability of a single "GGGG" site pattern raised to the third power). This means that substitution models can be viewed as implying
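Under the independence assumption, the alignment likelihood is a product over site patterns, each raised to the number of times it occurs. The sketch below shows this bookkeeping in log space. The helper names are hypothetical, and the pattern probabilities in `PATTERN_PROB` are made-up numbers rather than values computed from any real model or tree.

```python
import math
from collections import Counter


def site_pattern_counts(alignment):
    """Count how often each column (site pattern) occurs in an alignment."""
    return Counter("".join(column) for column in zip(*alignment))


def log_likelihood(counts, pattern_prob):
    """Sum count * log(probability) over the observed site patterns."""
    return sum(n * math.log(pattern_prob[p]) for p, n in counts.items())


# Four aligned sequences of length 4 (illustrative data).
ALIGNMENT = ["GGGA", "GGGA", "GGGC", "GGGA"]

# Hypothetical pattern probabilities; a real analysis would derive these
# from a substitution model, a tree topology, and branch lengths.
PATTERN_PROB = {"GGGG": 0.2, "AACA": 0.01}

counts = site_pattern_counts(ALIGNMENT)
loglik = log_likelihood(counts, PATTERN_PROB)
```

Here the "GGGG" pattern occurs three times, so it contributes its probability raised to the third power, as described in the text.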

4290-485: Is the exchangeability of nucleotides $i$ and $j$ and $\pi_j$ is the equilibrium frequency of the $j$-th nucleotide. The matrix shown above uses the letters $a$ through $f$ for

4400-461: Is the matrix $Q$ multiplied by itself enough times to give its $n$-th power. If $Q$ is diagonalizable, the matrix exponential can be computed directly: let $Q = U^{-1} \Lambda U$ be a diagonalization of $Q$, where $\Lambda$ is a diagonal matrix whose diagonal entries $\lbrace \lambda_i \rbrace$ are the eigenvalues of $Q$, each repeated according to its multiplicity. Then $P(t) = e^{Qt} = U^{-1} e^{\Lambda t} U$, where
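For a small numerical illustration of $P(t) = e^{Qt}$, the sketch below uses a truncated power series with scaling and squaring rather than the diagonalization described above, because that needs nothing beyond the Python standard library. The function names are hypothetical, and the Jukes-Cantor rate matrix (all off-diagonal rates equal, normalized to one expected substitution per unit time) is assumed as the test case because its transition probabilities have a known closed form.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q * t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this step, term holds m**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [
            [result[i][j] + term[i][j] for j in range(n)] for i in range(n)
        ]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Normalized Jukes-Cantor rate matrix: rows sum to zero, total rate is 1.
JC_Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

p_zero = mat_exp(JC_Q, 0.0)    # no time elapsed
p_half = mat_exp(JC_Q, 0.5)    # branch length 0.5
p_long = mat_exp(JC_Q, 50.0)   # very long branch
```

With these inputs, `p_zero` is the identity matrix, `p_half` agrees with the Jukes-Cantor closed form $P_{ii}(t) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4t/3}$, and every entry of `p_long` is close to the equilibrium frequency 0.25. In practice a tested library routine would normally be preferred over this hand-rolled version.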

4510-544: Is the world's smallest bird and smallest warm-blooded vertebrate . Hummingbirds have compact bodies with relatively long, bladelike wings having anatomical structure enabling helicopter -like flight in any direction, including the ability to hover. Particularly while hovering, the wing beats produce the humming sounds, which function to alert other birds. In some species, the tail feathers produce sounds used by males during courtship flying. Hummingbirds have extremely rapid wing-beats as high as 80 per second, supported by

4620-436: Is to reduce the number of codons by forbidding the stop (or nonsense) codons. This is a biologically reasonable assumption because including the stop codons would mean that calculating the probability of finding sense codon $j$ after time $t$, given that the ancestral codon is $i$, would involve

4730-453: Is typically set to a value of 1 to increase the readability of the exchangeability parameter estimates (since it allows users to express those values relative to the chosen exchangeability parameter). The practice of expressing the exchangeability parameters in relative terms is not problematic because the $Q$ matrix is normalized. Normalization allows $t$ (time) in

4840-762: The Americas and comprise the biological family Trochilidae . With approximately 366 species and 113 genera , they occur from Alaska to Tierra del Fuego , but most species are found in Central and South America . As of 2024, 21 hummingbird species are listed as endangered or critically endangered , with numerous species declining in population. Hummingbirds have varied specialized characteristics to enable rapid, maneuverable flight: exceptional metabolic capacity , adaptations to high altitude, sensitive visual and communication abilities, and long-distance migration in some species. Among all birds, male hummingbirds have

4950-468: The Cambrian explosion under a molecular clock assumption, poor concurrence between cladistic and phylogenetic data is often observed. There has been some work on models allowing variable rate of evolution. Models that can take into account variability of the rate of the molecular clock between different evolutionary lineages in the phylogeny are called “relaxed” in opposition to “strict”. In such models

5060-557: The International Union for Conservation of Nature Red List of Threatened Species in 2024, 8 hummingbird species are classified as critically endangered , 13 are endangered , 13 are vulnerable , and 20 species are near-threatened . Two species – the Brace's emerald ( Riccordia bracei ) and Caribbean emerald ( Riccordia elegans ) – have been declared extinct . Of the 15 species of North American hummingbirds that inhabit

5170-510: The Messel pit and in the Caucasus , dating from 35 to 40 million years ago; this indicates that the split between these two lineages indeed occurred around that time. The areas where these early fossils have been found had a climate quite similar to that of the northern Caribbean or southernmost China during that time. The biggest remaining mystery at present is what happened to hummingbirds in

5280-583: The cladogram below, the English names are those introduced in 1997. The scientific names are those introduced in 2013. Florisuginae – topazes Phaethornithinae – hermits Polytminae – mangoes Heliantheini – brilliants Lesbiini – coquettes Patagoninae – giants Lampornithini – mountain gems Mellisugini – bees Trochilini – emeralds While all hummingbirds depend on flower nectar to fuel their high metabolisms and hovering flight, coordinated changes in flower and bill shape stimulated

5390-502: The likelihood function into components representing information about the parameters of interest and information about the other (nuisance) parameters. This can involve ideas about sufficient statistics and ancillary statistics . When this partition can be achieved it may be possible to complete a Bayesian analysis for the parameters of interest by determining their joint posterior distribution algebraically. The partition allows frequentist theory to develop general estimation approaches in

5500-456: The long-billed hermit , appear to be evolving a dagger -like weapon on the beak tip as a secondary sexual trait to defend mating areas . The Andes Mountains appear to be a particularly rich environment for hummingbird evolution because diversification occurred simultaneously with mountain uplift over the past 10 million years. Hummingbirds remain in dynamic diversification inhabiting ecological regions across South America, North America, and

5610-478: The null distribution for certain statistical tests in the fields of molecular evolution and molecular phylogenetics. Examples of these tests include tests of model fit and the "SOWH test" that can be used to examine tree topologies. The fact that substitution models can be used to analyze any biological alphabet has made it possible to develop models of evolution for phenotypic datasets (e.g., morphological and behavioural traits). Typically, "0" is used to indicate

5720-411: The ornithophilous flowers upon which they feed. This coevolution implies that morphological traits of hummingbirds, such as bill length, bill curvature, and body mass, are correlated with morphological traits of plants, such as corolla length, curvature, and volume. Some species, especially those with unusual bill shapes, such as the sword-billed hummingbird and the sicklebills , are coevolved with

5830-414: The transition rate, one for the rate of transversions that conserve the strong/weak properties of nucleotides ($A \leftrightarrow T$ and $C \leftrightarrow G$, designated $\beta$ by Kimura), and one for the rate of transversions that conserve

5940-407: The variance (s), σ is often not specified or known, but one desires to hypothesis test on the mean(s). Another example might be linear regression with unknown variance in the explanatory variable (the independent variable): its variance is a nuisance parameter that must be accounted for to derive an accurate interval estimate of the regression slope , calculate p-values , hypothesis test on

6050-425: The 4 frequency parameters must sum to 1, there are only 3 free frequency parameters. The total of 9 free parameters is often further reduced to 8 parameters plus $\mu$, the overall number of substitutions per unit time. When measuring time in substitutions ($\mu = 1$) only 8 free parameters remain. In general, to compute

6160-458: The Caribbean, indicating an enlarging evolutionary radiation . Within the same geographic region, hummingbird clades coevolved with nectar-bearing plant clades, affecting mechanisms of pollination . The same is true for the sword-billed hummingbird ( Ensifera ensifera ), one of the morphologically most extreme species, and one of its main food plant clades ( Passiflora section Tacsonia ). Hummingbirds are specialized nectarivores tied to

6270-411: The GTR model can be applied to biological alphabets with a larger state-space (e.g., amino acids or codons). It is possible to write a set of equilibrium state frequencies as $\pi_1$, $\pi_2$, ... $\pi_k$ and

6380-690: The SYM model and the full GTR (or REV) model (where all exchangeability parameters are free). The equilibrium base frequencies are typically treated in two different ways: 1) all $\pi_i$ values are constrained to be equal (i.e., $\pi_A = \pi_C = \pi_G = \pi_T = 0.25$); or 2) all $\pi_i$ values are treated as free parameters. Although

6490-399: The United States and Canada indicate that the ruby-throated hummingbird numbers are around 34 million, rufous hummingbirds are around 19 million, black-chinned , Anna's , and broad-tailed hummingbirds are about 8 million each, calliopes at 4 million, and Costa's and Allen's hummingbirds are around 2 million each. Several species exist only in the thousands or hundreds. According to

6600-428: The United States and Canada, several have changed their range of distribution, while others showed declines in numbers since the 1970s, including in 2023 with dozens of hummingbird species in decline. As of the 21st century, rufous, Costa's, calliope, broad-tailed, and Allen's hummingbirds are in significant decline, some losing as much as 67% of their numbers since 1970 at nearly double the rate of population loss over

6710-414: The absence of a trait and "1" is used to indicate the presence of a trait, although it is also possible to score characters using multiple states. Using this framework, we might encode a set of phenotypes as binary strings (this could be generalized to k-state strings for characters with more than two states) before analyses using an appropriate model. This can be illustrated using a "toy" example: we can use

6820-405: The amino/keto properties of nucleotides ($A \leftrightarrow C$ and $G \leftrightarrow T$, designated $\gamma$ by Kimura). In 1981, Joseph Felsenstein proposed a four-parameter model (F81) in which the substitution rate corresponds to

6930-669: The bee hummingbird to more successfully compete for flower foraging against insects. Many plants pollinated by hummingbirds produce flowers in shades of red, orange, and bright pink, although the birds take nectar from flowers of other colors. Hummingbirds can see wavelengths into the near- ultraviolet , but hummingbird-pollinated flowers do not reflect these wavelengths as many insect-pollinated flowers do. This narrow color spectrum may render hummingbird-pollinated flowers relatively inconspicuous to most insects, thereby reducing nectar robbing . Hummingbird-pollinated flowers also produce relatively weak nectar (averaging 25% sugars) containing

7040-753: The comparably small Ecuador has about 130 species. The family Trochilidae was introduced in 1825 by Irish zoologist Nicholas Aylward Vigors with Trochilus as the type genus . In traditional taxonomy , hummingbirds are placed in the order Apodiformes , which also contains the swifts , but some taxonomists have separated them into their own order, the Trochiliformes. Hummingbirds' wing bones are hollow and fragile, making fossilization difficult and leaving their evolutionary history poorly documented. Though scientists theorize that hummingbirds originated in South America, where species diversity

7150-419: The data while keeping the exchangeability matrix fixed. Beyond the common practice of estimating amino acid frequencies from the data, methods to estimate exchangeability parameters or adjust the $Q$ matrix for protein evolution in other ways have been proposed. With large-scale genome sequencing still producing very large amounts of DNA and protein sequences, there

7260-404: The data. It is also necessary because the patterns of DNA sequence evolution often differ among organisms and among genes within organisms. The latter may reflect optimization by the action of selection for specific purposes (e.g. fast expression or messenger RNA stability) or it might reflect neutral variation in the patterns of substitution. Thus, depending on the organism and the type of gene, it

7370-420: The diagonal elements $Q_{ii}$ to the negative sum of the off-diagonal elements on the same row, and normalizing. Obviously, $k = 20$ for amino acids and $k = 61$ for codons (assuming the standard genetic code). However, the generality of this notation
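The recipe for populating a GTR rate matrix (off-diagonals $Q_{ij} = r_{ij}\pi_j$, diagonals equal to the negative row sum, then normalization so that $-\sum_i \pi_i Q_{ii} = 1$) can be sketched as follows. The function name `gtr_rate_matrix` and all numeric values are hypothetical and chosen only for illustration; the state order is assumed to be A, C, G, T, with exchangeabilities $a$ to $f$ laid out as in the nucleotide GTR matrix.

```python
def gtr_rate_matrix(exchange, freqs):
    """Build a normalized GTR rate matrix for any alphabet size k.

    exchange: symmetric k x k nested list of exchangeabilities r_ij
              (diagonal entries are ignored).
    freqs:    equilibrium frequencies pi_j, assumed to sum to 1.
    """
    k = len(freqs)
    q = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                q[i][j] = exchange[i][j] * freqs[j]
        # The diagonal is still zero here, so this is the off-diagonal sum.
        q[i][i] = -sum(q[i])
    # Expected number of substitutions per unit time before normalization.
    rate = -sum(freqs[i] * q[i][i] for i in range(k))
    return [[x / rate for x in row] for row in q]


# Hypothetical exchangeabilities (f fixed to 1) and base frequencies.
a, b, c, d, e, f = 1.0, 4.0, 1.0, 1.0, 4.0, 1.0
EXCHANGE = [
    [0.0, a, b, c],
    [a, 0.0, d, e],
    [b, d, 0.0, f],
    [c, e, f, 0.0],
]
FREQS = [0.3, 0.2, 0.2, 0.3]

Q = gtr_rate_matrix(EXCHANGE, FREQS)
```

Because the function takes a general $k \times k$ exchangeability table, the same sketch applies unchanged to amino acid or codon alphabets.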

7480-753: The diagonal matrix $e^{\Lambda t}$ is given by $e^{\Lambda t} = \operatorname{diag}(e^{\lambda_i t})$. Generalised time reversible (GTR) is the most general neutral, independent, finite-sites, time-reversible model possible. It was first described in a general form by Simon Tavaré in 1986. The GTR model is often called the general time reversible model in publications; it has also been called the REV model. The GTR parameters for nucleotides consist of an equilibrium base frequency vector, $\vec{\pi} = (\pi_1, \pi_2, \pi_3, \pi_4)$, giving

7590-421: The earliest species of hummingbird occurred in the early Oligocene ( Rupelian about 34–28 million years ago) of Europe, belonging to the genus Eurotrochilus, having similar morphology to modern hummingbirds. A phylogenetic tree unequivocally indicates that modern hummingbirds originated in South America, with the last common ancestor of all living hummingbirds living around 22 million years ago. A map of

7700-412: The encoded amino acid (synonymous substitutions). Most of the work on substitution models has focused on DNA/ RNA and protein sequence evolution. Models of DNA sequence evolution, where the alphabet corresponds to the four nucleotides (A, C, G, and T), are probably the easiest models to understand. DNA models can also be used to examine RNA virus evolution; this reflects the fact that RNA also has

7810-439: The equilibrium amino acid frequencies π → = ( π A , π R , π N , . . . π V ) {\displaystyle {\vec {\pi }}=(\pi _{A},\pi _{R},\pi _{N},...\pi _{V})} (using the one-letter IUPAC codes for amino acids to indicate their equilibrium frequencies) are often estimated from

7920-829: The equilibrium base frequencies can be constrained in other ways most constraints that link some but not all π i {\displaystyle \pi _{i}} values are unrealistic from a biological standpoint. The possible exception is enforcing strand symmetry (i.e., constraining π A = π T {\displaystyle \pi _{A}=\pi _{T}} and π C = π G {\displaystyle \pi _{C}=\pi _{G}} but allowing π A + π T ≠ π C + π G {\displaystyle \pi _{A}+\pi _{T}\neq \pi _{C}+\pi _{G}} ). The alternative notation also makes it straightforward to see how

8030-609: The equilibrium frequency of the target nucleotide. Hasegawa, Kishino, and Yano unified the two last models to a five-parameter model (HKY). After these pioneering efforts, many additional sub-models of the GTR model were introduced into the literature (and common use) in the 1990s. Other models that move beyond the GTR model in specific ways were also developed and refined by several researchers. Almost all DNA substitution models are mechanistic models (as described above). The small number of parameters that one needs to estimate for these models makes it feasible to estimate those parameters from

8140-464: The estimation of evolutionary parameters, including the K a /K s ratio . In this regard, the use of mixture models in phylogenentic frameworks is convenient to better mimic the molecular evolution observed in real data. A main difference in evolutionary models is how many parameters are estimated every time for the data set under consideration and how many of them are estimated once on a large data set. Mechanistic models describe all substitutions as

8250-457: The evolutionary distance between those sequences is t {\displaystyle t} whereas p C A ( t ) {\displaystyle p_{\mathrm {CA} }(t)} is the probability of observing C in sequence 1 and A in sequence 2 at the same evolutionary distance). An arbitrarily chosen exchangeability parameters (e.g., f = r G T {\displaystyle f=r_{GT}} )

8360-420: The exchangeability parameters in the interest of readability, but those parameters could also be to written in a systematic manner using the r i j {\displaystyle r_{ij}} notation (e.g., a = r A C {\displaystyle a=r_{AC}} , b = r A G {\displaystyle b=r_{AG}} , and so forth). Note that

8470-400: The expected number of substitutions per year μ is constant regardless of which species' evolution is being examined. An important implication of a strict molecular clock is that the number of expected substitutions between an ancestral species and any of its present-day descendants must be independent of which descendant species is examined. Note that the assumption of a strict molecular clock

8580-423: The field of cladistics and analyses of morphological characters using a substitution model. However, there has been a vociferous debate in the systematics community regarding the question of whether or not cladistic analyses should be viewed as "model-free". The field of cladistics (defined in the strictest sense) favor the use of the maximum parsimony criterion for phylogenetic inference. Many cladists reject

8690-497: The formation of new species of hummingbirds and plants. Due to this exceptional evolutionary pattern, as many as 140 hummingbird species can coexist in a specific region, such as the Andes range . The hummingbird evolutionary tree shows that one key evolutionary factor appears to have been an altered taste receptor that enabled hummingbirds to seek nectar. Upon maturity, males of a particular species, Phaethornis longirostris,

8800-444: The frequency at which each base occurs at each site, and the rate matrix Because the model must be time reversible and must approach the equilibrium nucleotide (base) frequencies at long times, each rate below the diagonal equals the reciprocal rate above the diagonal multiplied by the equilibrium ratio of the two bases. As such, the nucleotide GTR requires 6 substitution rate parameters and 4 equilibrium base frequency parameters. Since

8910-648: The genome, it is more common to work with a codon substitution model (a codon is three bases and codes for one amino acid in a protein). There are 4 3 = 64 {\displaystyle 4^{3}=64} codons, resulting in 2078 free parameters. However, the rates for transitions between codons which differ by more than one base are often assumed to be zero, reducing the number of free parameters to only 20 × 19 × 3 2 + 63 − 1 = 632 {\displaystyle {{20\times 19\times 3} \over 2}+63-1=632} parameters. Another common practice

9020-462: The hermits are sister to the topazes , making the former definition of the Trochilinae not monophyletic . The hummingbirds form nine major clades : the topazes and jacobins , the hermits, the mangoes , the coquettes , the brilliants , the giant hummingbird ( Patagona gigas ), the mountaingems , the bees , and the emeralds . The topazes and jacobins combined have the oldest split with

9130-455: The humming sound created by their beating wings , which flap at high frequencies audible to other birds and humans. They hover at rapid wing-flapping rates, which vary from around 12 beats per second in the largest species to 80 per second in small hummingbirds. Hummingbirds have the highest mass-specific metabolic rate of any homeothermic animal. To conserve energy when food is scarce and at night when not foraging, they can enter torpor ,

9240-464: The hummingbird family tree – reconstructed from analysis of 284 species – shows rapid diversification from 22 million years ago. Hummingbirds fall into nine main clades – the topazes , hermits , mangoes , brilliants , coquettes , the giant hummingbird, mountaingems , bees , and emeralds – defining their relationship to nectar -bearing flowering plants which attract hummingbirds into new geographic areas. Molecular phylogenetic studies of

9350-554: The hummingbirds have shown that the family is composed of nine major clades. When Edward Dickinson and James Van Remsen Jr. updated the Howard and Moore Complete Checklist of the Birds of the World for the 4th edition in 2013, they divided the hummingbirds into six subfamilies. Molecular phylogenetic studies determined the relationships between the major groups of hummingbirds. In

9460-502: The joint posterior distribution of all the parameters: see Markov chain Monte Carlo . Given these, the joint distribution of only the parameters of interest can be readily found by marginalizing over the nuisance parameters. However, this approach may not always be computationally efficient if some or all of the nuisance parameters can be eliminated on a theoretical basis. Hummingbird Hummingbirds are birds native to

9570-413: The mathematics, the model does not care which sequence is the ancestor and which is the descendant so long as all other parameters (such as the number of substitutions per site that is expected between the two sequences) are held constant. When an analysis of real biological data is performed, there is generally no access to the sequences of ancestral species, only to the present-day species. However, when

9680-489: The matrix exponentiation P ( t ) = e Q t {\displaystyle P(t)=e^{Qt}} to be expressed in units of expected substitutions per site (standard practice in molecular phylogenetics). This is the equivalent to the statement that one is setting the mutation rate μ {\displaystyle \mu } to 1) and reducing the number of free parameters to eight. Specifically, there are five free exchangeability parameters (

9790-402: The notation r i j {\displaystyle r_{ij}} ) or to equilibrium nucleotide frequencies π → = ( π A , π C , π G , π T ) {\displaystyle {\vec {\pi }}=(\pi _{A},\pi _{C},\pi _{G},\pi _{T})} . Note that

9900-726: The nucleotides in a different order (e.g., some authors choose to group two purines together and the two pyrimidines together; see also models of DNA evolution ). These differences in notation make it important to be clear regarding the order of the states when writing the Q {\displaystyle Q} matrix. The value of this notation is that instantaneous rate of change from nucleotide i {\displaystyle i} to nucleotide j {\displaystyle j} can always be written as r i j π j {\displaystyle r_{ij}\pi _{j}} , where r i j {\displaystyle r_{ij}}

10010-1570: The nucleotides in the Q {\displaystyle Q} matrix have been written in alphabetical order. In other words, the transition probability matrix for the Q {\displaystyle Q} matrix above would be: P ( t ) = e Q t = ( p A A ( t ) p A C ( t ) p A G ( t ) p A T ( t ) p C A ( t ) p C C ( t ) p C G ( t ) p C T ( t ) p G A ( t ) p G C ( t ) p G G ( t ) p G T ( t ) p T A ( t ) p T C ( t ) p T G ( t ) p T T ( t ) ) {\displaystyle P(t)=e^{Qt}={\begin{pmatrix}p_{\mathrm {AA} }(t)&p_{\mathrm {AC} }(t)&p_{\mathrm {AG} }(t)&p_{\mathrm {AT} }(t)\\p_{\mathrm {CA} }(t)&p_{\mathrm {CC} }(t)&p_{\mathrm {CG} }(t)&p_{\mathrm {CT} }(t)\\p_{\mathrm {GA} }(t)&p_{\mathrm {GC} }(t)&p_{\mathrm {GG} }(t)&p_{\mathrm {GT} }(t)\\p_{\mathrm {TA} }(t)&p_{\mathrm {TC} }(t)&p_{\mathrm {TG} }(t)&p_{\mathrm {TT} }(t)\end{pmatrix}}} Some publications write

10120-576: The number of parameters, you count the number of entries above the diagonal in the matrix, i.e. for n trait values per site n 2 − n 2 {\displaystyle {{n^{2}-n} \over 2}} , and then add n-1 for the equilibrium frequencies, and subtract 1 because μ {\displaystyle \mu } is fixed. You get For example, for an amino acid sequence (there are 20 "standard" amino acids that make up proteins ), you would find there are 208 parameters. However, when studying coding regions of

10230-435: The ordering of the nucleotide subscripts for exchangeability parameters is irrelevant (e.g., r A C = r C A {\displaystyle r_{AC}=r_{CA}} ) but the transition probability matrix values are not (i.e., p A C ( t ) {\displaystyle p_{\mathrm {AC} }(t)} is the probability of observing A in sequence 1 and C in sequence 2 when

10340-442: The parameters of interest which are approximately valid for moderate to large sample sizes and which take account of the presence of nuisance parameters. See Basu (1977) for some general discussion and Spall and Garner (1990) for some discussion relative to the identification of parameters in linear dynamic (i.e., state space representation ) models. In Bayesian analysis , a generally applicable approach creates random samples from

10450-399: The position that maximum parsimony is based on a substitution model and (in many cases) they justify the use of parsimony using the philosophy of Karl Popper . However, the existence of "parsimony-equivalent" models (i.e., substitution models that yield the maximum parsimony tree when used for analyses) makes it possible to view parsimony as a substitution model. Typically, a branch length of

10560-404: The possibility of passing through a state with a premature stop codon. An alternative (and commonly used) way to write the instantaneous rate matrix ( Q {\displaystyle Q} matrix) for the nucleotide GTR model is: Q = ( − ( a π C + b π G + c π T )

10670-401: The presence of nuisance parameters. If the partition cannot be achieved it may still be possible to make use of an approximate partition. In some special cases, it is possible to formulate methods that circumvent the presences of nuisance parameters. The t-test provides a practically useful test because the test statistic does not depend on the unknown variance but only the sample variance. It

10780-788: The previous 50 years. The ruby-throated hummingbird population – the most populous North American hummingbird – decreased by 17% over the early 21st century. Habitat loss, glass collisions, cat predation, pesticides , and possibly climate change affecting food availability, migration signals, and breeding are factors that may contribute to declining hummingbird numbers. By contrast, Anna's hummingbirds had large population growth at an accelerating rate since 2010, and expanded their range northward to reside year-round in cold winter climates. Some species of sunbirds — an Old World group restricted in distribution to Eurasia , Africa, and Australia — resemble hummingbirds in appearance and behavior, but are not related to hummingbirds, as their resemblance

10890-493: The rate can be assumed to be correlated or not between ancestors and descendants and rate variation among lineages can be drawn from many distributions but usually exponential and lognormal distributions are applied. There is a special case, called “local molecular clock” when a phylogeny is divided into at least two partitions (sets of lineages) and a strict molecular clock is applied in each, but with different rates. Many useful substitution models are time-reversible ; in terms of

11000-796: The rest of the hummingbirds. The hummingbird family has the third-greatest number of species of any bird family (after the tyrant flycatchers and the tanagers ). Fossil hummingbirds are known from the Pleistocene of Brazil and the Bahamas , but neither has yet been scientifically described, and fossils and subfossils of a few extant species are known. Until recently, older fossils had not been securely identifiable as those of hummingbirds. In 2004, Gerald Mayr identified two 30-million-year-old hummingbird fossils. The fossils of this primitive hummingbird species, named Eurotrochilus inexpectatus ("unexpected European hummingbird"), had been sitting in

11110-485: The roughly 25 million years between the primitive Eurotrochilus and the modern fossils. The astounding morphological adaptations , the decrease in size, and the dispersal to the Americas and extinction in Eurasia all occurred during this timespan. DNA–DNA hybridization results suggest that the main radiation of South American hummingbirds took place at least partly in the Miocene , some 12 to 13 million years ago, during

11220-442: The shortness of their legs provides about 20% less propulsion than assessed in other birds. During flight, hummingbird feet are tucked up under the body, enabling optimal aerodynamics and maneuverability. Of those species that have been measured during flight, the top flight speeds of hummingbirds exceed 15 m/s (54 km/h; 34 mph). During courtship , some male species dive from 30 metres (100 ft) of height above

11330-522: The slope's value; see regression dilution . Nuisance parameters are often scale parameters , but not always; for example in errors-in-variables models , the unknown true location of each observation is a nuisance parameter. A parameter may also cease to be a "nuisance" if it becomes the object of study, is estimated from data, or known. The general treatment of nuisance parameters can be broadly similar between frequentist and Bayesian approaches to theoretical statistics. It relies on an attempt to partition

11440-556: The smallest of any bird – are incubated over 2–3 weeks in breeding season. Fed by regurgitation only from the mother, the chicks fledge about 3 weeks after hatching. The average lifespan of a ruby-throated hummingbird is estimated to be 3–5 years, with most deaths occurring in yearlings, although one banded ruby-throated hummingbird lived for 9 years and 2 months. Bee hummingbirds live 7–10 years. Although most hummingbird species live in remote habitats where their population numbers are difficult to assess, population studies in

11550-519: The sub-models of the GTR model, which simply correspond to cases where exchangeability and/or equilibrium base frequency parameters are constrained to take on equal values. A number of specific sub-models have been named, largely based on their original publications: There are 203 possible ways that the exchangeability parameters can be restricted to form sub-models of GTR, ranging from the JC69 and F81 models (where all exchangeability parameters are equal) to

11660-514: The timeframe for evolution. Other model parameters have been used to gain insights into various aspects of the process of evolution. The K a /K s ratio (also called ω in codon substitution models) is a parameter of interest in many studies. The K a /K s ratio can be used to examine the action of natural selection on protein-coding regions, it provides information about the relative rates of nucleotide substitutions that change amino acids (non-synonymous substitutions) to those that do not change

11770-539: The uplifting of the northern Andes . In 2013, a 50-million-year-old bird fossil unearthed in Wyoming was found to be a predecessor to hummingbirds and swifts before the groups diverged. Hummingbirds split from other members of Apodiformes, the insectivorous swifts (family Apodidae) and treeswifts (family Hemiprocnidae), about 42 million years ago, probably in Eurasia . Despite their current New World distribution,

11880-574: The widest diversity of plumage color, particularly in blues, greens, and purples. Hummingbirds are the smallest mature birds, measuring 7.5–13 cm (3–5 in) in length. The smallest is the 5 cm (2.0 in) bee hummingbird , which weighs less than 2.0 g (0.07 oz), and the largest is the 23 cm (9 in) giant hummingbird , weighing 18–24 grams (0.63–0.85 oz). Noted for long beaks , hummingbirds are specialized for feeding on flower nectar , but all species also consume small insects. They are known as hummingbirds because of

11990-400: Was a base i at that position originally goes to the equilibrium probability that there is base j at that position, regardless of the original base. Furthermore, it follows that $\pi P(t) = \pi$ for all t. The transition matrix can be computed from the rate matrix via matrix exponentiation: where Q

12100-462: Was proposed by Jukes and Cantor in 1969. The Jukes-Cantor (JC or JC69) model assumes equal transition rates as well as equal equilibrium frequencies for all bases and it is the simplest sub-model of the GTR model. In 1980, Motoo Kimura introduced a model with two parameters (K2P or K80): one for the transition and one for the transversion rate. A year later, Kimura introduced a second model (K3ST, K3P, or K81) with three substitution types: one for
