Wikipedia

Discrete optimized protein energy

Article snapshot taken from Wikipedia, available under the Creative Commons Attribution-ShareAlike license.

In protein structure prediction, statistical potentials or knowledge-based potentials are scoring functions derived from an analysis of known protein structures in the Protein Data Bank (PDB).


DOPE, or Discrete Optimized Protein Energy, is a statistical potential used to assess homology models in protein structure prediction. DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures.

$$P\left(A\mid X\right)\approx \prod _{i<j}P\left(a_{i},a_{j}\mid r_{ij}\right)\propto \prod _{i<j}{\frac {P\left(r_{ij}\mid a_{i},a_{j}\right)}{P(r_{ij})}}$$
where the product runs over all amino acid pairs $a_{i},a_{j}$ (with $i<j$), and $r_{ij}$

$$\Delta F_{\textrm {T}}=\sum _{i<j}\Delta F(r_{ij}\mid a_{i},a_{j})=-kT\sum _{i<j}\ln {\frac {P\left(r_{ij}\mid a_{i},a_{j}\right)}{Q_{R}\left(r_{ij}\mid a_{i},a_{j}\right)}}$$
where
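The pairwise sum above can be sketched numerically. This is an illustrative toy, with made-up $(P, Q_R)$ probabilities rather than values from any published potential:

```python
import math

kT = 0.593  # k_B * T in kcal/mol at roughly 298 K

def pairwise_pmf(p_obs, q_ref):
    """One term of the sum: -kT * ln(P(r_ij|a_i,a_j) / Q_R(r_ij|a_i,a_j))."""
    return -kT * math.log(p_obs / q_ref)

def total_pmf(pairs):
    """Total pseudo-energy: sum of the pairwise terms over all i < j pairs."""
    return sum(pairwise_pmf(p, q) for p, q in pairs)

# Made-up (P, Q_R) values for three residue pairs; P > Q_R means the
# distance is seen more often in native structures than in the reference.
pairs = [(0.04, 0.02), (0.03, 0.01), (0.01, 0.02)]
dF_total = total_pmf(pairs)
```

A structure whose pairwise distances are, on balance, more probable in native proteins than in the reference state receives a negative (favourable) total pseudo-energy.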

A grand canonical ensemble, in which the system can exchange both heat and particles with the environment, at fixed temperature, volume, and chemical potential. Other types of partition functions can be defined for different circumstances; see partition function (mathematics) for generalizations. The partition function has many physical meanings, as discussed in Meaning and significance. Initially, let us assume that

A partition function describes the statistical properties of a system in thermodynamic equilibrium. Partition functions are functions of the thermodynamic state variables, such as the temperature and volume. Most of the aggregate thermodynamic variables of the system, such as the total energy, free energy, entropy, and pressure, can be expressed in terms of the partition function or its derivatives. The partition function

A database of known protein structures (obtained from the PDB). Many textbooks present the statistical PMFs as proposed by Sippl as a simple consequence of the Boltzmann distribution, as applied to pairwise distances between amino acids. This is incorrect, but a useful start to introduce the construction of the potential in practice. The Boltzmann distribution applied to a specific pair of amino acids,

A method for calculating the expected values of many microscopic quantities. We add the quantity artificially to the microstate energies (or, in the language of quantum mechanics, to the Hamiltonian), calculate the new partition function and expected value, and then set $\lambda$ to zero in the final expression. This is analogous to the source field method used in the path integral formulation of quantum field theory. In this section, we will state

A prior distribution based on new information on the probabilities of the elements of a partition on the support of the prior. From this point of view, (i) it is not necessary to assume that the database of protein structures used to build the potentials follows a Boltzmann distribution, (ii) statistical potentials generalize readily beyond pairwise distances, and (iii) the reference ratio is determined by

A rigorous definition of the reference state, which is implied by $Q(X)$. Conventional applications of pairwise distance statistical PMFs usually lack two necessary features to make them fully rigorous: the use of a proper probability distribution over pairwise distances in proteins, and the recognition that the reference state is rigorously defined by $Q(X)$. Statistical potentials are used as energy functions in

A second variable $Y$, with $Y=f(X)$. Typically, $X$ and $Y$ are fine and coarse grained variables, respectively. For example, $Q(X)$ could concern the local structure of

A statistical potential is formulated as an interaction matrix that assigns a weight or energy value to each possible pair of standard amino acids. The energy of a particular structural model is then the combined energy of all pairwise contacts (defined as two amino acids within a certain distance of each other) in the structure. The energies are determined using statistics on amino acid contacts in
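The contact-energy scheme just described can be sketched as follows. The three-letter mini-alphabet, the weights, and the cutoff are hypothetical placeholders, not a published parameter set:

```python
from itertools import combinations
import math

# Hypothetical pairwise contact energies for a 3-letter mini-alphabet
# (illustrative values only, not taken from any real statistical potential).
CONTACT_ENERGY = {
    frozenset(["A"]): -0.5,        # A-A
    frozenset(["A", "L"]): -1.2,   # A-L
    frozenset(["L"]): -2.0,        # L-L
    frozenset(["A", "S"]): 0.3,    # A-S
    frozenset(["L", "S"]): 0.1,    # L-S
    frozenset(["S"]): -0.1,        # S-S
}

def contact_energy(sequence, coords, cutoff=8.0):
    """Sum pairwise energies over residue pairs closer than `cutoff` (angstroms)."""
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += CONTACT_ENERGY[frozenset([sequence[i], sequence[j]])]
    return total

# Toy "structure": three residues, of which only the first two are in contact.
seq = ["A", "L", "S"]
xyz = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
E = contact_energy(seq, xyz)
```

Real potentials use the full 20-letter alphabet and distance-binned rather than binary contacts, but the bookkeeping is the same: look up each pair, sum the weights.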



A structure $X$, given the amino acid sequence $A$, can be written as:
$$P(X\mid A)\propto P\left(A\mid X\right)P\left(X\right)$$
$P(X\mid A)$ is proportional to the product of the likelihood $P\left(A\mid X\right)$ times

A system is subdivided into N sub-systems with negligible interaction energy, that is, we can assume the particles are essentially non-interacting. If the partition functions of the sub-systems are $\zeta _{1},\zeta _{2},\ldots ,\zeta _{N}$, then the partition function of the entire system is the product of the individual partition functions:
$$Z=\prod _{j=1}^{N}\zeta _{j}.$$
If
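This factorization can be checked numerically on a toy pair of non-interacting sub-systems (arbitrary energy levels, units where $k_{\text{B}}T=1$): the joint partition function, summed over all joint microstates with $E=E_1+E_2$, equals the product of the individual partition functions.

```python
import math
from itertools import product

beta = 1.0

def Z(levels):
    """Partition function of one sub-system from its energy levels."""
    return sum(math.exp(-beta * E) for E in levels)

# Two non-interacting sub-systems with arbitrary toy spectra
sub1 = [0.0, 1.0]
sub2 = [0.0, 0.5, 2.0]

# Joint partition function: sum over all joint microstates, E = E1 + E2
Z_joint = sum(math.exp(-beta * (e1 + e2)) for e1, e2 in product(sub1, sub2))
Z_factored = Z(sub1) * Z(sub2)
```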

A thermodynamically large system is in thermal contact with the environment, with a temperature T, and both the volume of the system and the number of constituent particles are fixed. A collection of this kind of system comprises an ensemble called a canonical ensemble. The appropriate mathematical expression for the canonical partition function depends on the degrees of freedom of the system, whether

A trace over the state space (which is independent of the choice of basis):
$$Z=\operatorname {tr} (e^{-\beta {\hat {H}}}),$$
where $\hat {H}$ is the quantum Hamiltonian operator. The exponential of an operator can be defined using the exponential power series. The classical form of Z

A valine and a serine at a given distance $r$ from each other, giving rise to the free energy difference $\Delta F$. The total free energy difference of a protein, $\Delta F_{\textrm {T}}$, is then claimed to be the sum of all the pairwise free energies:

is
$$C_{v}={\frac {\partial \langle E\rangle }{\partial T}}={\frac {1}{k_{\text{B}}T^{2}}}\langle (\Delta E)^{2}\rangle .$$
In general, consider
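The equivalence of the temperature derivative and the fluctuation formula can be verified numerically on a toy discrete spectrum (units where $k_{\text{B}}=1$; the spectrum is arbitrary):

```python
import math

kB = 1.0                  # work in units where Boltzmann's constant is 1
levels = [0.0, 1.0, 3.0]  # arbitrary toy spectrum

def Z(T):
    return sum(math.exp(-E / (kB * T)) for E in levels)

def avg(f, T):
    """Canonical ensemble average of f(E) at temperature T."""
    return sum(f(E) * math.exp(-E / (kB * T)) for E in levels) / Z(T)

def Cv_fluctuation(T):
    """C_v from the energy variance: <(dE)^2> / (k_B T^2)."""
    varE = avg(lambda E: E * E, T) - avg(lambda E: E, T) ** 2
    return varE / (kB * T ** 2)

def Cv_derivative(T, h=1e-5):
    """C_v as the finite-difference temperature derivative of <E>."""
    return (avg(lambda E: E, T + h) - avg(lambda E: E, T - h)) / (2 * h)
```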

is a normalised Gaussian wavepacket centered at position x and momentum p. Thus
$$Z=\int \operatorname {tr} \left(e^{-\beta {\hat {H}}}|x,p\rangle \langle x,p|\right){\frac {dx\,dp}{h}}=\int \langle x,p|e^{-\beta {\hat {H}}}|x,p\rangle {\frac {dx\,dp}{h}}.$$
A coherent state

Statistical potential

The original method to obtain such potentials is the quasi-chemical approximation, due to Miyazawa and Jernigan. It was later followed by the potential of mean force (statistical PMF), developed by Sippl. Although

is a number defined as the canonical ensemble partition function:
$$Z\equiv \sum _{i}\exp \left(-{\frac {\lambda _{2}}{k_{\text{B}}}}E_{i}\right).$$
Isolating for $\lambda _{1}$ yields $\lambda _{1}=k_{\text{B}}\ln(Z)-k_{\text{B}}$. Rewriting $\rho _{i}$ in terms of $Z$ gives
$$\rho _{i}={\frac {1}{Z}}\exp \left(-{\frac {\lambda _{2}}{k_{\text{B}}}}E_{i}\right).$$
Rewriting $S$ in terms of $Z$ gives
$$\begin{aligned}S&=-k_{\text{B}}\sum _{i}\rho _{i}\ln \rho _{i}\\&=-k_{\text{B}}\sum _{i}\rho _{i}\left(-{\frac {\lambda _{2}}{k_{\text{B}}}}E_{i}-\ln(Z)\right)\\&=\lambda _{2}\sum _{i}\rho _{i}E_{i}+k_{\text{B}}\ln(Z)\sum _{i}\rho _{i}\\&=\lambda _{2}U+k_{\text{B}}\ln(Z).\end{aligned}$$
To obtain $\lambda _{2}$, we differentiate $S$ with respect to

is an approximate eigenstate of both operators ${\hat {x}}$ and ${\hat {p}}$, hence also of the Hamiltonian ${\hat {H}}$, with errors of the size of the uncertainties. If $\Delta x$ and $\Delta p$ can be regarded as zero, the action of ${\hat {H}}$ reduces to multiplication by the classical Hamiltonian, and Z reduces to



is defined as the thermodynamic beta. Finally, the probability distribution $\rho _{i}$ and entropy $S$ are respectively
$$\rho _{i}={\frac {1}{Z}}e^{-\beta E_{i}},\qquad S={\frac {U}{T}}+k_{\text{B}}\ln Z.$$
In classical mechanics,
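These two expressions can be cross-checked on a small example: for the Boltzmann distribution, the Gibbs entropy $-k_{\text{B}}\sum_i \rho_i\ln\rho_i$ equals $U/T+k_{\text{B}}\ln Z$ exactly. A sketch with an arbitrary three-level toy spectrum:

```python
import math

kB, T = 1.0, 1.5
beta = 1.0 / (kB * T)
levels = [0.0, 0.7, 2.0]  # arbitrary toy spectrum

# Boltzmann distribution and internal energy
Z = sum(math.exp(-beta * E) for E in levels)
rho = [math.exp(-beta * E) / Z for E in levels]
U = sum(p * E for p, E in zip(rho, levels))

# Entropy computed two ways: Gibbs formula vs. U/T + k_B ln Z
S_gibbs = -kB * sum(p * math.log(p) for p in rho)
S_closed = U / T + kB * math.log(Z)
```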

is dimensionless. Each partition function is constructed to represent a particular statistical ensemble (which, in turn, corresponds to a particular free energy). The most common statistical ensembles have named partition functions. The canonical partition function applies to a canonical ensemble, in which the system is allowed to exchange heat with the environment at fixed temperature, volume, and number of particles. The grand canonical partition function applies to

is estimated from the database of known protein structures, while $Q_{R}(r)$ typically results from calculations or simulations. For example, $P(r)$ could be the conditional probability of finding the $C_{\beta }$ atoms of

is given by:
$$P(r)={\frac {1}{Z}}e^{-F(r)/kT}$$
where $r$ is the distance, $k$ is the Boltzmann constant, $T$ is the temperature and $Z$ is the partition function, with
$$Z=\int e^{-F(r)/kT}\,dr.$$
The quantity $F(r)$ is the free energy assigned to

DOPE is implemented in the popular homology modeling program MODELLER and is used to assess the energy of the protein models generated through many iterations by MODELLER, which produces homology models by the satisfaction of spatial restraints. The models returning the minimum molpdf scores can be chosen as the most probable structures and can be further evaluated with the DOPE score. Like the current version of

is known as the Gibbs paradox. It may not be obvious why the partition function, as we have defined it above, is an important quantity. First, consider what goes into it. The partition function is a function of the temperature T and the microstate energies $E_{1}, E_{2}, E_{3},$ etc. The microstate energies are determined by other thermodynamic variables, such as the number of particles and

is obtained from the set of known protein structures, as explained in the previous section. However, as Ben-Naim wrote in a publication on the subject: [...] the quantities, referred to as "statistical potentials," "structure based potentials," or "pair potentials of mean force", as derived from the protein data bank (PDB), are neither "potentials" nor "potentials of mean force," in the ordinary sense as used in

is otherwise known as the Boltzmann factor. There are multiple approaches to deriving the partition function. The following derivation follows the more powerful and general information-theoretic Jaynesian maximum entropy approach. According to the second law of thermodynamics, a system assumes a configuration of maximum entropy at thermodynamic equilibrium. We seek a probability distribution of states $\rho _{i}$ that maximizes

is quantum mechanical and discrete, the canonical partition function is defined as the trace of the Boltzmann factor:
$$Z=\operatorname {tr} (e^{-\beta {\hat {H}}}),$$
where: The dimension of $e^{-\beta {\hat {H}}}$

is recovered when the trace is expressed in terms of coherent states and when quantum-mechanical uncertainties in the position and momentum of a particle are regarded as negligible. Formally, using bra–ket notation, one inserts under the trace for each degree of freedom the identity:
$${\boldsymbol {1}}=\int |x,p\rangle \langle x,p|{\frac {dx\,dp}{h}},$$
where $|x,p\rangle$


is related to $g(r)$ by:
$$W(r)=-kT\ln g(r).$$
According to the reversible work theorem, the two-particle potential of mean force $W(r)$ is the reversible work required to bring two particles in the liquid from infinite separation to a distance $r$ from each other. Sippl justified

is some quantity with units of action (usually taken to be the Planck constant). For a gas of $N$ identical classical noninteracting particles in three dimensions, the partition function is
$$Z={\frac {1}{N!h^{3N}}}\int \exp \left(-\beta \sum _{i=1}^{N}H({\textbf {q}}_{i},{\textbf {p}}_{i})\right)\;\mathrm {d} ^{3}q_{1}\cdots \mathrm {d} ^{3}q_{N}\,\mathrm {d} ^{3}p_{1}\cdots \mathrm {d} ^{3}p_{N}={\frac {Z_{\text{single}}^{N}}{N!}}$$
where $H({\textbf {q}}_{i},{\textbf {p}}_{i})$ is the single-particle Hamiltonian. The reason for

is the Helmholtz free energy defined as $A=U-TS$, where $U=\langle E\rangle$ is the total energy and $S$ is the entropy, so that
$$A=\langle E\rangle -TS=-k_{\text{B}}T\ln Z.$$
Furthermore, the heat capacity can be expressed as
$$C_{\text{v}}=T{\frac {\partial S}{\partial T}}=-T{\frac {\partial ^{2}A}{\partial T^{2}}}.$$
Suppose

is the degeneracy factor, or number of quantum states s that have the same energy level defined by $E_{j}=E_{s}$. The above treatment applies to quantum statistical mechanics, where a physical system inside a finite-sized box will typically have a discrete set of energy eigenstates, which we can use as the states s above. In quantum mechanics, the partition function can be more formally written as

is the distance between amino acids $i$ and $j$. Obviously, the negative of the logarithm of the expression has the same functional form as the classic pairwise distance statistical PMFs, with the denominator playing the role of the reference state. This explanation has two shortcomings: it relies on the unfounded assumption

is the number of energy eigenstates of the system. For a canonical ensemble that is quantum mechanical and continuous, the canonical partition function is defined as
$$Z={\frac {1}{h}}\int \langle q,p|e^{-\beta {\hat {H}}}|q,p\rangle \,\mathrm {d} q\,\mathrm {d} p,$$
where: In systems with multiple quantum states s sharing

the amino acid sequence. Intuitively, it is clear that a low value for $\Delta F_{\textrm {T}}$ indicates that the set of distances in a structure is more likely in proteins than in the reference state. However, the physical meaning of these statistical PMFs has been widely disputed since their introduction. The main issues are: In response to

the extensive variable X and intensive variable Y where X and Y form a pair of conjugate variables. In ensembles where Y is fixed (and X is allowed to fluctuate), the average value of X will be:
$$\langle X\rangle =\pm {\frac {\partial \ln Z}{\partial (\beta Y)}}.$$
The sign will depend on

the factorial factor N! is discussed below. The extra constant factor in the denominator was introduced because, unlike the discrete form, the continuous form shown above is not dimensionless. As stated in the previous section, to make it into a dimensionless quantity, we must divide it by h (where h is usually taken to be the Planck constant). For a canonical ensemble that

the fundamental postulate of statistical mechanics (which states that all attainable microstates of a system are equally probable), the probability $p_{i}$ will be inversely proportional to the number of microstates of the total closed system (S, B) in which S is in microstate i with energy $E_{i}$. Equivalently, $p_{i}$ will be proportional to the number of microstates of the heat bath B with energy $E-E_{i}$:
$$p_{i}={\frac {\Omega _{B}(E-E_{i})}{\Omega _{(S,B)}(E)}}.$$
Assuming that


the position and momentum variables of a particle can vary continuously, so the set of microstates is actually uncountable. In classical statistical mechanics, it is rather inaccurate to express the partition function as a sum of discrete terms. In this case we must describe the partition function using an integral rather than a sum. For a canonical ensemble that is classical and continuous,

the prior $P\left(X\right)$. By assuming that the likelihood can be approximated as a product of pairwise probabilities, and applying Bayes' theorem, the likelihood can be written as:

the Lagrangian (or Lagrange function) ${\mathcal {L}}$ as
$${\mathcal {L}}=\left(-k_{\text{B}}\sum _{i}\rho _{i}\ln \rho _{i}\right)+\lambda _{1}\left(1-\sum _{i}\rho _{i}\right)+\lambda _{2}\left(U-\sum _{i}\rho _{i}E_{i}\right).$$
Varying and extremizing ${\mathcal {L}}$ with respect to $\rho _{i}$ leads to
$$\begin{aligned}0&\equiv \delta {\mathcal {L}}\\&=\delta \left(-\sum _{i}k_{\text{B}}\rho _{i}\ln \rho _{i}\right)+\delta \left(\lambda _{1}-\sum _{i}\lambda _{1}\rho _{i}\right)+\delta \left(\lambda _{2}U-\sum _{i}\lambda _{2}\rho _{i}E_{i}\right)\\&=\sum _{i}{\bigg [}\delta {\Big (}-k_{\text{B}}\rho _{i}\ln \rho _{i}{\Big )}-\delta {\Big (}\lambda _{1}\rho _{i}{\Big )}-\delta {\Big (}\lambda _{2}E_{i}\rho _{i}{\Big )}{\bigg ]}\\&=\sum _{i}{\bigg [}-k_{\text{B}}\ln \rho _{i}-k_{\text{B}}-\lambda _{1}-\lambda _{2}E_{i}{\bigg ]}\,\delta (\rho _{i}).\end{aligned}$$
Since this equation should hold for any variation $\delta (\rho _{i})$, it implies that
$$0\equiv -k_{\text{B}}\ln \rho _{i}-k_{\text{B}}-\lambda _{1}-\lambda _{2}E_{i}.$$
Isolating for $\rho _{i}$ yields
$$\rho _{i}=\exp \left({\frac {-k_{\text{B}}-\lambda _{1}-\lambda _{2}E_{i}}{k_{\text{B}}}}\right).$$
To obtain $\lambda _{1}$, one substitutes

the MODELLER software, DOPE is implemented in Python and is run within the MODELLER environment. The DOPE method is generally used to assess the quality of a structure model as a whole. Alternatively, DOPE can also generate a residue-by-residue energy profile for the input model, making it possible for the user to spot problematic regions in the structure model.

the PMF. Typically, $Q(X)$ is brought in by sampling (typically from a fragment library), and not explicitly evaluated; the ratio, which in contrast is explicitly evaluated, corresponds to Sippl's PMF. This explanation is quantitative, and allows the generalization of statistical PMFs from pairwise distances to arbitrary coarse grained variables. It also provides

the additional contributions to this derivative cancel each other.) Thus the canonical partition function $Z$ becomes
$$Z\equiv \sum _{i}e^{-\beta E_{i}},$$
where $\beta \equiv 1/(k_{\text{B}}T)$

the assessment of an ensemble of structural models produced by homology modeling or protein threading. Many differently parameterized statistical potentials have been shown to successfully identify the native state structure from an ensemble of decoy or non-native structures. Statistical potentials are not only used for protein structure prediction, but also for modelling the protein folding pathway.

Partition function (statistical mechanics)

In physics,

the average energy $U$ and apply the first law of thermodynamics, $dU=TdS-PdV$:
$${\frac {dS}{dU}}=\lambda _{2}\equiv {\frac {1}{T}}.$$
(Note that $\lambda _{2}$ and $Z$ vary with $U$ as well; however, using

the canonical partition function is defined as
$$Z={\frac {1}{h^{3}}}\int e^{-\beta H(q,p)}\,\mathrm {d} ^{3}q\,\mathrm {d} ^{3}p,$$
where $H(q,p)$ is the classical Hamiltonian. To make it into a dimensionless quantity, we must divide it by h, which

the chain rule and
$${\frac {d}{d\lambda _{2}}}\ln(Z)=-{\frac {1}{k_{\text{B}}}}\sum _{i}\rho _{i}E_{i}=-{\frac {U}{k_{\text{B}}}},$$
one can show that



the classical configuration integral. For simplicity, we will use the discrete form of the partition function in this section. Our results will apply equally well to the continuous form. Consider a system S embedded into a heat bath B. Let the total energy of both systems be E. Let $p_{i}$ denote the probability that the system S is in a particular microstate, i, with energy $E_{i}$. According to

the context is classical mechanics or quantum mechanics, and whether the spectrum of states is discrete or continuous. For a canonical ensemble that is classical and discrete, the canonical partition function is defined as
$$Z=\sum _{i}e^{-\beta E_{i}},$$
where the sum runs over the microstates i of the system. The exponential factor $e^{-\beta E_{i}}$

the discrete Gibbs entropy
$$S=-k_{\text{B}}\sum _{i}\rho _{i}\ln \rho _{i}$$
subject to two physical constraints: Applying variational calculus with constraints (analogous in some sense to the method of Lagrange multipliers), we write

the energy (or "energy fluctuation") is
$$\langle (\Delta E)^{2}\rangle \equiv \langle (E-\langle E\rangle )^{2}\rangle =\langle E^{2}\rangle -\langle E\rangle ^{2}={\frac {\partial ^{2}\ln Z}{\partial \beta ^{2}}}.$$
The heat capacity

the entropy and temperature of the bath respectively:
$$\begin{aligned}k\ln p_{i}&=k\ln \Omega _{B}(E-E_{i})-k\ln \Omega _{(S,B)}(E)\\&\approx -{\frac {\partial {\big (}k\ln \Omega _{B}(E){\big )}}{\partial E}}E_{i}+k\ln \Omega _{B}(E)-k\ln \Omega _{(S,B)}(E)\\&\approx -{\frac {\partial S_{B}}{\partial E}}E_{i}+k\ln {\frac {\Omega _{B}(E)}{\Omega _{(S,B)}(E)}}\\&\approx -{\frac {E_{i}}{T}}+k\ln {\frac {\Omega _{B}(E)}{\Omega _{(S,B)}(E)}}\end{aligned}$$
Thus
$$p_{i}\propto e^{-E_{i}/(kT)}=e^{-\beta E_{i}}.$$
Since

the following free energy difference:
$$\Delta F(r)=-kT\ln {\frac {P(r)}{Q_{R}(r)}}-kT\ln {\frac {Z}{Z_{R}}}.$$
The reference state typically results from a hypothetical system in which the specific interactions between the amino acids are absent. The second term involving $Z$ and $Z_{R}$ can be ignored, as it is a constant. In practice, $P(r)$

the heat bath's internal energy is much larger than the energy of S ($E\gg E_{i}$), we can Taylor-expand $\Omega _{B}$ to first order in $E_{i}$ and use the thermodynamic relation $\partial S_{B}/\partial E=1/T$, where here $S_{B}$, $T$ are

the issue regarding the physical validity, the first justification of statistical PMFs was attempted by Sippl. It was based on an analogy with the statistical physics of liquids. For liquids, the potential of mean force is related to the radial distribution function $g(r)$, which is given by:
$$g(r)={\frac {P(r)}{Q_{R}(r)}}$$
where $P(r)$ and $Q_{R}(r)$ are

the likelihood can be expressed as a product of pairwise probabilities, and it is purely qualitative. Hamelryck and co-workers later gave a quantitative explanation for the statistical potentials, according to which they approximate a form of probabilistic reasoning due to Richard Jeffrey and named probability kinematics. This variant of Bayesian thinking (sometimes called "Jeffrey conditioning") allows updating

the literature on liquids and solutions. Moreover, this analogy does not solve the issue of how to specify a suitable reference state for proteins. In the mid-2000s, authors started to combine multiple statistical potentials, derived from different structural features, into composite scores. For that purpose, they used machine learning techniques, such as support vector machines (SVMs). Probabilistic neural networks (PNNs) have also been applied for



the microstate energies depend on a parameter $\lambda$ in the manner
$$E_{s}=E_{s}^{(0)}+\lambda A_{s}\qquad {\text{for all}}\;s$$
then the expected value of A is
$$\langle A\rangle =\sum _{s}A_{s}P_{s}=-{\frac {1}{\beta }}{\frac {\partial }{\partial \lambda }}\ln Z(\beta ,\lambda ).$$
This provides us with
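This identity can be checked numerically with a finite-difference derivative of $\ln Z$ with respect to $\lambda$ at $\lambda=0$; the energies $E_s^{(0)}$ and couplings $A_s$ below are arbitrary toy values:

```python
import math

beta = 1.3
E0 = [0.0, 1.0, 2.5]   # unperturbed microstate energies (toy values)
A = [0.2, -0.4, 1.0]   # quantity A_s coupled in via lambda (toy values)

def lnZ(lam):
    """ln Z(beta, lambda) with perturbed energies E_s = E_s^(0) + lam * A_s."""
    return math.log(sum(math.exp(-beta * (e + lam * a)) for e, a in zip(E0, A)))

def avg_A_direct():
    """<A> computed directly from the unperturbed Boltzmann weights."""
    Z = sum(math.exp(-beta * e) for e in E0)
    return sum(a * math.exp(-beta * e) for e, a in zip(E0, A)) / Z

# <A> = -(1/beta) * d(ln Z)/d(lambda) at lambda = 0, via central difference
h = 1e-6
avg_A_trick = -(lnZ(h) - lnZ(-h)) / (2 * h * beta)
```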

the most accurate structure for 25 out of 43 free modelling domains. Baker and co-workers justified statistical PMFs from a Bayesian point of view and used these insights in the construction of the coarse grained ROSETTA energy function. According to Bayesian probability calculus, the conditional probability $P(X\mid A)$ of

the obtained scores are often considered as approximations of the free energy, and are thus referred to as pseudo-energies, but this physical interpretation is incorrect. Nonetheless, they are applied with success in many cases, because they frequently correlate with actual Gibbs free energy differences. Possible features to which a pseudo-energy can be assigned include: The classic application is, however, based on pairwise amino acid contacts or distances, thus producing statistical interatomic potentials. For pairwise amino acid contacts,

the pairwise distances will be distributed according to $P(Y)$, the following expression is needed:
$$P(X)={\frac {P(Y)}{Q(Y)}}\,Q(X)$$
where $Q(Y)$ is the distribution over $Y$ implied by $Q(X)$. The ratio in the expression corresponds to

the pairwise system. Simple rearrangement results in the inverse Boltzmann formula, which expresses the free energy $F(r)$ as a function of $P(r)$:
$$F(r)=-kT\ln P(r)-kT\ln Z.$$
To construct a PMF, one then introduces a so-called reference state with a corresponding distribution $Q_{R}$ and partition function $Z_{R}$, and calculates
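A minimal sketch of the inverse-Boltzmann construction from binned distance statistics. The histograms here are invented, and the additive constant involving the partition functions is dropped, since it does not affect the comparison of models:

```python
import math

kT = 0.593  # k_B * T in kcal/mol near room temperature

def inverse_boltzmann(p_obs, q_ref):
    """Per-bin pseudo-energy -kT * ln(P(r) / Q_R(r)), constant term dropped."""
    return [-kT * math.log(p / q) for p, q in zip(p_obs, q_ref)]

# Toy normalised distance histograms: observed (from "known structures")
# versus a flat reference state.
P = [0.10, 0.40, 0.30, 0.20]
Q = [0.25, 0.25, 0.25, 0.25]
pmf = inverse_boltzmann(P, Q)
```

Bins where the distance is observed more often than in the reference state ($P>Q_R$) come out with negative (favourable) pseudo-energy, depleted bins with positive.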

the prior distribution. Expressions that resemble statistical PMFs naturally result from the application of probability theory to solve a fundamental problem that arises in protein structure prediction: how to improve an imperfect probability distribution $Q(X)$ over a first variable $X$ using a probability distribution $P(Y)$ over

the probability into the first constraint:
$$1=\sum _{i}\rho _{i}=\exp \left({\frac {-k_{\text{B}}-\lambda _{1}}{k_{\text{B}}}}\right)Z,$$
where $Z$

the protein, while $P(Y)$ could concern the pairwise distances between the amino acids. In that case, $X$ could for example be a vector of dihedral angles that specifies all atom positions (assuming ideal bond lengths and angles). In order to combine the two distributions, such that the local structure will be distributed according to $Q(X)$, while

the relationships between the partition function and the various thermodynamic parameters of the system. These results can be derived using the method of the previous section and the various thermodynamic relations. As we have already seen, the thermodynamic energy is
$$\langle E\rangle =-{\frac {\partial \ln Z}{\partial \beta }}.$$
The variance in

the respective probabilities of finding two particles at a distance $r$ from each other in the liquid and in the reference state. For liquids, the reference state is clearly defined; it corresponds to the ideal gas, consisting of non-interacting particles. The two-particle potential of mean force $W(r)$

the same energy $E_{s}$, it is said that the energy levels of the system are degenerate. In the case of degenerate energy levels, we can write the partition function in terms of the contribution from energy levels (indexed by j) as follows:
$$Z=\sum _{j}g_{j}\cdot e^{-\beta E_{j}},$$
where $g_{j}$
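The level-sum with degeneracy factors can be checked against the plain sum over microstates; both must give the same $Z$. A sketch with an invented spectrum:

```python
import math

beta = 0.8

# Microstates: energy 0 once, energy 1 three times (g=3), energy 2 twice (g=2)
states = [0.0, 1.0, 1.0, 1.0, 2.0, 2.0]
levels = {0.0: 1, 1.0: 3, 2.0: 2}  # E_j -> degeneracy g_j

Z_states = sum(math.exp(-beta * E) for E in states)
Z_levels = sum(g * math.exp(-beta * E) for E, g in levels.items())
```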

the special case of entropy, entropy is given by
$$S\equiv -k_{\text{B}}\sum _{s}P_{s}\ln P_{s}=k_{\text{B}}(\ln Z+\beta \langle E\rangle )={\frac {\partial }{\partial T}}(k_{\text{B}}T\ln Z)=-{\frac {\partial A}{\partial T}}$$
where A

the specific definitions of the variables X and Y. An example would be X = volume and Y = pressure. Additionally, the variance in X will be
$$\langle (\Delta X)^{2}\rangle \equiv \langle (X-\langle X\rangle )^{2}\rangle ={\frac {\partial \langle X\rangle }{\partial (\beta Y)}}={\frac {\partial ^{2}\ln Z}{\partial (\beta Y)^{2}}}.$$
In

the sub-systems have the same physical properties, then their partition functions are equal, $\zeta _{1}=\zeta _{2}=\cdots =\zeta$, in which case
$$Z=\zeta ^{N}.$$
However, there is a well-known exception to this rule. If the sub-systems are actually identical particles, in the quantum mechanical sense that they are impossible to distinguish even in principle,

the sum runs over all amino acid pairs $a_{i},a_{j}$ (with $i<j$) and $r_{ij}$ is their corresponding distance. In many studies $Q_{R}$ does not depend on

the total partition function must be divided by $N!$ (N factorial):
$$Z={\frac {\zeta ^{N}}{N!}}.$$
This is to ensure that we do not "over-count" the number of microstates. While this may seem like a strange requirement, it is actually necessary to preserve the existence of a thermodynamic limit for such systems. This

the total probability to find the system in some microstate (the sum of all $p_{i}$) must be equal to 1, we know that the constant of proportionality must be the normalization constant, and so, we can define the partition function to be this constant:
$$Z=\sum _{i}e^{-\beta E_{i}}={\frac {\Omega _{(S,B)}(E)}{\Omega _{B}(E)}}.$$
In order to demonstrate

the training of a position-specific distance-dependent statistical potential. In 2016, the DeepMind artificial intelligence research laboratory started to apply deep learning techniques to the development of a torsion- and distance-dependent statistical potential. The resulting method, named AlphaFold, won the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) by correctly predicting

the use of statistical PMFs, a few years after he introduced them for use in protein structure prediction, by appealing to the analogy with the reversible work theorem for liquids. For liquids, $g(r)$ can be experimentally measured using small-angle X-ray scattering; for proteins, $P(r)$

the usefulness of the partition function, let us calculate the thermodynamic value of the total energy. This is simply the expected value, or ensemble average, for the energy, which is the sum of the microstate energies weighted by their probabilities:
$$\langle E\rangle =\sum _{s}E_{s}P_{s}={\frac {1}{Z}}\sum _{s}E_{s}e^{-\beta E_{s}}=-{\frac {1}{Z}}{\frac {\partial }{\partial \beta }}Z(\beta ,E_{1},E_{2},\cdots )=-{\frac {\partial \ln Z}{\partial \beta }}$$
or, equivalently,
$$\langle E\rangle =k_{\text{B}}T^{2}{\frac {\partial \ln Z}{\partial T}}.$$
Incidentally, one should note that if
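The relation $\langle E\rangle=-\partial\ln Z/\partial\beta$ can be verified numerically by comparing a finite-difference derivative of $\ln Z$ with the directly computed ensemble average (toy spectrum, arbitrary $\beta$):

```python
import math

levels = [0.0, 0.5, 1.7]  # arbitrary toy spectrum
beta = 1.1

def lnZ(b):
    return math.log(sum(math.exp(-b * E) for E in levels))

def avg_E(b):
    """<E> as the probability-weighted sum of microstate energies."""
    Z = sum(math.exp(-b * E) for E in levels)
    return sum(E * math.exp(-b * E) for E in levels) / Z

# <E> = -d(ln Z)/d(beta), approximated by a central difference
h = 1e-6
avg_E_from_lnZ = -(lnZ(beta + h) - lnZ(beta - h)) / (2 * h)
```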
