The CATH Protein Structure Classification database is a free, publicly available online resource that provides information on the evolutionary relationships of protein domains. It was created in the mid-1990s by Professor Christine Orengo and colleagues including Janet Thornton and David Jones, and continues to be developed by the Orengo group at University College London. CATH shares many broad features with the SCOP resource; however, there are also many areas in which the detailed classification differs greatly.
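CATH takes its name from the four levels of its hierarchy: Class, Architecture, Topology and Homologous superfamily. The following is a minimal sketch of how a domain's position in that hierarchy might be represented and compared. It assumes that a node is written as four dot-separated integers (for example `3.40.50.300`); that notation, the numeric class codes, and the names `CathNode` and `deepest_shared_level` are illustrative assumptions, not something stated in the text.

```python
from dataclasses import dataclass
from typing import Optional

# The four hierarchy levels, from broadest to most specific.
LEVELS = ("C", "A", "T", "H")

# Class categories as named in the article; the numeric codes are an
# assumption made for this illustration.
CLASS_LABELS = {
    1: "all alpha",
    2: "all beta",
    3: "mixture of alpha and beta",
    4: "little secondary structure",
}


@dataclass(frozen=True)
class CathNode:
    """Hypothetical container for one position in the C/A/T/H hierarchy."""
    c: int  # Class
    a: int  # Architecture
    t: int  # Topology/fold
    h: int  # Homologous superfamily

    @classmethod
    def parse(cls, node_id: str) -> "CathNode":
        # Assumed notation: four dot-separated non-negative integers.
        parts = node_id.split(".")
        if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
            raise ValueError(f"expected four dot-separated integers, got {node_id!r}")
        return cls(*(int(p) for p in parts))

    def as_tuple(self):
        return (self.c, self.a, self.t, self.h)


def deepest_shared_level(x: CathNode, y: CathNode) -> Optional[str]:
    """Return the most specific level (C, A, T or H) at which two nodes
    still agree, or None if they already differ at the Class level."""
    shared = None
    for level, vx, vy in zip(LEVELS, x.as_tuple(), y.as_tuple()):
        if vx != vy:
            break
        shared = level
    return shared
```

Two domains whose nodes agree down to the H level would, under this representation, belong to the same homologous superfamily; agreement only down to T would mean they share a fold without evidence of common ancestry.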
Experimentally determined protein three-dimensional structures are obtained from the Protein Data Bank and split into their consecutive polypeptide chains, where applicable. Protein domains are identified within these chains using a mixture of automatic methods and manual curation. The domains are then classified within the CATH structural hierarchy: at the Class (C) level, domains are assigned according to their secondary structure content, i.e. all alpha, all beta,
A chemical crystallography group within the Department of Chemistry, University of Cambridge. In 1965 she founded the CCDC and established the associated Cambridge Structural Database. At that time, there were only about 3,000 published X-ray structures, and the work involved converting these into a machine-readable form. Kennard invited Frank Allen to join the group, which he did in 1970, becoming Scientific Director and then Executive Director before retiring in 2008. In 1992,
A mixture of alpha and beta, or little secondary structure; at the Architecture (A) level, information on the secondary structure arrangement in three-dimensional space is used for assignment; at the Topology/fold (T) level, information on how the secondary structure elements are connected and arranged is used; assignments are made to the Homologous superfamily (H) level if there is good evidence that
Is a non-profit organisation based in Cambridge, England. Its primary activity is the compilation and maintenance of the Cambridge Structural Database, a database of small molecule crystal structures. They also perform analysis on the database for the benefit of the scientific community, and write and distribute computer software to allow others to do the same. In 1962, Dr. Olga Kennard OBE FRS set up
Is overseen by the Worldwide Protein Data Bank (wwPDB). These structural data are obtained and deposited by biologists and biochemists worldwide through the use of experimental methodologies such as X-ray crystallography, NMR spectroscopy, and, increasingly, cryo-electron microscopy. All submitted data are reviewed by expert biocurators and, once approved, are made freely available on
The "macromolecular Crystallographic Information file" format, mmCIF, which is an extension of the CIF format, was phased in. mmCIF became the standard format for the PDB archive in 2014. In 2019, the wwPDB announced that depositions for crystallographic methods would only be accepted in mmCIF format. An XML version of PDB, called PDBML, was described in 2005. The structure files can be downloaded in any of these three formats, though an increasing number of structures do not fit
The CCDC moved into its own building adjacent to the Cambridge chemistry department. This new headquarters was designed by the Danish architect Professor Erik Christian Sørensen and won The Sunday Times Building of the Year Award in 1993. The CCDC became an independent company in 1987 but still retains very close links with the university as a University Partner Institution that trains students for postgraduate research degrees. By 2019
The Internet under the CC0 Public Domain Dedication. Global access to the data is provided by the websites of the wwPDB member organisations (PDBe, PDBj, RCSB PDB, and BMRB). The PDB is a key resource in areas of structural biology, such as structural genomics. Most major scientific journals and some funding agencies now require scientists to submit their structure data to the PDB. Many other databases use protein structures deposited in
The PDB became an international organization. The founding members are PDBe (Europe), RCSB (US), and PDBj (Japan). The BMRB joined in 2006. Each of the four members of wwPDB can act as deposition, data processing and distribution centers for PDB data. The data processing refers to the fact that wwPDB staff review and annotate each submitted entry. The data are then automatically checked for plausibility (the source code for this validation software has been made available to
The PDB. For example, SCOP and CATH classify protein structures, while PDBsum provides a graphic overview of PDB entries using information from other sources, such as Gene Ontology. Two forces converged to initiate the PDB: a small but growing collection of sets of protein structure data determined by X-ray diffraction; and the newly available (1968) molecular graphics display, the Brookhaven RAster Display (BRAD), to visualize these protein structures in 3-D. In 1969, with
The database had grown to over a million structures. The staff at the CCDC curate the database of small-molecule organic and metal-organic crystal structures and make these available for download by the public. They also create and maintain a suite of cheminformatics software that may be used to apply the data to applications in the life sciences, including crystal engineering and materials science. CCDC developed programs such as ConQuest and Mercury that run under Windows and various types of Unix, including Linux. ConQuest
the distance between pairs of atoms of the protein is estimated. The final conformation of the protein is obtained from NMR by solving a distance geometry problem. Since 2013, a growing number of protein structures have been determined by cryo-electron microscopy. For PDB structures determined by X-ray diffraction that have a structure factor file, their electron density map may be viewed. The data of such structures may be viewed on
260-555: The domains are related by evolution i.e. they are homologous. Additional sequence data for domains with no experimentally determined structures are provided by CATH's sister resource, Gene3D, which are used to populate the homologous superfamilies. Protein sequences from UniProtKB and Ensembl are scanned against CATH HMMs to predict domain sequence boundaries and make homologous superfamily assignments. The CATH team releases new data both as daily snapshots, and official releases approximately annually. The latest release of CATH-Gene3D (v4.3)
280-1249: The legacy PDB format. Individual files are easily downloaded into graphics packages from Internet URLs : The " 4hhb " is the PDB identifier. Each structure published in PDB receives a four-character alphanumeric identifier, its PDB ID. (This is not a unique identifier for biomolecules, because several structures for the same molecule—in different environments or conformations—may be contained in PDB with different PDB IDs.) The structure files may be viewed using one of several free and open source computer programs , including Jmol , Pymol , VMD , Molstar and Rasmol . Other non-free, shareware programs include ICM-Browser, MDL Chime , UCSF Chimera , Swiss-PDB Viewer, StarBiochem (a Java-based interactive molecular viewer with integrated search of protein databank), Sirius , and VisProt3DS (a tool for Protein Visualization in 3D stereoscopic view in anaglyph and other modes), and Discovery Studio . The RCSB PDB website contains an extensive list of both free and commercial molecule visualization programs and web browser plugins. Cambridge Crystallographic Data Centre The Cambridge Crystallographic Data Centre ( CCDC )
300-456: The public at no charge). The PDB database is updated weekly ( UTC +0 Wednesday), along with its holdings list. As of 10 January 2023 , the PDB comprised: Most structures are determined by X-ray diffraction, but about 7% of structures are determined by protein NMR . When using X-ray diffraction, approximations of the coordinates of the atoms of the protein are obtained, whereas using NMR,
The sponsorship of Walter Hamilton at the Brookhaven National Laboratory, Edgar Meyer (Texas A&M University) began to write software to store atomic coordinate files in a common format to make them available for geometric and graphical evaluation. By 1971, one of Meyer's programs, SEARCH, enabled researchers to remotely access information from the database to study protein structures offline. SEARCH
The three PDB websites. Historically, the number of structures in the PDB has grown at an approximately exponential rate, with 100 registered structures in 1982, 1,000 structures in 1993, 10,000 in 1999, 100,000 in 2014, and 200,000 in January 2023. The file format initially used by the PDB was called the PDB file format. The original format was restricted by the width of computer punch cards to 80 characters per line. Around 1996,
Was appointed head of the PDB. In October 1998, the PDB was transferred to the Research Collaboratory for Structural Bioinformatics (RCSB); the transfer was completed in June 1999. The new director was Helen M. Berman of Rutgers University (one of the managing institutions of the RCSB, the other being the San Diego Supercomputer Center at UC San Diego). In 2003, with the formation of the wwPDB,
Was instrumental in enabling networking, thus marking the functional beginning of the PDB. The Protein Data Bank was announced in October 1971 in Nature New Biology as a joint venture between Cambridge Crystallographic Data Centre, UK and Brookhaven National Laboratory, US. Upon Hamilton's death in 1973, Tom Koetzle took over direction of the PDB for the subsequent 20 years. In January 1994, Joel Sussman of Israel's Weizmann Institute of Science
Was released in December 2020 and consists of: CATH is an open source software project whose developers maintain a number of open-source tools, which are publicly available on GitHub.

Protein Data Bank

The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large biological molecules such as proteins and nucleic acids, which