Biological databases are libraries of biological data, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics. Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations, and similarities of biological sequences and structures.
In molecular biology, STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a biological database and web resource of known and predicted protein–protein interactions. The STRING database contains information from numerous sources, including experimental data, computational prediction methods and public text collections. It is freely accessible and regularly updated. The resource also serves to highlight functional enrichments in user-provided lists of proteins, using
A nucleobase to a ribose or deoxyribose ring. Examples of these include cytidine (C), uridine (U), adenosine (A), guanosine (G), and thymidine (T). Nucleosides can be phosphorylated by specific kinases in the cell, producing nucleotides. Both DNA and RNA are polymers, consisting of long, linear molecules assembled by polymerase enzymes from repeating structural units, or monomers, of mononucleotides. DNA uses
A pentose and one to three phosphate groups. They contain carbon, nitrogen, oxygen, hydrogen and phosphorus. They serve as sources of chemical energy (adenosine triphosphate and guanosine triphosphate), participate in cellular signaling (cyclic guanosine monophosphate and cyclic adenosine monophosphate), and are incorporated into important cofactors of enzymatic reactions (coenzyme A, flavin adenine dinucleotide, flavin mononucleotide, and nicotinamide adenine dinucleotide phosphate). DNA structure
A bond with removal of water. They can be hydrolyzed to yield their saccharide building blocks by boiling with dilute acid or reacting them with appropriate enzymes. Examples of disaccharides include sucrose, maltose, and lactose. Polysaccharides are polymerized monosaccharides, or complex carbohydrates. They have multiple simple sugars. Examples are starch, cellulose, and glycogen. They are generally large and often have
A common reference of functional partnership as annotated by KEGG (Kyoto Encyclopedia of Genes and Genomes). STRING imports protein association knowledge from databases of physical interaction and databases of curated biological pathway knowledge (MINT, HPRD, BIND, DIP, BioGRID, KEGG, Reactome, IntAct, EcoCyc, NCI-Nature Pathway Interaction Database, GO). Links are supplied to
A complex branched connectivity. Because of their size, polysaccharides are not water-soluble, but their many hydroxy groups become hydrated individually when exposed to water, and some polysaccharides form thick colloidal dispersions when heated in water. Shorter polysaccharides, with 3 to 10 monomers, are called oligosaccharides. A fluorescent indicator-displacement molecular imprinting sensor
A host of biological phenomena from the structure of biomolecules and their interaction, to the whole metabolism of organisms and to understanding the evolution of species. This knowledge helps facilitate the fight against diseases and assists in developing medications, predicting certain genetic diseases, and discovering basic relationships among species in the history of life. Relational database concepts of computer science and information retrieval concepts of digital libraries are important for understanding biological databases. Biological database design, development, and long-term management
A number of functional classification systems such as GO, Pfam and KEGG. The latest version 11b contains information on about 24.5 million proteins from more than 5000 organisms. STRING has been developed by a consortium of academic institutions including CPR, EMBL, KU, SIB, TUD and UZH. Protein–protein interaction networks are an important ingredient for the system-level understanding of cellular processes. Such networks can be used for filtering and assessing functional genomics data and for providing an intuitive platform for annotating structural, functional and evolutionary properties of proteins. Exploring
A protein is known as that protein's primary structure. This sequence is determined by the genetic makeup of the individual. It specifies the order of side-chain groups along the linear polypeptide "backbone". Proteins have two types of well-classified, frequently occurring elements of local structure defined by a particular pattern of hydrogen bonds along the backbone: alpha helix and beta sheet. Their number and arrangement
A protein, quaternary structure of protein is formed. Quaternary structure is an attribute of homomeric (same-sequence chains) or heteromeric (different-sequence chains) proteins like hemoglobin, which consists of two "alpha" and two "beta" polypeptide chains. An apoenzyme (or, generally, an apoprotein) is the protein without any small-molecule cofactors, substrates, or inhibitors bound. It
A well-defined, stable arrangement. The overall, compact, 3D structure of a protein is termed its tertiary structure or its "fold". It is formed as a result of various attractive forces such as hydrogen bonding, disulfide bridges, hydrophobic interactions, hydrophilic interactions, and van der Waals forces. When two or more polypeptide chains (either of identical or of different sequence) cluster to form
Is a core area of the discipline of bioinformatics. Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Most biological databases are available through web sites that organise data such that users can browse through
Is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the public biological databases. A companion database to the issue called the Online Molecular Biology Database Collection lists 1,380 online databases. Other collections of databases exist such as MetaBase and the Bioinformatics Links Collection.

Biomolecule

A biomolecule or biological molecule
Is always an even number. For lipids present in biological membranes, the hydrophilic head is from one of three classes: Other lipids include prostaglandins and leukotrienes, which are both 20-carbon fatty acyl units synthesized from arachidonic acid. They are also known as eicosanoids. Amino acids contain both amino and carboxylic acid functional groups. (In biochemistry, the term amino acid
Is an E. coli database. Other popular model organism databases include Mouse Genome Informatics for the laboratory mouse, Mus musculus, the Rat Genome Database for Rattus, ZFIN for Danio rerio (zebrafish), PomBase for the fission yeast Schizosaccharomyces pombe, FlyBase for Drosophila, WormBase for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae, and Xenbase for Xenopus tropicalis and Xenopus laevis frogs. Numerous databases attempt to document
Is an important control mechanism in the cell cycle. Only two amino acids other than the standard twenty are known to be incorporated into proteins during translation, in certain organisms: Besides those used in protein synthesis, other biologically important amino acids include carnitine (used in lipid transport within a cell), ornithine, GABA and taurine. The particular series of amino acids that form
Is available to access the data and to give a fast overview of the proteins and their interactions. A plug-in for Cytoscape to use STRING data is available. Another way to access STRING data is to use the application programming interface (API) by constructing a URL that contains the request. Like many other databases that store protein association knowledge, STRING imports data from experimentally derived protein–protein interactions through literature curation. Furthermore, STRING also stores computationally predicted interactions from: (i) text mining of scientific texts, (ii) interactions computed from genomic features, and (iii) interactions transferred from model organisms based on orthology. All predicted or imported interactions are benchmarked against
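The URL-based API access mentioned above can be sketched as follows. This is a minimal illustration, not an official client: the base address `https://string-db.org/api`, the `/{format}/{method}` path layout, the `identifiers` and `species` parameter names, and the helper `build_string_url` are assumptions made for this example and should be checked against the current STRING API documentation. The sketch only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed base address of the STRING API; verify against the current docs.
STRING_API_BASE = "https://string-db.org/api"


def build_string_url(method, identifiers, species, output_format="tsv"):
    """Build a request URL for a STRING API method (hypothetical helper).

    method        -- API method name, e.g. "network" (assumed)
    identifiers   -- list of protein or gene names
    species       -- NCBI taxonomy ID, e.g. 9606 for human
    output_format -- response format segment of the path, e.g. "tsv"
    """
    params = {
        # Several identifiers are joined with a carriage return,
        # which urlencode escapes as %0D.
        "identifiers": "\r".join(identifiers),
        "species": species,
    }
    return f"{STRING_API_BASE}/{output_format}/{method}?{urlencode(params)}"


url = build_string_url("network", ["TP53", "MDM2"], 9606)
print(url)
```

The resulting string could then be passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect or log before it is sent.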
Is called the secondary structure of the protein. Alpha helices are regular spirals stabilized by hydrogen bonds between the backbone CO group (carbonyl) of one amino acid residue and the backbone NH group (amide) of the i+4 residue. The spiral has about 3.6 amino acids per turn, and the amino acid side chains stick out from the cylinder of the helix. Beta pleated sheets are formed by backbone hydrogen bonds between individual beta strands each of which
Is dominated by the well-known double helix formed by Watson-Crick base-pairing of C with G and A with T. This is known as B-form DNA, and is overwhelmingly the most favorable and common state of DNA; its highly specific and stable base-pairing is the basis of reliable genetic information storage. DNA can sometimes occur as single strands (often needing to be stabilized by single-strand binding proteins) or as A-form or Z-form helices, and occasionally in more complex 3D structures such as
Is how biological databases cross-reference to other databases with accession numbers to link their related knowledge together (e.g. so that the accession number stays the same even if a species name changes). Redundancy is another problem, as many databases must store the same information, e.g. protein structure databases also contain the sequences of the proteins they cover and their bibliographic information. Species-specific databases are available for some species, mainly those that are often used in research (model organisms). For example, EcoCyc
Is in an "extended", or fully stretched-out, conformation. The strands may lie parallel or antiparallel to each other, and the side-chain direction alternates above and below the sheet. Hemoglobin contains only helices, natural silk is formed of beta pleated sheets, and many enzymes have a pattern of alternating helices and beta-strands. The secondary-structure elements are connected by "loop" or "coil" regions of non-repetitive conformation, which are sometimes quite mobile or disordered but usually adopt
Is loosely defined as a molecule produced by a living organism and essential to one or more typically biological processes. Biomolecules include large macromolecules such as proteins, carbohydrates, lipids, and nucleic acids, as well as small molecules such as vitamins and hormones. A general name for this class of material is biological materials. Biomolecules are an important element of living organisms. They are often endogenous, produced within
Is often important as an inactive storage, transport, or secretory form of a protein. This is required, for instance, to protect the secretory cell from the activity of that protein. Apoenzymes become active enzymes on addition of a cofactor. Cofactors can be either inorganic (e.g., metal ions and iron-sulfur clusters) or organic compounds (e.g., flavin and heme). Organic cofactors can be either prosthetic groups, which are tightly bound to an enzyme, or coenzymes, which are released from
Is used when referring to those amino acids in which the amino and carboxylate functionalities are attached to the same carbon, plus proline, which is not actually an amino acid). Modified amino acids are sometimes observed in proteins; this is usually the result of enzymatic modification after translation (protein synthesis). For example, phosphorylation of serine by kinases and dephosphorylation by phosphatases
The Catalogue of Life draws from 165 databases as of May 2022. Operational costs of the Catalogue of Life are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution. Some biological databases also document geographical distribution of different species. Shuang Dai et al. created a new multi-source database to document spatial/geographical distribution of 1,371 bird species in China, as existing databases had been severely lacking in spatial distribution data for many species. Sources for this new database included books, literature, GPS tracking, and online webpage data. The new database displayed taxonomy, distribution, species info, and data sources for each species. After completion of
The basic building blocks of biological membranes. Another biological role is energy storage (e.g., triglycerides). Most lipids consist of a polar or hydrophilic head (typically glycerol) and one to three nonpolar or hydrophobic fatty acid tails, and therefore they are amphiphilic. Fatty acids consist of unbranched chains of carbon atoms that are connected by single bonds alone (saturated fatty acids) or by both single and double bonds (unsaturated fatty acids). The chains are usually 14 to 24 carbon groups long, but it
The bird spatial distribution database, 61% of known species in China were found to be distributed in regions beyond where they were previously known. Medical databases are a special case of biomedical data resource and can range from bibliographies, such as PubMed, to image databases for the development of AI-based diagnostic software. For instance, one such image database
The crossover at Holliday junctions during DNA replication. RNA, in contrast, forms large and complex 3D tertiary structures reminiscent of proteins, as well as the loose single strands with locally folded regions that constitute messenger RNA molecules. Those RNA structures contain many stretches of A-form double helix, connected into definite 3D arrangements by single-stranded loops, bulges, and junctions. Examples are tRNA, ribosomes, ribozymes, and riboswitches. These complex structures are facilitated by
The data online. In addition, the underlying data is usually available for download in a variety of formats. Biological data comes in many formats. These formats include text, sequence data, protein structure and links. Each of these can be found from certain sources, for example: Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the consistency of information, e.g. when different names are used for
The deoxynucleotides C, G, A, and T, while RNA uses the ribonucleotides (which have an extra hydroxyl (OH) group on the pentose ring) C, G, A, and U. Modified bases are fairly common (such as with methyl groups on the base ring), as found in ribosomal RNA or transfer RNAs or for discriminating the new from old strands of DNA after replication. Each nucleotide is made of a nitrogenous base,
The diversity of life on earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System. The Catalogue of Life is a collaborative project that aims to document taxonomic categorization of all currently accepted species in the world. The Catalogue of Life provides a consolidated and consistent database for researchers and policymakers to reference. The Catalogue of Life curates up-to-date datasets from other sources such as Conifer Database, ICTV MSL (for viruses), and LepIndex (for butterflies and moths). In total,
The enzyme's active site during the reaction. Isoenzymes, or isozymes, are multiple forms of an enzyme, with slightly different protein sequence and closely similar but usually not identical functions. They are either products of different genes, or else different products of alternative splicing. They may either be produced in different organs or cell types to perform the same function, or several isoenzymes may be produced in
The fact that the RNA backbone has less local flexibility than DNA but a large set of distinct conformations, apparently because of both positive and negative interactions of the extra OH on the ribose. Structured RNA molecules can do highly specific binding of other molecules and can themselves be recognized specifically; in addition, they can perform enzymatic catalysis (when they are known as "ribozymes", as initially discovered by Tom Cech and colleagues). Monosaccharides are
The organism, but organisms usually need exogenous biomolecules, for example certain nutrients, to survive. Biology and its subfields of biochemistry and molecular biology study biomolecules and their reactions. Most biomolecules are organic compounds, and just four elements (oxygen, carbon, hydrogen, and nitrogen) make up 96% of the human body's mass. But many other elements, such as
The originating data of the respective experimental repositories and database resources. A large body of scientific texts (SGD, OMIM, FlyBase, PubMed) is parsed to search for statistically relevant co-occurrences of gene names.

Biological database

Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), images and other media, or specimens (for museum collections, etc.). Databases are important tools in assisting scientists to analyze and explain
The predicted interaction networks can suggest new directions for future experimental research and provide cross-species predictions for efficient interaction mapping. The data is weighted and integrated and a confidence score is calculated for all protein interactions. Results of the various computational predictions can be inspected from different designated views. There are two modes of STRING: Protein-mode and COG-mode. Predicted interactions are propagated to proteins in other organisms for which interaction has been described by inference of orthology. A web interface
The primary structural components of most plants. It contains subunits derived from p-coumaryl alcohol, coniferyl alcohol, and sinapyl alcohol, and is unusual among biomolecules in that it is racemic. The lack of optical activity is due to the polymerization of lignin which occurs via free radical coupling reactions in which there is no preference for either configuration at a chiral center. Lipids (oleaginous) are chiefly fatty acid esters, and are
The same species or different data formats. As a consequence, interoperability is a constant challenge for information exchange. For instance, if a DNA sequence database stores the DNA sequence along with the name of a species, a name change of that species may break the links to other databases which may use a different name. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One solution
The simplest form of carbohydrates with only one simple sugar. They essentially contain an aldehyde or ketone group in their structure. The presence of an aldehyde group in a monosaccharide is indicated by the prefix aldo-. Similarly, a ketone group is denoted by the prefix keto-. Examples of monosaccharides are the hexoses, glucose, fructose, trioses, tetroses, heptoses, galactose, pentoses, ribose, and deoxyribose. Consumed fructose and glucose have different rates of gastric emptying, are differentially absorbed and have different metabolic fates, providing multiple opportunities for two different saccharides to differentially affect food intake. Most saccharides eventually provide fuel for cellular respiration. Disaccharides are formed when two monosaccharides, or two single simple sugars, form
The various biometals, are also present in small amounts. The uniformity of both specific types of molecules (the biomolecules) and of certain metabolic pathways is an invariant feature among the wide diversity of life forms; thus these biomolecules and metabolic pathways are referred to as "biochemical universals" or "theory of material unity of the living beings", a unifying concept in biology, along with cell theory and evolution theory. A diverse range of biomolecules exist, including: Nucleosides are molecules formed by attaching
Was developed for discriminating saccharides. It successfully discriminated three brands of orange juice beverage. The resulting change in fluorescence intensity of the sensing films is directly related to the saccharide concentration. Lignin is a complex polyphenolic macromolecule composed mainly of beta-O4-aryl linkages. After cellulose, lignin is the second most abundant biopolymer and is one of
Was developed with the goal of aiding in the development of wound monitoring algorithms. Over 188 multi-modal image sets were curated from 79 patient visits, consisting of photographs, thermal images, and 3D mesh depth maps. Wound outlines were manually drawn and added to the photo datasets. The database was made publicly available in the form of a program called WoundsDB, downloadable from the Chronic Wound Database website. An important resource for finding biological databases