What is a biological database?
A biological database is any organized data source containing information useful or relevant to biology. In the early days of bioinformatics, biological databases were used to store, manage, share, and efficiently search growing repositories of biological data. As biomedical research accelerated and received federal support in the 1980s and 1990s, the need for database administration and information retrieval increased dramatically.
Why do biological databases matter?
Basic bioinformatics tasks rely on databases. Researchers use databases to search for patterns within sequences, compute descriptive statistics, compare sequences, align multiple sequences, infer phylogenetic relationships, and predict biological structure and function.
Examples of Database-enabled Tasks
- Sequence similarity search
- Pairwise and multiple sequence alignment
- Pattern and motif discovery
- Phylogenetic analysis
- Secondary and tertiary structure prediction
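As a rough illustration of how a few of these tasks look in practice, the sketch below runs a motif search, a descriptive statistic, and a crude sequence comparison over a tiny in-memory collection. Everything in it is an assumption made for illustration: the accessions and sequences in TOY_DB are invented, and the helper names gc_content, find_motif, and percent_identity are hypothetical rather than part of any real database API. Real resources hold far more records and are queried with dedicated search and alignment tools.

```python
import re
from statistics import mean

# Hypothetical in-memory "database": accession -> DNA sequence.
# These records are made up for illustration only.
TOY_DB = {
    "SEQ001": "ATGGCGTATAAAGGCTAG",
    "SEQ002": "ATGCCCGGGTTTTAA",
    "SEQ003": "ATGTATAAACGCGTGA",
}


def gc_content(seq: str) -> float:
    """Descriptive statistic: fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_motif(db: dict, pattern: str) -> dict:
    """Pattern search: map each matching accession to 0-based start positions.

    Uses re.finditer, so overlapping occurrences are not reported.
    """
    hits = {}
    for accession, seq in db.items():
        positions = [m.start() for m in re.finditer(pattern, seq)]
        if positions:
            hits[accession] = positions
    return hits


def percent_identity(a: str, b: str) -> float:
    """Crude comparison: position-by-position identity over the shorter length.

    This is NOT an alignment (no gaps, no scoring matrix); it only
    stands in for the idea of comparing two sequences.
    """
    n = min(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / n


if __name__ == "__main__":
    print("Records containing TATAAA:", find_motif(TOY_DB, "TATAAA"))
    print("Mean GC content:", round(mean(gc_content(s) for s in TOY_DB.values()), 3))
    print("Identity SEQ001 vs SEQ003:", percent_identity(TOY_DB["SEQ001"], TOY_DB["SEQ003"]))
```

The dictionary keyed by accession is the simplest possible stand-in for an indexed record store. It is enough to show why organized, consistently formatted records make search and comparison straightforward.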
Why this matters for trainees
Without well-structured databases, biological data become difficult to find, compare, and reuse responsibly.
Pattern Discovery
As biological and biomedical data volumes grow, so does the need for knowledge extraction. Moving from data to knowledge resembles, on a smaller scale, the way the scientific method generates knowledge. In database research communities, this progression is often described with the DIKW pyramid: data, information, knowledge, and wisdom. In bioinformatics, databases support the first three steps especially well by enabling storage, curation, comparison, and retrieval [Source].