Biological Databases, Pattern Discovery, and the Creation of NCBI

Estimated time to complete: 15–25 minutes

What is a biological database?

A biological database is any organized collection of data relevant to biology. Since the early days of bioinformatics, biological databases have been used to store, manage, share, and efficiently search growing repositories of biological data. As biomedical research accelerated and received federal support in the 1980s and 1990s, the need for database administration and information retrieval increased dramatically.

Core idea: Bioinformatics depends not only on generating data, but on storing that data in ways that make it searchable, interpretable, and reusable.
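To make the core idea concrete, here is a minimal sketch of storing sequence records so they can be searched by a field of interest. It uses Python's built-in sqlite3 module with an in-memory database. The table layout, the DEMO accession numbers, the sequences, and the accessions_for helper are all hypothetical and chosen only for illustration; they do not reflect the schema of any real biological database.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per sequence record.
conn.execute(
    "CREATE TABLE sequences ("
    "accession TEXT PRIMARY KEY, "
    "organism TEXT NOT NULL, "
    "sequence TEXT NOT NULL)"
)
# An index on organism lets lookups by organism avoid scanning every row.
conn.execute("CREATE INDEX idx_organism ON sequences (organism)")

# Made-up records for illustration only.
rows = [
    ("DEMO0001", "Escherichia coli", "ATGAAACGC"),
    ("DEMO0002", "Homo sapiens", "ATGGCCTTA"),
    ("DEMO0003", "Escherichia coli", "ATGTTTGGG"),
]
conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
conn.commit()


def accessions_for(organism):
    """Return the accessions stored for one organism, sorted alphabetically."""
    cur = conn.execute(
        "SELECT accession FROM sequences WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return [row[0] for row in cur.fetchall()]


ecoli = accessions_for("Escherichia coli")
print(ecoli)
```

Because the records are structured and indexed, the same data can be retrieved by organism, by accession, or by any other stored field without rereading every entry by hand.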

Why do biological databases matter?

Basic bioinformatics tasks rely on databases. Researchers use databases to search for patterns within sequences, compute descriptive statistics, compare sequences, align multiple sequences, infer phylogenetic relationships, and predict biological structure and function.

Examples of Database-enabled Tasks

  • Sequence similarity search
  • Pairwise and multiple sequence alignment
  • Pattern and motif discovery
  • Phylogenetic analysis
  • Secondary and tertiary structure prediction
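Two of the tasks above, pattern searching and computing descriptive statistics, can be sketched in a few lines of Python. This is an illustrative sketch only: the three sequences are made up, the helper names gc_content and find_motif are hypothetical, and the regular expression is a deliberately simplified TATA-box-like pattern rather than a curated motif definition.

```python
import re
from collections import Counter

# Made-up toy records; real entries would be retrieved from a sequence database.
records = {
    "seq1": "ATGCGTATAAAGGCTAGC",
    "seq2": "GGGCCCATGCGCGCTTAA",
    "seq3": "TTTATAAATCGATCGATG",
}


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    counts = Counter(seq)
    return (counts["G"] + counts["C"]) / len(seq)


def find_motif(seq, pattern):
    """Return 0-based start positions of non-overlapping regex matches."""
    return [m.start() for m in re.finditer(pattern, seq)]


# Simplified pattern for illustration: TATA, then A or T, then A.
motif = r"TATA[AT]A"

hits = {name: find_motif(seq, motif) for name, seq in records.items()}
stats = {name: gc_content(seq) for name, seq in records.items()}

for name in records:
    print(name, "motif starts:", hits[name], "GC fraction:", round(stats[name], 2))
```

Dedicated tools handle the other tasks in the list, such as alignment and phylogenetic inference; the point here is only that each task starts from sequences that can be retrieved in a consistent form.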

Why This Matters for Trainees

Without well-structured databases, biological data become difficult to find, compare, and reuse responsibly.

Pattern Discovery

As biological and biomedical data volumes grow, so does the need to extract knowledge from them. Moving from data to knowledge resembles, on a smaller scale, the way the scientific method generates knowledge. In database research communities, this progression is sometimes depicted as the DIKW pyramid:

Figure: A pyramid with data at the base, followed by information, knowledge, and wisdom at the top. A color gradient runs from red at the base to green at the top and is labeled "decision risk".

DIKW stands for data, information, knowledge, and wisdom. In bioinformatics, databases support the first three tiers especially well by enabling storage, curation, comparison, and retrieval [Source].