Why this distinction matters
In bioinformatics, we often use the word search very loosely. A researcher may say they “searched a database,” but the actual task could involve:
- running a structured query against a database with a defined schema,
- typing keywords into a search box,
- retrieving articles or records ranked by relevance, or
- looking across semi-structured documents with metadata and text.
Keyword searching as motivation
A keyword search looks for words anywhere in a collection of records or documents. Keyword searching is very common in biology because users often know the concept they want, but they do not know the exact schema, field names, or query language behind the resource.
In a traditional relational database, retrieval is often done through a query language such as SQL. In contrast, keyword searching has long been central to information retrieval systems, especially for text-heavy collections.
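The contrast between a structured query and a keyword search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `organism` table, its fields, its rows, and the helper names `sql_query` and `keyword_search` are all hypothetical and do not describe any real resource.

```python
import sqlite3

# Hypothetical toy table; the schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organism ("
    "id INTEGER PRIMARY KEY, species TEXT, disease TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO organism (species, disease, notes) VALUES (?, ?, ?)",
    [
        ("Bordetella pertussis", "whooping cough", "Gram-negative respiratory pathogen"),
        ("Escherichia coli", "gastroenteritis", "Common model organism"),
        ("Streptococcus pneumoniae", "pneumonia", "May cause cough and fever"),
    ],
)


def sql_query(species):
    """Structured query: the caller must know the table and field names."""
    rows = conn.execute(
        "SELECT species FROM organism WHERE species = ?", (species,)
    )
    return [row[0] for row in rows]


def keyword_search(keyword):
    """Keyword search: look for the word in any text field of any record."""
    keyword = keyword.lower()
    hits = []
    rows = conn.execute(
        "SELECT species, disease, notes FROM organism ORDER BY id"
    )
    for species, disease, notes in rows:
        if any(keyword in field.lower() for field in (species, disease, notes)):
            hits.append(species)
    return hits


exact_hits = sql_query("Bordetella pertussis")  # exact match on one known field
cough_hits = keyword_search("cough")            # word found anywhere in a record
print(exact_hits)
print(cough_hits)
```

The structured query only succeeds if the user already knows that a `species` field exists and how its values are written. The keyword search needs no such knowledge, but it also returns any record that merely mentions the word.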
Why not just use SQL?
- Every database may use a different schema.
- Most users do not know SQL.
- Even SQL varies across database systems.
- Not all data resources are relational databases.
Why keyword search is attractive
- It lowers the barrier to entry.
- It lets users begin with concepts, not field names.
- It works well for articles, abstracts, and descriptive text.
- It supports exploratory searching.
What is information retrieval (IR)?
Information retrieval (IR) is the science of obtaining information relevant to a user’s need from a collection of information resources. In practice, IR often treats information as a collection of documents rather than as rows and columns in a highly structured table.
Those documents may be unstructured or semi-structured. They may contain titles, abstracts, free text, metadata, controlled vocabulary terms, and identifiers. The goal is not simply to find exact matches in a field, but to retrieve items that are likely to be relevant.
IR asks: “Given this information need, which documents are most likely to be relevant?”
Database querying asks: “Given this structure, which records satisfy these exact conditions?”
How is IR different from querying a database?
| Feature | Information Retrieval (IR) | Database Querying |
|---|---|---|
| Typical data | Unstructured or semi-structured documents | Structured records with defined fields |
| Organization | Collection of documents, text, metadata, identifiers | Schema-defined tables, fields, and relationships |
| User input | Keywords, phrases, concepts | Formal conditions and field-based constraints |
| Match style | Approximate or relevance-based | Exact logical conditions |
| Output | Ranked results by estimated relevance | Records that satisfy the query |
| Typical challenge | Ambiguity in terms and meaning | Knowing the schema and query language |
IR systems also handle problems that are less central in traditional database querying, such as:
- approximate searching by keywords,
- ranking results by likely relevance, and
- language variation or synonymy in the way users describe concepts.
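Ranking by likely relevance can be illustrated with a very simple scoring rule: count how often the query terms occur in each document and sort by that count. This is a minimal sketch, not how any particular search engine works; production systems typically use more refined weighting schemes. The document collection and the names `tokenize` and `rank` are hypothetical.

```python
from collections import Counter
import re

# Hypothetical mini-collection; the texts are invented for illustration.
DOCUMENTS = {
    "doc1": "Pertussis vaccine response in infants with pertussis exposure",
    "doc2": "Genome annotation of Escherichia coli",
    "doc3": "Clinical features of pertussis",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Score each document by how many times the query terms occur in it."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # documents with no matching term are not returned
            scored.append((doc_id, score))
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


ranked = rank("pertussis vaccine", DOCUMENTS)
print(ranked)
```

Unlike a database query, the result is an ordered list: a document that matches only part of the query is still returned, just lower down.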
IR in bioinformatics
In bioinformatics, IR becomes especially important when the target information is embedded in text, annotations, abstracts, article metadata, or descriptive records. Biological terms are often messy: the same concept may appear under multiple names, abbreviations, or spellings.
Example: semantic ambiguity
A user searching for information about Bordetella pertussis might search using:
- pertussis
- whooping cough
- Bordetella pertussis
- B. pertussis
A database system and an IR system may treat these terms very differently unless metadata, an ontology, a controlled vocabulary, or query expansion is used to link them.
This is one reason why standardized keyword systems, taxonomies, and controlled vocabularies are valuable. They reduce ambiguity and help connect related terms during search and retrieval.
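One way to see how a controlled vocabulary connects related terms is a small query-expansion sketch. The synonym table below is a hand-written stand-in for a real vocabulary or ontology, and the records and function names (`expand_query`, `search`) are hypothetical; the point is only that mapping a user's term to a shared concept lets a search match records that use a different name.

```python
# Hypothetical synonym table standing in for a controlled vocabulary.
CONCEPTS = {
    "Bordetella pertussis": [
        "bordetella pertussis",
        "b. pertussis",
        "pertussis",
        "whooping cough",
    ],
}

# Reverse lookup: each known term points to its concept.
TERM_TO_CONCEPT = {
    term: concept for concept, terms in CONCEPTS.items() for term in terms
}

# Hypothetical records; invented for illustration.
RECORDS = [
    "Outbreak of whooping cough in a school",
    "B. pertussis toxin structure",
    "Escherichia coli K-12 genome",
]


def expand_query(query):
    """Return every term that shares a concept with the query term."""
    term = query.strip().lower()
    concept = TERM_TO_CONCEPT.get(term)
    if concept is None:
        return [term]  # unknown term: search for it as typed
    return CONCEPTS[concept]


def search(query, records):
    """Return records that contain any of the expanded query terms."""
    terms = expand_query(query)
    return [r for r in records if any(t in r.lower() for t in terms)]


plain = [r for r in RECORDS if "pertussis" in r.lower()]  # no expansion
expanded = search("pertussis", RECORDS)                   # with expansion
print(plain)
print(expanded)
```

Without expansion, a search for "pertussis" misses the record that only says "whooping cough"; with expansion, both related records are found.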
What's the difference?
In practice, many modern websites combine both ideas. A site may store data in a structured backend database but expose a keyword-based interface to users. The underlying system may still rely on database design, but the user-facing front end behaves more like information retrieval, so users are not expected to know a query language.
- Searching a resource and receiving a ranked list of results that you have to choose from is probably information retrieval.
- Getting records from a query where you type exact words, dates, identifiers, or phrases is likely to be querying an actual database (or a portion of it).
- Sometimes, both can happen in the same resource.