Module 6: Databases and Reproducible Data Mining in Bioinformatics

This module introduces strategies for reproducible data mining using bioinformatics databases, with emphasis on query transparency, provenance, and repeatable workflows.

Learning Objectives

Module Content

Work through the content below in order; each item builds on the ones before it.

Reading: Introduction to Databases in a Bioinformatics Context

Estimated time to complete: 5–15 minutes

Video: Database Types

Download Slides  |  Open directly in Yuja

Captions and transcript available within the player.

Estimated time to complete: 10–15 minutes

Reflection: Query Transparency and Reuse

Reflect on the following question(s) before moving on:

  1. Think of a search you performed on an internet search engine such as Google or DuckDuckGo. Would someone running the same search on a different computer retrieve exactly the same records?
  2. Now consider the same question for a PubMed search. What information would you need to provide so that another person retrieves the same results as you? Is this feasible? Why or why not?
  3. What could change between database releases that would affect results (IDs, annotations, reference genomes, or curation)?
  4. What is one simple way you could record queries and filters to help future-you reproduce the analysis?
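As one possible answer to question 4, the sketch below records each search as a line in a JSON log. It is a minimal illustration using only the Python standard library, not a prescribed method: the function names, the field names, and the example query and record IDs are all hypothetical and made up for this example.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def make_query_record(database, query, filters, result_ids, release=None):
    """Build a provenance record for one database search (hypothetical schema)."""
    # Hash the sorted IDs so two runs can be compared without storing every ID.
    joined = "\n".join(sorted(result_ids))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return {
        "database": database,
        "query": query,              # the exact query string, copied verbatim
        "filters": filters,          # any filters applied outside the query string
        "release": release,          # database release or version, if one is known
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "result_count": len(result_ids),
        "result_ids_sha256": digest,
    }


def append_record(record, log_path):
    """Append one record as a JSON line so the log is easy to diff and version."""
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


# Illustrative values only; the query and IDs below are invented placeholders.
example = make_query_record(
    database="PubMed",
    query="crispr[Title]",
    filters={"publication_years": "2020-2022"},
    result_ids=["ID-0002", "ID-0001"],
)
```

Calling `append_record(example, "query_log.jsonl")` would add the record to a log file that can be kept under version control alongside the analysis. If a later run of the same query produces a different `result_ids_sha256`, the result set has changed, which is a prompt to check what changed between database releases.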

Estimated time to complete: 5–15 minutes

Reading: NCBI and Databases in Bioinformatics

Estimated time to complete: 15–20 minutes

Activity: Update Publications and their Purpose

Estimated time to complete: 30–45 minutes

Reflection: Database Funding and Sustainability

Reflect on the following question(s) before moving on:

  1. Many bioinformatics databases are supported, at least initially, by grants or strong community support. In what ways can you help the creators of these databases ensure that they continue to receive funding for these resources? (Grants are not always guaranteed!)
  2. What might happen to a database if it loses funding? What are the potential impacts on the community it serves?

Estimated time to complete: 5–15 minutes

Video: Other Platforms and Storage for Big Data

Download Slides  |  Open directly in Yuja

Captions and transcript available within the player.

Estimated time to complete: 10–15 minutes

Reading: Information Retrieval

Estimated time to complete: 5–15 minutes

Video: Information Management Systems

Download Slides  |  Open directly in Yuja

Captions and transcript available within the player.

Estimated time to complete: 10–15 minutes

Reflection: Information Retrieval versus Database Query

Reflect on the following question(s) before moving on:

  1. Why might a novice user prefer keyword searching over writing an SQL query?
  2. How does relevance ranking in information retrieval (IR) differ from exact matching in a database query?
  3. Would you describe PubMed as “just a database”? Why or why not?
  4. What kinds of biological information are easiest to retrieve with IR methods rather than strict relational queries?
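To make the contrast in question 2 concrete, the sketch below runs both styles of search over the same four toy records. It is an illustration under stated assumptions: the records are invented for this example, the SQL runs against an in-memory SQLite table, and the scoring function is a deliberately naive term count, not the ranking method used by PubMed or any real search engine.

```python
import sqlite3

# Invented toy records: (id, organism, title). Not real database entries.
RECORDS = [
    (1, "Homo sapiens", "BRCA1 mutations in breast cancer"),
    (2, "Mus musculus", "Mouse model of breast cancer metastasis"),
    (3, "Homo sapiens", "Cancer genome sequencing of breast and ovarian cancer"),
    (4, "Homo sapiens", "Gut microbiome diversity"),
]


def exact_match(organism):
    """Database-style query: a row either satisfies the condition or it does not."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, organism TEXT, title TEXT)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", RECORDS)
    rows = conn.execute(
        "SELECT id FROM records WHERE organism = ? ORDER BY id", (organism,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def ranked_search(keywords):
    """IR-style search: score every record, then return matches best first."""
    terms = keywords.lower().split()
    scored = []
    for record_id, _organism, title in RECORDS:
        words = title.lower().split()
        # Naive relevance score: how many times the search terms occur in the title.
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, record_id))
    # Highest score first; ties broken by lower id so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [record_id for _score, record_id in scored]


print(exact_match("Homo sapiens"))     # [1, 3, 4]
print(ranked_search("breast cancer"))  # [3, 1, 2]
```

The exact-match query returns every row that meets the condition and nothing else, with no notion of one row being a better answer than another. The ranked search returns records in order of estimated relevance, and that order depends entirely on the scoring function, which is one reason two search tools can return different results for the same keywords.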

Estimated time to complete: 5–15 minutes

Practice Quiz

Practice Quiz (HTML)

Estimated time to complete: 5–10 minutes

Next Steps

Notes for Educators

You can use this content in your classes! This work is provided under a Creative Commons Attribution-NonCommercial (CC BY-NC) license. You can use, copy, share, or adapt this material for teaching, learning, or other non-commercial purposes.

You can pick and choose what you want from this website, or you can download the entire website's worth of content in ready-to-go Canvas format here: Download as a Canvas Course

We do ask that you cite this work when you use it: Learn how.

Notes for Students

There's more on Canvas! Find more content and quizzes on the Canvas version of this course. Go back and click “Enroll in the Self-Paced Canvas Course” for more.