Overview
If you are interested in bioinformatics, you are probably familiar with the concept of statistics. You may have taken a course or two in the subject, or even run some statistical analyses yourself. Whatever your level of comfort, take a moment to consider statistics as a discipline, along with a little historical context. This background should help explain why statistical rigor is important in bioinformatics.
What should I know about Math already?
If we consider statistics to be a discipline within the domain of mathematics, it helps to see where it sits in that domain in order to understand its role. Mathematics (math) as a discipline is commonly divided into theoretical/pure math and applied math [Source]. Pure mathematics seeks to understand abstract concepts and complex patterns. Applied mathematics uses that pure framework to solve real-world problems. A student moving from elementary school through high school in the United States will work through a series of math topics, probably including but not limited to:
- Counting, Directionality, and Inherent Properties of Values and Sets
- Basic Arithmetic Operations (Addition, Subtraction, Multiplication, Division)
- Equations and Expressions
- Problem Solving and Logical Reasoning
- Fractions and Ratios
- Measurement and Data Literacy
- Geometry and Measuring Shapes
- The Number System
- Functions
- Statistics
- Introductory Algebra
- Algebra (Equations, functions, quadratics, exponents)
- Geometry (Proofs, Trigonometry, Coordinate Systems)
- Algebra (again!) (Polynomials, more functions, complex numbers)
- Statistics (Distributions, Sampling, Inference, Probability)
- Precalculus, and possibly even Calculus I
The Map of Mathematics by Dominic Walliman. Source Link
What is Statistics?
In the field of mathematics, statistics is typically considered to be on the applied side and closely tied to probability. Statistics is used in many disciplines in today's data-driven world, so familiarity with it is key.
For our purposes, the discipline of statistics can be defined as the science and practice of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
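To make the idea of "inferring proportions in a whole from those in a representative sample" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the original material: the population is simulated, the 30% variant frequency is invented, the function name estimate_proportion is hypothetical, and the interval uses the simple normal-approximation (Wald) formula, which is only one of several possible choices.

```python
import math
import random


def estimate_proportion(sample, z=1.96):
    """Estimate a population proportion from a sample of True/False values.

    Returns (point estimate, lower bound, upper bound) using the
    normal-approximation (Wald) interval. z=1.96 corresponds to roughly
    95% coverage when the sample is reasonably large.
    """
    n = len(sample)
    p_hat = sum(sample) / n                    # fraction of True values
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    lower = max(0.0, p_hat - z * se)           # clamp to the valid range [0, 1]
    upper = min(1.0, p_hat + z * se)
    return p_hat, lower, upper


# Hypothetical population (assumption): 100,000 individuals,
# 30% of whom carry some variant of interest.
population = [i < 30_000 for i in range(100_000)]

# Draw a simple random sample of 500 individuals without replacement.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 500)

p_hat, low, high = estimate_proportion(sample)
print(f"estimate = {p_hat:.3f}, approx. 95% interval = ({low:.3f}, {high:.3f})")
```

The sample contains only 0.5% of the population, yet the estimate should land close to the true 30%, and the interval gives a rough sense of how far off it might be. The sample has to be representative for this to work, which is why the sketch draws it at random.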
Why do we need statistics?
To begin understanding why statistics is important in many disciplines, I like to use a quote from "The limits of our personal experience and the value of statistics" written by Max Roser in 2023 [Source]:
"It’s tempting to believe that we can simply rely on personal experience to develop our understanding of the world. But that’s a mistake. The world is large, and we can experience only very little of it personally. To see what the world is like, we need to rely on other means: carefully-collected global statistics."
Statistics and its application allow us to identify patterns, anomalies, and qualities that become perceptible only when large amounts of data are compared.