# Best biomedical and health data science books and resources

## What is biomedical data science?

“Biomedical data science spans a range of biological and medical research challenges that are data-intensive and focused on the creation of novel methodologies to advance biomedical science discovery.” (Annual Review of Biomedical Data Science)

Here is a list of resources I have found while researching and studying biomedical data science and analytics. Unfortunately, many of the books and courses listed here are paid, but I have tried my best to include free and open-source resources too. Let’s dive in!

## Notice of non-affiliation and disclaimer

## Statistics and math

A good understanding of statistics and mathematics is fundamental to any data science or machine learning analysis. Key concepts include probability distributions, statistical significance, hypothesis testing, and regression. Here are some resources dedicated to teaching you all of that (and more) with examples from the biomedical sciences.
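To make hypothesis testing and statistical significance a bit more concrete, here is a minimal sketch of a two-sided permutation test for a difference in means. It is written in Python (my choice here; the books in this section use R and MATLAB), `permutation_test` is a hypothetical helper of my own, and the measurements are made-up numbers for illustration only.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean_a - mean_b) and an estimated p-value.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, group labels are exchangeable
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical biomarker measurements in treated vs. control samples
treated = [5.1, 5.8, 6.2, 5.9, 6.5, 6.1]
control = [4.2, 4.8, 4.5, 5.0, 4.1, 4.6]
diff, p = permutation_test(treated, control)
print(f"difference in means = {diff:.2f}, p-value = {p:.4f}")
```

A permutation test makes few distributional assumptions, which is why I find it a nice first illustration of what a p-value means; classical tests such as the t-test, covered in these books, rely on stronger assumptions.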

### Modern Statistics for Modern Biology

#### Susan Holmes, Wolfgang Huber

**Book | Code: R | Free: ✅ | Link**

The aim of this book is to enable scientists working in biological research to quickly learn many of the important ideas and methods that they need to make the best of their experiments and of other available data. The book takes a hands-on approach.

This book is not heavy on mathematics; it goes straight to the core concepts and has plenty of R code examples and exercises! It ranges from the basics of data distributions and hypothesis testing to more advanced topics such as multivariate analysis and supervised learning.

### Statistics for Biomedical Engineers and Scientists

#### Andrew King, Robert Eckersley

**Book | Code: MATLAB | Free: ❌ | Link**

Readers will learn how to understand the fundamental concepts of descriptive and inferential statistics, analyze data and choose an appropriate hypothesis test to answer a given question, compute numerical statistical measures and perform hypothesis tests “by hand”, and visualize data and perform statistical analysis using MATLAB.

This is exactly what you would expect from an undergraduate-level book on probability and statistics: not heavy on math, with lots of exercises.

### Applied Mathematics for the Analysis of Biomedical Data: Models, Methods, and MATLAB

#### Peter J. Costa

**Book | Code: MATLAB | Free: ❌ | Link**

Features a practical approach to the analysis of biomedical data via mathematical methods and provides a MATLAB® toolbox for the collection, visualization, and evaluation of experimental and real-life data.

This one is heavier on math and assumes you are familiar with elementary differential equations, linear algebra, and statistics.

### Data-Handling in Biomedical Science

#### Peter White

**Book | Code: ❌ | Free: ❌ | Link**

Packed with worked examples and problems, this book will help the reader improve their confidence and skill in data-handling.

This one is a little different from the previous ones, but it is worth listing. The book has no code examples and does not cover computational methods for data handling and analysis. Instead, it teaches the basic math and statistics needed for biochemistry and microbiology experiments.

## Data engineering

Analyzing data is important, but we also need to know how to design and maintain data pipelines. Biomedical data can be messy, heterogeneous, and big, but fortunately, these authors are here to help us!

### Data Warehousing for Biomedical Informatics

#### Richard E. Biehl

**Book | Code: SQL | Free: ❌ | Link**

A step-by-step how-to guide for designing and building an enterprise-wide data warehouse across a biomedical or healthcare institution, using a four-iteration lifecycle and standardized design pattern.

This book is a gem: classic data warehousing and ETL pipeline content, focused specifically on biomedical and healthcare data. Lots of SQL code snippets!
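For readers new to data warehousing, here is a minimal sketch of the idea behind a dimensional (star) schema: a central fact table of events joined to descriptive dimension tables. It uses Python’s built-in `sqlite3` module so it runs anywhere. The table and column names (`dim_patient`, `dim_department`, `fact_encounter`) and all rows are hypothetical, and this is a generic textbook pattern, not Biehl’s four-iteration lifecycle or design pattern.

```python
import sqlite3

# Hypothetical star schema: one fact table (encounters) keyed to two dimensions
SCHEMA = """
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    sex TEXT,
    birth_year INTEGER
);
CREATE TABLE dim_department (
    department_key INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE fact_encounter (
    encounter_id INTEGER PRIMARY KEY,
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    department_key INTEGER REFERENCES dim_department(department_key),
    length_of_stay_days REAL
);
"""


def load(conn, patients, departments, encounters):
    """'Load' step of a toy ETL: insert already-cleaned rows into the warehouse."""
    conn.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)", patients)
    conn.executemany("INSERT INTO dim_department VALUES (?, ?)", departments)
    conn.executemany("INSERT INTO fact_encounter VALUES (?, ?, ?, ?)", encounters)
    conn.commit()


def average_stay_by_department(conn):
    """Typical analytical query: aggregate a fact, grouped by a dimension attribute."""
    return conn.execute(
        """
        SELECT d.name, ROUND(AVG(f.length_of_stay_days), 2)
        FROM fact_encounter AS f
        JOIN dim_department AS d ON d.department_key = f.department_key
        GROUP BY d.name
        ORDER BY d.name
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(
    conn,
    patients=[(1, "F", 1970), (2, "M", 1985)],
    departments=[(10, "Cardiology"), (20, "Oncology")],
    encounters=[(100, 1, 10, 3.0), (101, 2, 10, 5.0), (102, 1, 20, 7.5)],
)
print(average_stay_by_department(conn))  # [('Cardiology', 4.0), ('Oncology', 7.5)]
```

Keeping measurements in a narrow fact table and descriptive attributes in dimensions is what makes “slice by any attribute” queries like this one simple to write.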

### Big Biomedical Data Engineering

#### Ripon Patgiri, Sabuzima Nayak

**Book chapter | Code: ❌ | Free: ✅ | Link**

This chapter exploits the role of Big Data in biomedical data engineering and its storage dilemma.

A short book chapter that discusses several biomedical big data application scenarios and possible future directions.

## Data manipulation, data analysis, and machine learning

This is where most people have fun. Let’s see how to handle, clean, analyze and extract insights from biomedical data.

### Data Science and Predictive Analytics: Biomedical and Health Applications using R

#### Ivo D. Dinov

**Book and MOOC | Code: R | Free: ❌ (book), ✅ (online material) | Link | Free online material**

Complete and self-contained treatment of the theory, experimental modeling, system development, and validation of predictive health analytics.

A comprehensive data science book: introduction to R, data manipulation, data visualization, classification, regression, NLP, and even a little Deep Learning! All of this with well-documented R code. The book is not free, but you can find the videos, class notes, and R code on the author’s page linked above.

### Computational Learning Approaches to Data Analytics in Biomedical Applications

#### Khalid Al-Jabery, Tayo Obafemi-Ajayi, Gayla Olbricht, Donald Wunsch

**Book | Code: Python, MATLAB | Free: ❌ | Link**

It presents insights on biomedical data processing, innovative clustering algorithms and techniques, and connections between statistical analysis and clustering.

An interesting, more theoretical approach to data preprocessing and clustering algorithms. Examples are given in pseudocode, and some math background is required. The last chapter takes a hands-on approach using MATLAB and Python code.
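If clustering is new to you, this is a minimal sketch of plain k-means (Lloyd’s algorithm) in Python, one of the classic clustering methods. It is my own illustration, not one of the book’s algorithms; initial centroids are passed in explicitly (an assumption for reproducibility), and the 2-D “samples” are hypothetical.

```python
import math


def kmeans(points, initial_centroids, n_iter=100):
    """Plain Lloyd's k-means on 2-D points.

    Returns (centroids, labels). Initial centroids are passed in explicitly
    so the result is deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


# Hypothetical 2-D features, e.g. two measurements per sample
samples = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, labels = kmeans(samples, initial_centroids=[samples[0], samples[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice, the result depends on the initial centroids, which is one reason preprocessing and initialization strategies get so much attention in the clustering literature.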

### Statistical Learning for Biomedical Data

#### James D. Malley, Karen G. Malley, Sinisa Pajevic

**Book | Code: MATLAB | Free: ❌ | Link**

This book is for anyone who has biomedical data and needs to identify variables that predict an outcome, for two-group outcomes such as tumor/not-tumor, survival/death, or response from treatment.

It is not heavy on math and has few code examples, but it offers great theoretical explanations of regression, single decision trees, and Random Forests.
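To show what a single decision tree does with a two-group outcome, here is a minimal sketch of a depth-1 tree (a “decision stump”) that searches for the threshold on one predictor minimizing weighted Gini impurity. Gini is one common split criterion; I am not claiming it matches the book’s exact presentation, and the marker levels and tumor labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels: 2p(1 - p) for two classes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Find the threshold on one predictor that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split beats the
    unsplit node. Full trees repeat this search recursively, and Random
    Forests average many such trees grown on resampled data.
    """
    n = len(values)
    best = (None, gini(labels))  # baseline: no split at all
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a split must put samples on both sides
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical marker levels with outcome 1 = tumor, 0 = not tumor
marker_levels = [1.2, 2.3, 2.9, 3.1, 4.8, 5.5, 6.0, 7.2]
is_tumor = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_split(marker_levels, is_tumor))  # (3.1, 0.0)
```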

### Case Studies in Neural Data Analysis

#### Mark Kramer, Uri Eden

**Book | Code: Python | Free: ✅ | Link**

The intended audience is the practicing neuroscientist - e.g., the students, researchers, and clinicians collecting neuronal data in the hospital or lab. The material can get pretty math-heavy, but we’ve tried to outline the main concepts as directly as possible, with hands-on implementations of all concepts.

Great hands-on material for neuroscientists interested in analyzing spike trains and electric fields. All notebooks are in Python and include brief explanations of the concepts and goals of each analysis.
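As a taste of spike train analysis, here is a minimal sketch (my own, not code from the book’s notebooks) computing three basic descriptive statistics: the mean firing rate, the mean inter-spike interval (ISI), and the ISI coefficient of variation (CV), which is about 1 for Poisson-like firing and near 0 for very regular firing. The spike times are hypothetical.

```python
from statistics import mean, stdev


def spike_train_summary(spike_times, duration):
    """Basic descriptive statistics for one spike train.

    spike_times: sorted spike times in seconds
    duration: length of the recording in seconds
    """
    if len(spike_times) < 3:
        raise ValueError("need at least 3 spikes to estimate ISI variability")
    # Inter-spike intervals: differences between consecutive spike times
    isis = [t2 - t1 for t1, t2 in zip(spike_times, spike_times[1:])]
    return {
        "rate_hz": len(spike_times) / duration,
        "mean_isi_s": mean(isis),
        # Coefficient of variation of the ISIs
        "cv_isi": stdev(isis) / mean(isis),
    }


# Hypothetical, perfectly regular spike train: CV should be close to 0
spikes = [0.1, 0.2, 0.3, 0.4, 0.5]
summary = spike_train_summary(spikes, duration=1.0)
print(summary)
```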

### Neural Data Science: A Primer with MATLAB and Python

#### Erik Lee Nylen, Pascal Wallisch

**Book | Code: Python, MATLAB | Free: ❌ | Link**

A beginner’s introduction to the principles of computation and data analysis in neuroscience, using both Python and MATLAB, giving readers the ability to transcend platform tribalism and enable coding versatility.

This book is beautifully organized and filled with images. The coolest thing about it is the MATLAB and Python code written side-by-side. The content ranges from the basics of programming to advanced techniques such as analog signal processing, biophysical modeling, clustering, and classification.

### Computational Genomics with R

#### Altuna Akalin

**Book | Code: R | Free: ✅ | Link**

The aim of this book is to provide the fundamentals for data analysis for genomics. We want this book to be a starting point for computational genomics students and a guide for further data analysis in more specific topics in genomics.

This book has a great introduction to genomics that will help a lot if you are not coming from a biology-related field. It covers many topics, such as an introduction to R, statistics, exploratory data analysis, supervised learning, RNA-Seq, and more!
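One of the first steps in RNA-Seq analysis is normalizing raw read counts so samples with different sequencing depths can be compared. Here is a minimal Python sketch of counts per million (CPM) and log2(CPM + 1), one of the simplest normalizations; dedicated packages such as DESeq2 and edgeR use more robust methods. The gene names and counts are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts of one sample so they sum to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount); the pseudocount avoids log(0) for unexpressed genes."""
    return {
        gene: math.log2(value + pseudocount)
        for gene, value in counts_per_million(counts).items()
    }


# Hypothetical raw counts for a single RNA-Seq sample
sample = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 0, "GENE_D": 8000}
print(counts_per_million(sample))
print(log_cpm(sample))
```

The log transform compresses the huge dynamic range of expression values, which makes them easier to plot and to use in downstream exploratory analyses.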

### Bioinformatics: The Machine Learning Approach

#### Pierre Baldi, Søren Brunak

**Book | Code: ❌ | Free: ❌ | Link**

The book is aimed both at biologists and biochemists who need to understand new data-driven algorithms and at those with a primary background in physics, mathematics, statistics, or computer science who need to know more about applications in molecular biology.

This one is somewhat heavy on math; you will probably need some calculus, algebra, and probability theory. The book focuses on the theoretical aspects of machine learning applied to bioinformatics, including definitions of the main concepts and proofs of the main theorems.

### Biomedical Image Analysis in Python

#### DataCamp

**Videos and interactive code | Code: Python | Free: ❌ | Link**

In this introductory course, you’ll learn the fundamentals of image analysis using NumPy, SciPy, and Matplotlib. You’ll navigate through a whole-body CT scan, segment a cardiac MRI time series, and determine whether Alzheimer’s disease changes brain structure.

Great content that follows the usual DataCamp course structure: short videos and hands-on coding exercises directly in the browser!
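The core idea behind this kind of image analysis is that an image is just a NumPy array. Here is a minimal sketch of threshold-based segmentation on a synthetic 2-D image (a bright square on a dark, slightly noisy background). It is not real CT/MRI data and not one of the course’s exercises; `threshold_mask`, `mask_stats`, and the threshold value are my own hypothetical choices.

```python
import numpy as np


def threshold_mask(image, threshold):
    """Boolean mask of pixels brighter than `threshold`."""
    return image > threshold


def mask_stats(image, mask):
    """Fraction of pixels inside the mask and their mean intensity."""
    return {
        "fraction": mask.mean(),
        "mean_intensity": image[mask].mean() if mask.any() else float("nan"),
    }


# Hypothetical 10x10 "scan": dark background with a bright 4x4 square
image = np.zeros((10, 10))
image[3:7, 3:7] = 200.0
image += np.random.default_rng(0).normal(0.0, 5.0, image.shape)  # mild noise
mask = threshold_mask(image, threshold=100.0)
print(mask_stats(image, mask))
```

A fixed global threshold only works when the structure of interest is much brighter than its surroundings; real scans usually need smoothing, adaptive thresholds, or connected-component labeling (e.g. with SciPy) on top of this.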

## Datasets

Here are some places where you can find datasets to explore and practice your skills:

### Synthea: Synthetic Patient Generation

#### MITRE Corporation

**Link**

Synthea™ is an open-source, synthetic patient generator that models the medical history of synthetic patients. The resulting data is free from cost, privacy, and security restrictions, enabling research with Health IT data that is otherwise legally or practically unavailable.

### PhysioNet: The Research Resource for Complex Physiologic Signals

#### MIT Laboratory for Computational Physiology

**Link**

PhysioNet is a repository of freely-available medical research data, managed by the MIT Laboratory for Computational Physiology.

### Computational Biology Datasets Suitable For Machine Learning

#### Ben Lengerich

**Link**

This is a curated list of computational biology datasets that have been pre-processed for machine learning.

### Kaggle: Healthcare tag

**Link**

Kaggle is the world’s largest data science community with powerful tools and public datasets.

### NIH: Data Sharing Resources

#### Trans-NIH BioMedical Informatics Coordinating Committee

**Link**

To help researchers locate an appropriate resource for sharing their data, as well as to promote awareness of resources where datasets can be located for reuse, BMIC maintains lists of several types of data sharing resources.

## Conclusions

That’s it! This comprehensive list covers many areas of biomedical data science and analytics, but there are many more great resources out there! Do you think I might have left out something important? Share with us in the comments!