In the past few years, “Machine Learning” has become a popular buzzword for health IT startups. The underlying nuts and bolts are based on old-school statistics, e.g. the linear regression method you learned in undergrad. The recent attention is due to a combination of i) advancements in software analytic tools, ii) expansion of computing power and cloud-based databases, and iii) the availability of large, high-quality datasets.
In the process of Machine Learning, an “Algorithm” is applied to a set of data (e.g. tumor MRI images) and to some knowledge about these data (e.g. benign or malignant status). The system learns from this “Training Data” and builds a “Model” that can make predictions for new samples (e.g. classify a new image as a benign or malignant tumor).[1]
With the familiar least squares linear regression, for example, the “Model” is the set of coefficients or “Weights” linked to certain variables, and the “Algorithm” is the process of finding the set of coefficients that yields the lowest total sum of squared prediction errors. A good primer on Machine Learning for Medical Imaging can be found here: https://doi.org/10.1148/rg.2017160130
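To make the “Model” versus “Algorithm” distinction concrete, here is a minimal sketch of one-variable least squares in plain Python. The function names and the tiny dataset are made up for illustration and are not taken from the primer; the fitting step uses the standard closed-form solution for a single input variable.

```python
def fit_least_squares(xs, ys):
    """The "Algorithm": find the intercept and slope that minimize
    the total sum of squared prediction errors (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope  # the "Model": just two weights


def predict(model, x):
    """Apply the fitted weights to a new sample."""
    intercept, slope = model
    return intercept + slope * x


def sum_squared_errors(model, xs, ys):
    """The quantity the algorithm minimizes."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data (invented numbers, exactly y = 1 + 2x).
training_x = [1.0, 2.0, 3.0, 4.0]
training_y = [3.0, 5.0, 7.0, 9.0]

model = fit_least_squares(training_x, training_y)
new_prediction = predict(model, 5.0)
```

Once the weights are learned, the training data is no longer needed: calling `predict(model, x)` on a new sample only uses the two stored numbers.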
Machine learning algorithms can be classified as supervised or unsupervised. Supervised learning algorithms train on pre-labeled datasets and include: support vector machine, decision tree, linear regression, logistic regression, naïve Bayes, k-nearest neighbor, random forest, AdaBoost, and neural network methods. With unsupervised learning, the algorithm is not provided with labeled data, but itself determines how to separate the samples into groups. The newly popular “Deep Learning” approach refers to the use of complex neural networks with many “hidden layers” that require large computing power and can be trained in a supervised or unsupervised manner.
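The difference between the two families can be sketched with a toy example in plain Python. The supervised half is a one-nearest-neighbor classifier that needs labels; the unsupervised half is a two-group k-means that sees only the raw numbers. The lesion sizes, labels, and function names are invented for illustration.

```python
def nearest_neighbor_label(train_points, train_labels, new_point):
    """Supervised: copy the label of the closest labeled training point (k = 1)."""
    distances = [abs(p - new_point) for p in train_points]
    return train_labels[distances.index(min(distances))]


def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two groups with k-means (k = 2).
    No labels are used; the groups come from the data alone."""
    centers = [min(points), max(points)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Hypothetical lesion sizes in millimeters (made-up numbers).
sizes = [1.0, 1.5, 2.0, 8.0, 9.0, 9.5]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

predicted = nearest_neighbor_label(sizes, labels, 7.0)  # uses the labels
groups = two_means(sizes)                               # ignores the labels
```

Note that the unsupervised result is only a grouping: the algorithm separates small from large lesions but has no way of naming either group “benign” or “malignant”.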
Machine Learning techniques applied to large data sets of medical information have the ability to generate new insights undetectable by human observation. In the fields of radiology and pathology, Machine Learning can act as a powerful screening tool or aid the current interpretation workflow, making the work of doctors more accurate and faster. One popular example is an iPhone app which classifies a photo of your mole using a machine learning model trained on a database of labeled moles.
In a recent study by Google published in Nature Medicine, researchers applied machine learning to a National Institutes of Health dataset of 42,290 CT scans of people at high risk for lung cancer whose lung cancer diagnoses were known.[2] They created a neural network with multiple layers of processing, and trained and tuned it with 85% of the CT scans. On a test dataset of 6,716 cases, the model achieved an area under the receiver operating characteristic curve (AUROC) of 94.4% for diagnosing lung cancer based on the CT scans, rivaling expert radiologists.
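For readers unfamiliar with AUROC, one way to read it is as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as half. The sketch below computes it that way in plain Python. The labels and scores are invented for illustration and have nothing to do with the study's data or code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison:
    the fraction of (positive, negative) pairs in which the positive
    case gets the higher score, counting ties as half a win."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test set: 1 = cancer, 0 = no cancer (made-up scores).
true_labels = [1, 1, 1, 0, 0, 0]
model_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

result = auroc(true_labels, model_scores)  # 8 of 9 pairs ranked correctly
```

A value of 1.0 means every positive case outranks every negative case, while 0.5 is what random guessing would achieve.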
Machine Learning applied to large “Real World” datasets, such as Electronic Medical Record (EMR) data, has the ability to generate highly personalized diagnoses, prognoses, and treatment recommendations, as well as advance basic and applied medical research. Clinical decision support tools can become highly specific by offering advice based on the outcomes of thousands of very similar cases. Payors and academics have applied statistical techniques to CMS and commercial claims data for years. The combination with more detailed EMR data and modern analytical tools and computing power will provide a more nuanced understanding of population risk, health care utilization, cost, and outcomes. Real-World Data and Real-World Evidence will be discussed further in a future blog.
The barriers to the advancement of machine learning and big data in the medical field include important patient privacy and ethical considerations, data security, lack of interoperability, inconsistent data sets, and the expense of manual data curation. These barriers are surmountable over time. Stay tuned to this space as the potential is tremendous.
[1] Erickson, Bradley et al. “Machine Learning for Medical Imaging.” RadioGraphics, Vol. 37, No. 2, Feb 17, 2017. https://doi.org/10.1148/rg.2017160130
[2] Ardila, Diego et al. “End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography.” Nature Medicine, May 20, 2019. https://www.nature.com/articles/s41591-019-0447-x