
Online Datasets

Online medical imaging and healthcare datasets freely available on the web

General Data Science Datasets

  • Kaggle is a platform for predictive modeling and analytics competitions and hosts more than 8,000 publicly available datasets.

  • The UC Irvine Machine Learning Repository maintains 438 publicly available datasets as a service to the machine learning community.


General Healthcare Datasets


Patient Level Data

  • PhysioNet contains over 90,000 recordings, or over 4 terabytes of digitized physiologic signals and time series, organized in over 80 databases (PhysioBank). PhysioBank databases are made available under the ODC Public Domain Dedication and License v1.0.

  • MIMIC-III is a large, freely available database comprising deidentified health-related data associated with over 40,000 patients who stayed in critical care units of the Beth Israel Deaconess Medical Center between 2001 and 2012.


Medical Imaging Datasets

  • MedPix, from the National Library of Medicine, is a free, open-access online database of medical images, teaching cases, and clinical topics.

  • The SICAS Medical Image Repository (SMIR) is a freely accessible repository of medical research data, including medical images, surface models, clinical data, genomics data, and statistical shape models. Data can be freely organized and shared on SMIR and made publicly accessible with a DOI. Dedicated data sets are organized as collections by anatomical region (e.g., cochlea).

  • SpineWeb is an online collaborative platform for everyone interested in research on spinal imaging and image analysis. It collects, hosts, and provides useful information and spine imaging resources and makes them publicly available.

  • A dataset of 491 CT scans with 193,317 slices has been made publicly available so that others can compare against and build upon the results its authors achieved with their deep learning algorithm for detecting critical findings in head CT scans, described in the accompanying paper.


Medical Imaging Search Tools

  • Open Access Biomedical Image Search Engine of the National Library of Medicine enables search and retrieval of abstracts and images (including charts, graphs, clinical images, etc.) from the open source literature, and biomedical image collections.

  • Yottalook is a free medical imaging search engine that queries a variety of radiology sources. Yottalook Web searches online radiology sources only. Yottalook Images searches radiology images from various peer-reviewed online sources and currently has access to over 800,000 images. Yottalook Journals lets you search PubMed as a whole or restrict the search to radiology journals. Yottalook Books filters Google Books search results to provide links to radiology- or imaging-related books.

  • GoldMiner ARRS (works only in Internet Explorer, not in Chrome) helps users find images and articles from peer-reviewed biomedical journals. Although intended primarily for health professionals and students, it is available to all users without charge.


Natural Language Processing (NLP) Datasets

  • Informatics for Integrating Biology & the Bedside (i2b2) will provide sets of fully deidentified patient notes (~1500) from the Research Patient Data Repository to the community for general research purposes.

  • Jacob Eisenstein covers all things NLP, focusing in particular on a core subset of natural language processing unified by the concepts of learning and search.

