Robert D. and Patricia E. Kern Institute for the Transformation of Medical Education

KERN Data Science Lab

The Data Science Lab partners with Kern Pillars and labs to support the process of aligning curriculum objectives with outcome measures, so that important outcomes of interest are not missed or captured in a way that will bias conclusions. The lab supports scholars and conducts research.

Data Science Lab Focuses:

Design and Implementation
Designing surveys and research studies with sensitive, measurable outcomes is vital to understanding the associations between variables, making causal claims and testing theories. The goal is to help with all of this, starting with aligning curriculum objectives with outcome measures that can detect subtle differences. The data science lab supports this process and helps ensure important outcomes of interest are not missed or captured in a way that will bias conclusions.
Building Data Pipelines
Building pipelines to get data out of data capture systems and into the hands of researchers, learners and teachers is an essential goal. The data science lab works to create frictionless systems that share data back with all types of users, providing access to data for research purposes in a secure and ethical manner. As part of this work, one of the goals of the lab is to use open-source tools and share code back with these communities, contributing to and evolving the field through good data 'hygiene' practices.
The ability to link datasets can provide valuable insight that could not be achieved if the datasets were kept separate. It is not feasible to capture all of the data in one study given time, money and practical constraints. To link data, important linkage variables (often referred to as unique and/or direct identifiers) are required at the time of data collection. The Kern data science lab helps ensure the language of primary educational data collection will facilitate data linkage through 'second-order' data analyses following the most up-to-date legislative guidelines and security and privacy standards.
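As an illustration of why a shared linkage variable matters, here is a minimal Python sketch of linking two datasets on a pseudonymized identifier. It is not the lab's actual pipeline: the field names (`learner_id`, `burnout_score`, `exam_score`), the sample records and the `LINKAGE_KEY` secret are all hypothetical, and a keyed hash is only one of several possible ways to avoid exposing a direct identifier.

```python
import hashlib
import hmac

# Hypothetical secret held only by the party doing the linkage (an assumption).
LINKAGE_KEY = b"example-secret-key"


def pseudonymize(identifier: str, key: bytes = LINKAGE_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    # Normalize first so that "A001" and "a001 " map to the same token.
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link(left, right, id_field="learner_id"):
    """Inner-join two lists of dicts on the pseudonymized identifier."""
    index = {pseudonymize(row[id_field]): row for row in right}
    linked = []
    for row in left:
        token = pseudonymize(row[id_field])
        match = index.get(token)
        if match is None:
            continue  # no counterpart in the other dataset
        merged = {"token": token}
        # Copy every field except the direct identifier itself.
        merged.update({k: v for k, v in row.items() if k != id_field})
        merged.update({k: v for k, v in match.items() if k != id_field})
        linked.append(merged)
    return linked


# Hypothetical records collected in two separate studies.
survey = [
    {"learner_id": "A001", "burnout_score": 3},
    {"learner_id": "A002", "burnout_score": 5},
]
exams = [
    {"learner_id": "a001 ", "exam_score": 82},
    {"learner_id": "A003", "exam_score": 74},
]

linked = link(survey, exams)
```

Only the learner who appears in both datasets is linked, and the linked record carries a token rather than the original identifier. If either study had not collected the identifier at the time of data collection, no link would be possible.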
Representing data graphically aids interpretation and provides an accessible way to discover patterns by leveraging innate cognitive and perceptual strategies. One goal is to support behavioral change and life-long learning using effective, accurate and digestible visualization strategies. This requires aligning cognitive and instructional design principles with relevant and important data points.

The KERN Data Science Mission

Innovation Through Experimentation and Play

  • This involves creating a safe environment to experiment with different pedagogical and curriculum strategies and understand their impact on learners and those who teach them.

  • Support the education of physicians and other health professionals as individuals, in groups and across the UME-GME-CME continuum.

  • Identify optimal ways to communicate relevant and important data to learners and teachers in an accessible and user-friendly manner. This includes visualization using cognitive, instructional design and learning principles.

Statistical Learning and Development

  • Evolve the understanding of statistics, research methodologies, and psychometrics (measurement).

  • Increase the use and application of statistical theory in practice by minimizing barriers. This includes bringing back Generalizability Theory and using modeling techniques to see if theory holds true with data.

  • Facilitate the use of correct analytic techniques for Likert data (e.g., Pearson vs. polychoric correlations).

  • Contribute to and evolve statistical theory using proof of concept and simulation to understand the value, biases and limits of particular statistical practices in small, institutional and population-level studies.

  • Implement natural language processing in the analysis of open-ended text content.

  • Translate statistical theory into computational algorithms and open-source code so users can apply it to their own work.
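The points above about Likert data and simulation can be sketched with a small example. The Python code below is only an illustrative simulation under stated assumptions (a bivariate normal latent trait, a true correlation of 0.6, and three response categories with cutpoints at 0 and 1, all chosen for illustration). It does not estimate a polychoric correlation; it only shows why a Pearson correlation computed on coarse ordinal responses tends to understate the correlation between the underlying continuous traits, which is the quantity a polychoric correlation targets.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_likert(value, cutpoints):
    """Map a continuous latent value to an ordinal category 1..len(cutpoints)+1."""
    return 1 + sum(value > c for c in cutpoints)


def simulate(rho=0.6, n=5000, cutpoints=(0.0, 1.0), seed=1):
    """Return (Pearson r on latent scores, Pearson r on Likert-coded scores)."""
    rng = random.Random(seed)
    latent_x, latent_y = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        latent_x.append(z1)
        # Construct y so that corr(x, y) equals rho in the population.
        latent_y.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    likert_x = [to_likert(v, cutpoints) for v in latent_x]
    likert_y = [to_likert(v, cutpoints) for v in latent_y]
    return pearson(latent_x, latent_y), pearson(likert_x, likert_y)


latent_r, likert_r = simulate()
print(f"latent r = {latent_r:.2f}, Likert-coded r = {likert_r:.2f}")
```

In this setup the Likert-coded correlation comes out lower than the latent one. Varying `cutpoints`, `n` or `rho` is a simple way to explore how the size of that gap depends on the number of categories and the sample size.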

Scholarship and Open-Source Sharing

  • Scholarly collaborations with other data scientists and institutions engaged in this work.

  • Develop and share data pipelines from commonly used systems (e.g., Qualtrics and REDCap), surveys/measurement tools and data wrangling techniques through open-source platforms.

Make 'Data Reign' With Educational Data Repositories and Facilitate Data Linkage

  • Develop and manage an educational data repository of UME and GME learner data over time to support curriculum and program evaluation and research in medical education. This repository will support the transformative and evidence-based changes needed in the education of medical students, improvement in physician health and wellbeing and, ultimately, patient outcomes.

  • Provide researchers with access to data from the educational repository in an anonymized fashion through a careful data access request process.

  • Examine and unpack student learning trajectories as they progress through a curriculum to understand how to help students who are struggling with some of the material and help facilitate their learning, evaluate curriculum components, and understand the role teachers play so they can continue to be impactful in the right ways.

  • Develop a data access request process to access data from the educational repository and for researchers to include their data for others to access. This requires creating appropriate technological systems for data linkage.

  • Conduct data linkage as a trusted third-party entity following ethical, privacy-preserving and secure data practices.

Data Science Lab Members


Tavinder K. Ark, PhD

Director, Data Science Lab, Kern Institute


Andrew Gleave

Software Engineer


Libby Ellinas, MD

Director, Center for the Advancement of Women in Science and Medicine; Associate Dean, Women's Leadership; Professor, Anesthesiology