4th Workshop on Data Mining for Medical Informatics:
Causal Inference for Health Data Analytics

Nov 4, 2017, Washington, DC

To be held in conjunction with the AMIA 2017 Annual Symposium

The DMMI 2017 workshop is sponsored by the AMIA Knowledge Discovery and Data Mining Working Group.

The biomedical sciences and healthcare are contributing significantly to the big data revolution through advances in genomic sequencing technology and in imaging, clinical, and personally generated data. Data mining and machine learning techniques have played an increasingly important role in medical informatics, where the goal is to discover knowledge and insights from diverse data sources. Causal inference is an important methodological pool from which one can draw powerful techniques for knowledge discovery and data-driven insight. Causal discovery methods were developed to address the financial and ethical concerns associated with randomized controlled trials. Their significance is attested by recognition with the Turing Award in computer science and the Nobel Prize in economics, and they have led to exciting interdisciplinary research in statistics, philosophy, the social sciences, and neuroscience. Discovering causal relationships is a major goal in basic, translational, and clinical science. In computational biology, neuroscience, epidemiology, and biomedicine, one often faces the daunting task of finding causal relationships in very high-dimensional data. This highlights the need to develop and evaluate algorithms and tools that advance the current state of the art in causal discovery from experimental, quasi-experimental, and non-experimental (i.e., observational) data.

The main theme of this year's workshop is causal inference for health data analytics, addressing both the theoretical and experimental underpinnings of these methods. This includes the development and application of the methods, as well as discussion of how to make them practically useful to clinicians, patients, and other healthcare stakeholders. The topic is timely and has recently received considerable interest. We invite researchers from academia and industry who are interested in this topic to participate in the workshop, share their views and experience, and discuss future directions.

This year's DMMI workshop will be co-located with the 2017 American Medical Informatics Association (AMIA) Annual Symposium.
For more information on the prior DMMI workshops, see the pages for the 2014 (Washington, DC), 2015 (San Francisco, CA), and 2016 (Chicago, IL) workshops.

Topics and Scope

Topic areas for the workshop include (but are not limited to) the following: 
  • Research design for causal inference in real-world data
  • Causal structure discovery in large-scale, observational data
  • Intersection of machine learning and causal inference
  • Real-world medical and health applications of causal analysis
  • Causal inference from personally-generated data and surrogate data sources
  • Causal inference software and tools

Paper Submission and Format Guidelines

We encourage a diverse range of submissions and demonstrations from academia, healthcare organizations, and industry that address any of the topics listed above. Submissions can be for (1) paper / podium presentations or (2) abstract / podium presentations.
  1. Paper submissions must be no more than six pages in length, inclusive of figures and references. 
  2. Abstract submissions are limited to two pages.
Papers should be formatted using the AMIA style. Manuscripts must be submitted as Adobe Portable Document Format (PDF) files; other file formats will not be accepted.

Full papers and abstracts must be submitted electronically through the EasyChair system at this link:

Authors of selected submissions will be invited to submit to the Journal of Health Informatics Research.

Important Dates

Deadline for submission: August 31, 2017
Notification of acceptance: September 15, 2017
Camera-ready Papers Due: October 20, 2017
Workshop: November 4, 2017

Workshop Organizing Committee

  • Kenney Ng, IBM Research
  • Bisakha Ray, New York University
  • SiSi Ma, University of Minnesota
  • Kun Zhang, Carnegie Mellon University
  • Fei Wang, Cornell University

Program Committee

  • Zach Shahn, IBM Research
  • Cao Xiao, IBM Research
  • Erich Kummerfeld, University of Minnesota
  • Chih-Lin Chi, University of Minnesota
  • Narges Razavian, NYU School of Medicine
  • Himanshu Grover, NYU School of Medicine

Workshop Location

  • Date/Time: 8:30 AM - 4:30 PM, November 4, 2017
  • Location: International Ballroom East, Washington Hilton
  • Links:
    • https://www.amia.org/amia2017/workshops
    • https://amia2017.zerista.com/event/member/388947

Workshop Schedule

W07: Data Mining for Medical Informatics (DMMI) – Causal Inference for Health Data Analytics
(sponsored by the Knowledge Discovery and Data Mining Working Group)

Type          Time         Presenter            Title
Opening       08:30-08:35  Eileen Koski         KDDM WG Opening Remarks
Welcome       08:35-08:45  Kenney Ng            Welcome
Invited Talk  08:45-09:30  Miguel Hernan        An algorithm for causal inference from observational data
Long Paper    09:30-09:45  Erich Kummerfeld     A New Method for Estimating Causal Model Learning Accuracy
Long Paper    09:45-10:00  Mahdi Naeini         An Assessment of the Calibration of Causal Relationships Learned Using RFCI and Bootstrapping
Long Paper    10:00-10:15  Zach Shahn           Self-Controlled Structural Nested Mean Models
Coffee Break  10:15-10:30                       Coffee Break
Invited Talk  10:30-11:15  David Danks          Causal discovery from time series data: From theory to application
Long Paper    11:15-11:30  Yanick Brice         Meaningful Use of Electronic Health Record and Patient Utilization Outcomes within 30 Days of Hospital Discharge
Long Paper    11:30-11:45  Jinghe Zhang         Exploring the Causal Relationships between Initial Opioid Prescriptions and Outcomes
Short Paper   11:45-12:00  Ryan Sandefer        Discovering relationships in consumer use of personal health information: A causal modeling approach
Lunch Break   12:00-13:00                       Lunch Break
Invited Talk  13:00-13:45  Constantin Aliferis  Causal Feature Selection
Long Paper    13:45-14:00  Scott Malec          Using the Literature to Construct Causal Models for Pharmacovigilance
Long Paper    14:00-14:15  Subramani Mani       Causal Discovery from Pediatric Infectious Disease Protein Biomarker Data
Long Paper    14:15-14:30  Ali Bakhtiari        Inferring immune cell subtype interactions from gene expression of immune related genes in tumor microenvironment
Coffee Break  14:30-15:00                       Coffee Break
Short Paper   15:00-15:15  Victor Castro        Identifying Causal Effects of Medication Exposures from Electronic Health Record Data with i2b2
Long Paper    15:15-15:30  Jeremy Espino        The Causal Modeling and Discovery Software Suite
Invited Talk  15:30-16:15  Gregory Cooper       Graphical Causal Discovery from Big Biomedical Data
Closing       16:15-16:20  Kenney Ng            Closing Remarks

Invited Speakers

Prof. Miguel Hernan

An algorithm for causal inference from observational data

Making decisions among several courses of action requires knowledge about the causal effects of each action. Randomized experiments are the preferred method to quantify those causal effects. When randomized experiments are not feasible or available, causal effects are estimated from non-experimental or observational databases. Therefore, causal inference from observational databases can be viewed as an attempt to emulate a hypothetical randomized experiment (the target experiment or target trial) that would quantify the causal effect of interest. This talk outlines a general algorithm for causal inference using observational databases that makes the target trial explicit. This causal framework channels counterfactual theory for comparing the effects of sustained treatment strategies, organizes analytic approaches, provides a structured process for the criticism of observational analyses, and helps avoid common methodologic pitfalls.

Prof. David Danks

Causal discovery from time series data: From theory to application

Many biomedical and scientific investigations aim to understand a system that changes dynamically over time, such as neural processes in the brain, the development of cancerous tissues, or health trajectories for diverse patients. Moreover, we frequently need to learn causal models of the underlying dynamical systems, since we want not only to predict their behavior but also to design effective interventions (actions, policies, modifications, and so forth) that control them and achieve desired health outcomes. In this talk, I will discuss different strategies for causal discovery from dynamical or time series data, with a particular focus on learning from complex types of data or datasets. Several important considerations must be resolved before the causal discovery process begins, but these can be captured in a few key questions. Finally, I will provide some biomedical examples of causal discovery from time series data.

Prof. Constantin Aliferis

Causal Feature Selection

Causality is not only important for designing interventions that steer a system of interest to a desired state; it is also central to feature selection for predictive modeling. This talk will discuss Markov boundary inference as a solution to the vanilla feature selection problem. We will first describe the theoretical foundations, then describe algorithmic approaches, and finally examine empirical results from a variety of domains that test how well theoretical expectations are reflected in real-world data analysis. We will also contrast causal with non-causal feature selection, both theoretically and empirically.

Prof. Gregory Cooper

Graphical Causal Discovery from Big Biomedical Data

Science is centrally concerned with the discovery of causal relationships in nature. In the past 25 years there has been tremendous progress in developing graphical methods for representing and discovering causal relationships from data, including big biomedical data. The Center for Causal Discovery (CCD) is developing, and making available, state-of-the-art graphical causal discovery software capable of analyzing very large biomedical datasets. This talk will present a brief overview of the CCD, an introduction to graphical causal discovery methods, and several examples of their use in analyzing biomedical data.

Post Workshop Survey

Please complete the following survey after the workshop to provide feedback to improve future workshops: https://www.surveymonkey.com/r/AMIA2017_W07