3rd Workshop on Data Mining for Medical Informatics: Learning Health

Nov 12, 2016, Chicago, IL

To be held in conjunction with AMIA 2016 Annual Symposium

The DMMI 2016 workshop is sponsored by the AMIA Knowledge Discovery and Data Mining (KDDM) Working Group.

The life and biomedical sciences are major contributors to the big data revolution, owing to advances in genome sequencing technology and digital imaging, the growth of clinical data warehouses, the increased role of patients in managing their own health information, and the rapid accumulation of biomedical knowledge. In this context, data mining and machine learning techniques, which aim to discover knowledge and derive data-driven insights from diverse data sources, have played an increasingly important role in medical informatics. Effective data mining approaches have been applied to many medical problems, including drug development, personalized medicine, disease modeling, cohort studies, and comparative effectiveness research. The main theme of this year's workshop is learning health, which aims to derive actionable and timely insights from the real-world experience of millions of patients and to make them useful to clinicians, patients, and all other healthcare stakeholders. This topic has recently attracted considerable interest and debate. We invite researchers from both academia and industry who are interested in this topic to participate in the workshop, share their opinions and experience, and discuss future directions.

KDDM WG Data Competition. This year's workshop includes a new one-hour session: the KDDM WG Data Competition winner presentation. The competition task is surgical site infection (SSI) prediction, using a dataset extracted from a cohort of 7725 patients undergoing gastrointestinal surgery, with a total of more than 4.5 million blood tests. The data sponsor is the University Hospital of North Norway (UNN). The data will contain all blood tests performed on these patients close to the time of surgery, including their numerical or categorical values. Eighty percent of the data (the training data) will be released to participants for model development; the rest will be held out for evaluation. Participants will use the training data to construct a predictive model that identifies high-risk patients susceptible to SSI. Entries will be evaluated on quantitative predictive performance on the evaluation dataset and on qualitative clinical relevance. The winner will be announced at the workshop and will present their work there.
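
As a rough illustration of the 80/20 hold-out described above, the sketch below splits patient identifiers into a training set and a held-out evaluation set. It is only a sketch under stated assumptions: the announcement does not say how the organizers performed the split, so splitting at the patient level, the function name split_patients, the seed, and the integer patient IDs are all hypothetical.

```python
import random


def split_patients(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle unique patient IDs and split them into training and held-out lists.

    Assumption: the split is made per patient, so that all blood tests of one
    patient land on the same side of the split. The source does not confirm this.
    """
    ids = sorted(set(patient_ids))      # de-duplicate and fix a starting order
    rng = random.Random(seed)           # local generator, reproducible shuffle
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 0..7724 stand in for the 7725 patients in the cohort.
train_ids, heldout_ids = split_patients(range(7725))
print(len(train_ids), len(heldout_ids))
```

Splitting by patient rather than by individual blood test keeps one patient's measurements from appearing in both sets, which would otherwise inflate the measured predictive performance.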
This year's DMMI workshop will be co-located with the 2016 American Medical Informatics Association (AMIA) Annual Symposium. The 1st and 2nd DMMI Workshops were held in 2014 (Washington, DC) and 2015 (San Francisco, CA).

Topics and Scope

Topic areas for the workshop include (but are not limited to) the following: 

• Comparative study of different data mining methodologies in learning health

• Text mining and natural language processing in learning health

• Visual analytics and learning health

• Novel architectures for learning health systems

• Data quality assessment and improvement

• Pattern detection and hypothesis generation from observational data

• Privacy and security issues in learning health systems

• Information fusion and knowledge transfer in healthcare

• Evaluation and validation of learning health methods

• Mining temporal data for guiding timely decision making

• Methods for personalized diagnosis and treatment

Paper Submission and Format Guidelines

We encourage a diverse range of submissions and demonstrations from academia, healthcare organizations, and industry that address any of the topics listed above. Submissions can be for (1) paper / podium presentations, or (2) abstract / podium presentations.
  1. Paper submissions must be no more than six pages in length, inclusive of figures and references.
  2. Abstract submissions are limited to two pages.
Papers should be formatted in the AMIA format style. Manuscripts must be submitted as Adobe Portable Document Format (PDF) files. Other file formats will not be accepted.

Full papers and abstracts must be submitted electronically through the EasyChair system.

Authors of selected submissions will be invited to submit to the International Journal of Big Data and Analytics in Healthcare (IJBDAH) and the Journal of Health Informatics Research.

Important Dates

Deadline for submission: September 30th, 2016
Notification of acceptance: October 10th, 2016
Camera-ready Papers Due: October 21st, 2016
Workshop: November 12th, 2016

Workshop Chairs

  • Fei Wang. Cornell University.
  • Gregor Stiglic. University of Maribor.
  • Mihaela van der Schaar. University of California, Los Angeles.
  • David Sontag. New York University.
  • Christopher C. Yang. Drexel University.

Program Committee

  • Marzyeh Ghassemi. MIT.
  • Joyce Ho. Emory University.
  • Xia Hu. Texas A&M University.
  • Ying Li. IBM T. J. Watson Research Center.
  • Zitao Liu. Pinterest.
  • Inci M. Baytas. Michigan State University.
  • Robert Moskovitch. Ben-Gurion University.
  • Ioakeim Perros. Georgia Institute of Technology.
  • Narges Razavian. New York University.
  • Yiye Zhang. Cornell University.
  • Jiayu Zhou. Michigan State University.

Workshop Schedule

Type | Time | Presenter | Title
- | 8:30-8:35 | Jianying Hu | KDDM WG Opening Remarks
Invited Talk | 8:35-9:20 | Jane Snowdon | The Power of Data in the Era of Cognitive Computing: The Next Frontier for Healthcare
Long Paper Presentation | 9:20-9:35 | Shao Fen Liang, Talya Porat, Archana Tapuria, Brendan Delaney and Vasa Curcin | A Dynamic Medical Terminology Mapping System – MeTMapS
Long Paper Presentation | 9:35-9:50 | Carlo Combi, Pietro Sala and Matteo Mantovani | Approximate Functional Dependencies for expressing Trend-Event correlations: proposal and applications in the clinical domain
Long Paper Presentation | 9:50-10:05 | Fabrício Kury and Olivier Bodenreider | Desiderata for Drug Classification Systems for their Use in Analyzing Large Drug Prescription Datasets
Break | 10:05-10:30 | - | Break
Invited Talk | 10:30-11:15 | Justin Starren | Mining Clinical Data: Why integrated repositories are the future
Long Paper Presentation | 11:15-11:30 | Joseph Finkelstein and In Cheol Jeong | Mining Temporal Telemonitoring Data for Advanced Prediction of Asthma Exacerbations
Long Paper Presentation | 11:30-11:45 | Naresh Sundar Rajan, Ramkiran Gouripeddi and Julio Facelli | Measuring Validity of Phenotyping Algorithms across Disparate Data using a Data Quality Assessment Framework
Lunch | 11:45-13:00 | - | Lunch Break
Invited Talk | 13:00-13:45 | Rema Padman | Paving the COWPath: Data-driven Service Innovations in Healthcare Delivery
Data Competition Presentation | 13:45-13:50 | Eileen Koski | Introduction of the Data Challenge
Data Competition Presentation | 13:50-14:05 | Prabhu RV Shankar, Anupama Kesari, Kamalashree N, Priya Shalini, Charan Bharadwaj, Nitika Raj, Sowrabha Srinivas, Manu Shivkumar, MS, Anand Raj Ulle, MTech, Nagabhushan Tagadur | Predictive Modeling of Surgical Site Infections Using Sparse Laboratory Data
Data Competition Presentation | 14:05-14:20 | Prathyusha Mandagani, Shaun Coleman, Anam Zahid, Annie Pugel Ehlers, Senjuti Basu Roy, Martine De Cock | Machine Learning Models for Surgical Site Infection Prediction
Data Competition Presentation | 14:20-14:35 | Kendall Park | Evolving clinically-relevant decision trees to predict surgical site infections
Data Competition Presentation | 14:35-14:45 | All audience and presenters | Q & A
Break | 14:45-15:15 | - | Break
Short Paper Presentation | 15:15-15:25 | Lisiane Pruinelli, Bonnie Westra, Karen Monsen and Gyorgy Simon | A Novel Clustering Methodology to Address Liver Transplant Population Heterogeneity
Short Paper Presentation | 15:25-15:35 | Bisakha Ray | Automated Topic Detection of Messages in Online Health Forums
Short Paper Presentation | 15:35-15:45 | Bo Jin, Haoyu Yang, Cao Xiao, Ping Zhang, Xiaopeng Wei and Fei Wang | Multitask Dyadic Prediction and Its Application in Prediction of Adverse Drug-Drug Interaction
Invited Talk | 15:45-16:30 | Nitesh Chawla | Leveraging big healthcare data to answer important population health management questions

Invited Talks

Dr. Jane L. Snowdon is the Director of Watson Health Partnerships at IBM. She is responsible for building a partner ecosystem that aims to transform the medical field and improve both patient care and individual wellness by creating new solutions using Watson Health, Apple ResearchKit, and Apple HealthKit. She is an Advisory Board member for the Georgia Institute of Technology.

Prior to this role, Jane L. Snowdon was Chief Innovation Officer, IBM U.S. Federal Government, in Washington, DC. She was responsible for developing and driving innovation strategy and defining offerings that combine client mission requirements with IBM products, services, IBM Research's technology investments, and Federal Systems Integrator partners. Jane was the Director of IBM's Federal Cloud Innovation Center in Washington, DC. She co-chaired the Cyber Security Education and Workforce Development Working Group with the Department of Homeland Security (DHS) and the National Institute of Standards and Technology (NIST). Jane was a member of the Intelligence and National Security Alliance (INSA) Council on Technology and Innovation and DHS's Innovation in Acquisitions Working Group. Jane served as an Advisory Board member for the Center of Innovation and Entrepreneurship at George Mason University.

Title: The Power of Data in the Era of Cognitive Computing: The Next Frontier for Healthcare

Abstract: Researchers and clinicians are increasingly aware that speeding up the quest for cures may hinge on the ability to make sense of vast, complex, and ever-changing information. Diagnosis and treatment require a thorough understanding of medical literature, population health trends, patient histories, genetics, social determinants, and more. Cognitive systems can empower researchers and clinicians to deliver insights to their patients faster and more easily than previously possible. This talk will (a) introduce the basic concepts of cognitive computing and informatics in healthcare decision support, and (b) describe case studies in which cognitive computing assists doctors in developing individualized, evidence-based treatment options for patients; enhances clinicians' ability to find clinical trials for which their patients may be eligible; and helps oncologists and radiologists quickly and accurately analyze medical images to improve diagnosis and treatment.

Dr. Nitesh Chawla is the Frank M. Freimann Professor of Computer Science & Engineering and Director of the Interdisciplinary Center for Network Science & Applications (iCeNSA) at the University of Notre Dame. He is passionate about Big Data for the Common Good. His research is making fundamental advances in network science and data science, especially in the areas of link prediction and co-evolution in networks, inter-genre networks, anomaly detection, learning from imbalanced data, non-stationary data, and evaluation issues for machine learning and data mining algorithms. His research is bridging disciplinary boundaries for transformative applications in healthcare, education, environment, and national security, where technology meets society to augment human intelligence and creativity.
Title: Leveraging big healthcare data to answer important population health management questions

Abstract: The availability of big data in healthcare and medicine presents unprecedented opportunities to advance both personalized healthcare and population health management. In this talk, I will provide two examples of leveraging electronic medical records and claims data to draw insights into population health from both a resource management and a procedures perspective. First, I will discuss a network-based analysis of nationwide healthcare data, which includes a novel metric for identifying diagnosis comorbidity pairs between two generalized population subgroups; this can be particularly valuable for resource planning and targeted care for individuals from specific populations. Second, I will demonstrate how aggregate population-level Medicare data can provide value for physicians themselves, showing how big data can be aggregated from multiple sources to provide insights into highly complex matters such as procedure choice.

Dr. Rema Padman is a Professor of Management Science & Healthcare Informatics at the Heinz College of Carnegie Mellon University. Professor Padman's research addresses problems at the interface of healthcare, information technology and management science, particularly healthcare information systems, operational planning and management, and data mining and decision support methods. Her current research in the healthcare domain investigates data mining methods for healthcare decision support; evaluates the use and impact of information technology and systems in healthcare environments, particularly for point-of-care disease management; and examines tradeoffs between access and confidentiality in large multidimensional public-use and healthcare databases. Her research on these topics has been funded by the National Science Foundation, National Library of Medicine, DARPA, and the Army Research Office.

Title: Paving the COWPath: Data-driven Service Innovations in Healthcare Delivery

Abstract: Assessing and responding to many patients' risks of chronic diseases, related complications, and their progression is a complex, high-dimensional information-processing problem faced by time-constrained clinicians. This talk presents recent research on data-driven service innovations that show promising potential to deliver substantial cognitively-guided information to clinicians and patients for improving health care delivery and outcomes. We combine statistical machine learning, information visualization, and electronic health data to find (1) informative, contextualized, two-dimensional projections of disease risk assessment, (2) longitudinal trajectories of disease progression, and (3) clinical pathways of the co-progression of multiple clinical events that are associated with chronic disease management. Insights from these studies can potentially yield new evidence to support clinicians in providing patient-centered treatment approaches and empower patients with chronic conditions to better manage the disease and its complications.

Dr. Justin Starren is the Chief of Preventive Medicine (Health and Biomedical Informatics) in the Department of Preventive Medicine and Associate Professor of Preventive Medicine (Health and Biomedical Informatics) and Medical Social Sciences at Northwestern University Feinberg School of Medicine. His current research continues to focus on new ways to make health care computing more useful. This includes developing intuitive, novel Human Computer Interfaces (HCI) for health care, including work on the design of graphical icons for clinical applications, addressing data overload for clinicians, and issues in affective computing. A related line of research is developing methods for the integration of clinical research computing into clinical care.

Title: Mining Clinical Data: Why integrated repositories are the future

Abstract: Many institutions are creating data warehouses to support research and clinical operations. In most instances, research warehouses are partial copies of the operational data. Alternatively, researchers can access the operational data only by working through clinical IT staff. The Northwestern Medicine Enterprise Data Warehouse uses a different model. It is a single, integrated repository of clinical and research data on six million patients. From its initial design, it has served as a single, common repository for both research and clinical operations. This talk will discuss the structure and governance of this unusual model and present its benefits and challenges in practice. Having a common repository allows research results to move more rapidly into practice. This talk will also discuss a number of projects that demonstrate the model.
