1st Workshop on Data Mining for Medical Informatics: Electronic Phenotyping

Nov 15, 2014, Washington, DC

To be held in conjunction with AMIA 2014 Annual Symposium


The 2014 Workshop on Data Mining for Medical Informatics provides an opportunity for participants to discuss state-of-the-art data mining techniques and review how such techniques can be applied to clinical data.
The main theme we identified this year is electronic phenotyping, which aims to define inclusion and exclusion criteria that algorithmically select sets of patients based on stored clinical data. This is a broad topic that has recently attracted considerable interest and debate. The objectives of this workshop are to:
  • Bring together researchers (from both academia and industry) as well as practitioners to present their experience and ideas.
  • Attract healthcare providers who have access to interesting sources of data and problems but lack the expertise in data mining to use the data effectively. 
  • Enhance interactions between data mining and medical informatics communities working on problems from medicine and healthcare.
This year's DMMI workshop will be co-located with the 2014 American Medical Informatics Association (AMIA) Annual Symposium.

Topics and Scope

Topics of interest include, but are not limited to:
  • Discussion on different data mining techniques for electronic phenotyping
  • Text mining - mining free text in electronic medical records 
  • Visual analytics for high throughput phenotype discovery 
  • Novel architectures for facilitating high-throughput electronic phenotyping
  • Cost-benefit analyses of electronic phenotype identification 
  • Data quality assessment and improvement 
  • Pattern detection and hypothesis generation from observational data 
  • Privacy and security issues in healthcare 
  • Information fusion and knowledge transfer in healthcare 
  • Evolutionary and longitudinal patient and disease models 
  • Evaluation and validation of electronic phenotypes

Paper Submission and Format Guidelines

We encourage a diverse range of submissions and demonstrations from academia, healthcare organizations, and industry that address any of the topics listed above. Submissions can be for (1) paper / podium presentations, or (2) abstract / podium presentations.
  1. Paper submissions must be no more than six pages in length, inclusive of figures and references. 
  2. Abstract submissions are limited to two pages.
Papers should follow the AMIA formatting style. Manuscripts must be submitted as Adobe Portable Document Format (PDF) files. Other file formats will not be accepted.

Full papers and abstracts must be submitted electronically through the EasyChair system at the following link: https://easychair.org/conferences/?conf=dmmi2014

Important Dates

Deadline for submission: September 7th, 2014
Notification of acceptance: October 2nd, 2014
Camera-ready papers due: October 17th, 2014
Workshop: November 15th, 2014

Workshop Chairs

Niels Peek
University of Manchester
Nigam Shah
Stanford University
Gregor Stiglic
University of Maribor
Fei Wang
IBM T.J. Watson Research Center

Program Committee

Mohsen Bayati, Stanford University

Riccardo Bellazzi, University of Pavia

Adam Davey, Temple University

Joydeep Ghosh, University of Texas, Austin

Tudor Groza, The University of Queensland

Joyce Ho, University of Texas, Austin

John Holmes, University of Pennsylvania

Jianying Hu, IBM T.J. Watson Research Center

Siddhartha Jonnalagadda, Northwestern University

Jin-Dong Kim, Database Center for Life Science

Robert Moskovitch, Columbia University

Zoran Obradovic, Temple University

Mykola Pechenizkiy, Eindhoven University of Technology

Mattia Prosperi, University of Manchester

Lucia Sacchi, University of Pavia

Suchi Saria, Johns Hopkins University

Nicholas Tatonetti, Columbia University

Workshop Schedule

Nov 15, Saturday

8:30 – 8:40

Workshop Opening [Slides]

8:40 – 9:20


Joshua Denny: The success, challenge, and promise of EHR Phenotypes for medical and genomic research [Slides]

9:20 – 10:45

State of the Art (invited talks and panel discussion)

Shawn Murphy, Patrick Ryan, Jyoti Pathak, Maryan Zirkle, Joshua Denny


Shawn Murphy: Instrumenting the Healthcare Enterprise with High Quality Phenotypes [Slides]

Patrick Ryan: Standardizing the definition and implementation of phenotypes to enable systematic observational analysis: Lessons from OHDSI [Slides]

Jyoti Pathak: EHR-driven high-throughput phenotyping: The role of standards and metadata [Slides]

Maryan Zirkle: PCORnet: Managing EHR Phenotypes [Slides]

10:45 – 11:00

Coffee Break

11:00 – 12:10

Mining Careflows of Breast Cancer Patients [Paper] [Slides]

Lucia Sacchi, Arianna Dagliati and Riccardo Bellazzi

Risk-Associated Temporal Clinical pathways in T2D Patients [Paper] [Slides]

Arianna Dagliati, Lucia Sacchi, Daniele Segagni, Paola Leporati, Luca Chiovato and Riccardo Bellazzi


Using narratives as a source to automatically learn phenotype models [Paper] [Slides]

Vibhu Agarwal, Paea Lependu, Tanya Podchiyska, Rick Barber, Mary Boland, George Hripcsak and Nigam Shah


Automated Extraction of Date of Cancer Diagnosis from EMR Data Sources [Paper] [Slides]

Jeremy Warner, Lucy Wang, Ravi Atreya, Pam Carney, Joe Burden and Mia Levy

12:10 – 13:10

Lunch Break

13:10 – 14:15

Prediction of Clinical Procedures via Time Intervals Mining [Paper]

Robert Moskovitch, Colin Walsh, George Hripcsak, and Nicholas Tatonetti


Using Anchors to Estimate Clinical State without Labeled Data [Paper]

Yoni Halpern, Youngduck Choi, Steven Horng and David Sontag


High-throughput Phenotyping on Electronic Health Records using Multi-Tensor Factorization [Paper] [Slides]

Jimeng Sun, Joydeep Ghosh, Abel Kho, Joshua Denny and Bradley Malin


Computational discovery of physiomes in critically ill children using deep learning [Paper] [Slides]

David Kale, Zhengping Che and Yan Liu

14:15 – 15:15

Discussion on open problems and future directions

George Hripcsak: The Physics of the Medical Record

15:15 – 15:30

Coffee Break

15:30 – 16:15


Iain Buchan: Data-responsive Phenotyping for Healthcare [Slides]

16:15 – 16:30

Closing Remarks

Invited Speakers

Iain Buchan, University of Manchester

Iain Buchan is Professor in Public Health Informatics at the University of Manchester, where he founded and leads the Centre for Health Informatics, and directs the MRC Health eResearch Centre (www.herc.ac.uk) for North England. He also co-directs the UK’s national Farr Institute for Health Informatics Research (www.farrinstitute.org), which is building capacity in Health Data Science to enable large scale research over linked health data.
Iain holds qualifications in clinical medicine, pharmacology, public health and computational statistics. His research focuses on harnessing linked health data in statistically comprehensive ways to scale up and speed up scientific research and care service development in tandem. He is also interested in the use of mobile and ubiquitous technologies to generate richer longitudinal phenotypes and support preventive and self-care interventions in everyday life.
He champions interdisciplinary problem solving, for example helping clinicians, statisticians, epidemiologists, informaticians and software engineers to work together in a loop of inductive (with machine learning) and deductive (with biostatistical modelling) approaches to research questions.
Iain has also written a widely used statistical package (www.statsdirect.com) and takes a hands-on approach to developing new methods and tools. He is a strong advocate of training and capacity development to grow informatics approaches to solving public health problems.

Joshua Denny, Vanderbilt University

Josh Denny, M.D., M.S., FACMI, is an associate professor in the Departments of Biomedical Informatics and Medicine. He completed an internal medicine residency as a Tinsley Harrison Scholar at Vanderbilt. His interest in medical informatics began while in medical school with the development of a concept-based curriculum database to improve medical education. Other interests include natural language processing, accurate phenotype identification from electronic medical record data, and using the electronic medical record to discover genome-phenome associations to better understand disease and drug response, including the development of the EMR-based phenome-wide association study (PheWAS). Nationally, he is part of the Electronic Medical Records and Genomics (eMERGE) Network and eMERGE Coordinating Center, the Pharmacogenomics Research Network (PGRN), and the Pharmacogenomics of very large populations (PGPop) network. At Vanderbilt, he is also part of the PREDICT (Pharmacogenomic Resource for Enhanced Decisions in Care and Treatment) program, which prospectively genotypes patients to tailor drug response. Dr. Denny serves on several local committees and remains active in teaching medical students and in clinical roles. He received the Homer Warner Award in 2008 and, as a co-investigator, in 2009. He received the AMIA New Investigator Award in 2012 and was elected into the American College of Medical Informatics in 2013.

George Hripcsak, Columbia University

George Hripcsak, MD, MS, is Vivian Beaumont Allen Professor and Chair of Columbia University’s Department of Biomedical Informatics, Director of Medical Informatics Services for NewYork-Presbyterian Hospital, and Senior Informatics Advisor at the New York City Department of Health and Mental Hygiene. Dr. Hripcsak is a board-certified internist with degrees in chemistry, medicine, and biostatistics. He led the effort to create the Arden Syntax, a language for representing health knowledge that has become a national standard. Dr. Hripcsak’s current research focus is on the clinical information stored in electronic health records. Using data mining techniques such as machine learning and natural language processing, he is developing the methods necessary to support clinical research and patient safety initiatives. As Director of Medical Informatics Services, he oversees a 7000-user, 2.5-million-patient clinical information system and data repository. He is currently co-chair of the Meaningful Use Workgroup of HHS’s Office of the National Coordinator for Health Information Technology; it defines the criteria by which health care providers collect incentives for using electronic health records. Dr. Hripcsak was elected fellow of the American College of Medical Informatics in 1995 and served on the Board of Directors of the American Medical Informatics Association (AMIA). As chair of the AMIA Standards Committee, he coordinated the medical-informatics community response to the Department of Health and Human Services for the health-informatics standards rules under the Health Insurance Portability and Accountability Act of 1996. Dr. Hripcsak chaired the National Library of Medicine’s Biomedical Library and Informatics Review Committee, and he is a fellow of the New York Academy of Medicine. He has published over 200 papers.

Shawn Murphy, Harvard University

Shawn Murphy, MD, PhD, is the Director of Research Information Systems and Computing at Partners Healthcare, an Associate Professor of Neurology at Harvard Medical School, and Associate Director for the Laboratory of Computer Science at the Massachusetts General Hospital. Dr. Murphy developed the Research Patient Data Registry (RPDR) for Partners Healthcare. Dr. Murphy is also chief of software development for the NIH-sponsored Informatics for Integrating Biology and the Bedside (i2b2), an open source project that integrates data from the hospital medical record and the bioinformatics community into a common software platform, with over 120 operating installations worldwide. The work of i2b2 is focused on strengthening the understanding of the metabolic and genetic underpinnings of complex diseases by developing an informatics framework to bridge data for clinical research using electronic health records.

Patrick Ryan, Janssen Research and Development

Patrick Ryan, PhD, is the Head of Epidemiology Analytics at Janssen Research and Development, where he is leading efforts to develop and apply analysis methods to better understand the real-world effects of medical products. He is currently a collaborator in Observational Health Data Sciences and Informatics (OHDSI), a multi-stakeholder, interdisciplinary collaborative to create open-source solutions that bring out the value of observational health data through large-scale analytics. He served as a principal investigator of the Observational Medical Outcomes Partnership (OMOP), a public-private partnership chaired by the Food and Drug Administration. As part of OMOP, he led methodological research to assess the appropriate use of observational health care data to identify and evaluate drug safety issues. Patrick received his undergraduate degrees in Computer Science and Operations Research at Cornell University, his Master of Engineering in Operations Research and Industrial Engineering at Cornell, and his PhD in Pharmaceutical Outcomes and Policy from the University of North Carolina at Chapel Hill. Patrick has worked in various positions within the pharmaceutical industry at Pfizer and GlaxoSmithKline, and also in academia at the University of Arizona Arthritis Center.