5th Workshop on Data Mining for Medicine and Healthcare

May 7, 2016, Miami, FL

To be held in conjunction with the 16th SIAM International Conference on Data Mining (SDM 2016)


In virtually every country, the cost of healthcare is increasing more rapidly than the willingness and the ability to pay for it. At the same time, more and more data is being captured around healthcare processes in the form of Electronic Health Records (EHR), health insurance claims, medical imaging databases, disease registries, spontaneous reporting sites, and clinical trials. As a result, data mining has become critical to the healthcare world. On the one hand, EHR data offers the kind of material that gets data miners excited; on the other hand, it comes with challenges such as 1) the unavailability of large sources of data to academic researchers, and 2) limited access to data-mining experts. Healthcare entities are reluctant to release their internal data to academic researchers, and in most cases there is limited interaction between industry practitioners and academic researchers working on related problems.

The objectives of this workshop are:

1. Bring together researchers (from both academia and industry) as well as practitioners to present their latest problems and ideas.

2. Attract healthcare providers who have access to interesting sources of data and problems but lack the expertise in data mining to use the data effectively.

3. Enhance interactions between data mining, text mining and visual analytics communities working on problems from medicine and healthcare.

Topics of Interest

Topic areas for the workshop include (but are not limited to) the following: 

• Statistical analysis and characterization of healthcare data
• Text mining - mining free text in electronic medical records
• Visual analysis and exploration of longitudinal clinical trial data
• Meaningful use of healthcare data for improved patient care and cost-reduction
• Data quality assessment and improvement: preprocessing, cleaning, missing data treatment etc.
• Pattern detection and hypothesis generation from observational data
• Visualization of prescription drugs and interactions
• Privacy and security issues in healthcare
• Information fusion and knowledge transfer in healthcare
• Evolutionary and longitudinal patient and disease models
• Medical fraud detection
• Help with ICD-9 to ICD-10 conversions
• Health information exchanges

Program Committee

Mohamed Ghalwash, Temple University
Andreas Holzinger, Medical University Graz
Robert Moskovitch, Columbia University
Mykola Pechenizkiy, Eindhoven University of Technology
Niels Peek, University of Manchester
Igor Pernek, Research Studios Austria
Chandan K. Reddy, Wayne State University
Stein Olav Skrøvseth, University Hospital of North Norway
Cristina Soguero Ruiz, Rey Juan Carlos University
Suzanne Tamang, Stanford University
Ping Zhang, IBM T.J. Watson Research
Jiayu Zhou, Michigan State University

Important Dates

Paper Submission: 30 Sep, 2016 

Notification of Acceptance: 10 Oct, 2016

Camera Ready Paper Due: 21 Oct, 2016

Workshop: 12 Nov, 2016

Submission Information

All submissions must be made electronically at https://easychair.org/conferences/?conf=sdmdmmh2016.

Papers submitted to this workshop must not have been accepted by, or be under review at, another conference with published proceedings or a journal. The work may be either theoretical or applied.

The workshop accepts short papers (4-6 pages) and long papers (up to 9 pages) on US Letter (8.5" x 11") paper size (single-spaced, two-column, 10-point font, and at least 1" margin on each side). Papers must have an abstract with a maximum of 300 words and a keyword list with no more than 6 keywords.

We would like to encourage you to prepare your paper in LaTeX2e. Papers should be formatted using the SIAM SODA macro, which is available through the SIAM website. You can access it at http://www.siam.org/proceedings/macros.php. The filename is soda2e.all. Make sure you use the macros for SODA and Data Mining Proceedings; papers prepared using other proceedings macros will not be accepted.

Microsoft Word users should convert their document to PDF format. Since there is no Microsoft Word template, please visit http://www.siam.org/proceedings/ to view the format of previous papers.

All submissions should clearly present the author information including the names of the authors, the affiliations and the emails.

Workshop Schedule

May 7, Saturday

8:30 – 8:40

Workshop Opening

8:40 – 9:30

Invited talk I (Mykola Pechenizkiy, Eindhoven University of Technology)

9:30 – 10:00

Coffee Break

10:00 – 12:00

Paula Lauren, Guangzhi Qu and Feng Zhang

Discriminant Word Embeddings on Clinical Narratives


Flavio Bertini, Giacomo Bergami, Danilo Montesi and Paolo Pandolfi

Predicting frailty in elderly people using socio-clinical databases


Milan Vukicevic, Sandro Radovanović, Gregor Stiglic, Boris Delibašić, Sven Van Poucke and Zoran Obradovic

A Data and Knowledge Driven Randomization Technique for Privacy-Preserving Data Enrichment in Hospital Readmission Prediction


Wei Ye, Bianca Wackersreuther, Christian Boehm, Michael Ewers and Claudia Plant

IDEA: Integrative Detection of Early-stage Alzheimer’s disease


*Giulia Toti, Ricardo Vilalta, Peggy Lindner and Daniel Price

Effect of the Definition of Non-Exposed Population in Risk Pattern Mining


12:00 – 13:30

Lunch Break (on your own)

13:30 – 14:20

Invited talk II (Mitsunori Ogihara, University of Miami)

14:20 – 15:00

*Stephanie L. Hyland, Theofanis Karaletsos and Gunnar Rätsch

Knowledge Transfer with Medical Language Embeddings


*Arman Cohan, Luca Soldaini and Nazli Goharian

Identifying Significance of Discrepancies in Radiology Reports


15:00 – 15:15

Coffee Break

15:30 – 16:25

Jialiang Jiang, Sharon Hewner and Varun Chandola

Exploiting Hierarchy in Disease Codes - A Healthcare Application of Tree Structured Sparsity-Inducing Norms


Xiaoli Liu, Peng Cao, Dazhe Zhao and Arindam Banerjee

Multi-task Sparse Group Lasso for Characterizing Alzheimer's Disease


*Thomas Quisel, Luca Foschini and Alessio Signorini

Behavioral Phenotyping of Digital Health Tracker Data


16:25 – 17:15

Invited talk III (Rave Harpaz, Oracle Health Sciences)

17:15 – 17:20


*Short papers will have 15 minutes for presentation and 5 minutes for questions (long papers 20 + 5).

Invited Speakers

Mykola Pechenizkiy
Eindhoven University of Technology

Title: Predictive Analytics that Works!?

Application-driven research in predictive analytics contributes to the massive automation of data-driven decision making and decision support. As data mining researchers and data scientists we often have a (false) belief that our techniques are immediately applicable for solving real problems, and have no bad intent; and thus we can keep our focus on developing novel techniques pushing for higher and higher accuracy of predictive models. Some of us study how to make them more robust or adaptive to changes in known and hidden contexts, others how to facilitate privacy-preserving or privacy-aware analytics. In the first part of my talk, I will give an overview of some of the practical issues that matter in real applications and relate them to the current state of the art in predictive analytics research.
However, recent reports such as the 2014 White House Review of Big Data argue that "big data technologies can cause societal harms beyond damages to privacy", that data-driven decisions could have discriminatory effects even in the absence of discriminatory intent, and that there are threats of opaque decision-making, and they call for a thorough study of these threats and of methods to address them. In the second part of my talk I will revisit these concerns in the context of personalized medicine research, with the goal of highlighting why the general public, domain experts or policy makers may consider predictive analytics a threat. I will present my subjective view on what questions need to be included in data science research agendas to gain a deeper understanding of what it means for predictive analytics to be ethics-aware and accountable, and how we can achieve this.

Mykola Pechenizkiy is Associate Professor in Predictive Analytics at the Department of Computer Science, Eindhoven University of Technology (TU/e), the Netherlands. He received his PhD in Computer Science from the University of Jyvaskyla, Finland in 2005. Since June 2013 he has also been Adjunct Professor in Data Mining for Industrial Applications there. His expertise and research interests are in predictive analytics and knowledge discovery from evolving data, and in their application to real-world problems in industry, commerce, medicine and education. He develops generic frameworks and effective approaches for designing adaptive, context-aware predictive analytics systems. He has actively collaborated on this with industry. He has co-authored over 100 peer-reviewed publications and co-organized several workshops, conferences, special issues, and tutorials in these areas. He served as the chair of the steering committee of the Computer-Based Medical Systems (CBMS) conference series in 2012-2016. As a panelist and an invited speaker he has been advocating for ethics-aware predictive (learning) analytics research at several recent events, including the FATML@ICML 2015 and NSF IRB Privacy and Big Data workshops and the EDM 2015 conference.

Mitsunori Ogihara
University of Miami

Title: Computational Analysis of Biological, Social, and Medical Data

In the Center for Computational Science at the University of Miami, a number of collaborative research projects are ongoing that demand new algorithms for data exploration. In this talk I will present some of the projects for analyzing biological, social, or medical data. In particular I will present ongoing research on: a) large-scale RNA-seq data analysis, b) log-note analysis of community service program brokers, and c) anonymized medical record data.

Mitsunori Ogihara received a Ph.D. in Information Sciences from Tokyo Institute of Technology in 1993. From 1994 to 2007 he was a faculty member in the Department of Computer Science at the University of Rochester, Rochester, NY, where he received tenure in 1998, became a full professor in 2002, and served as department chair from 1999 to 2007. He is currently a professor of Computer Science at the University of Miami, Coral Gables, FL. At the University of Miami he is also serving as Director of Data Mining in the Center for Computational Science, a university-wide organization to promote computing. Since 2012 he has served as Associate Dean for Digital Library Innovation in the Otto G. Richter Library and in the College of Arts and Sciences. Ogihara has published three books and over 170 articles in journals, books, and conference proceedings. He is a recipient of an NSF CAREER Award and holds two patents. He serves on the editorial boards of Theory of Computing Systems (Springer) and International Journal of Foundations of Computer Science (World Scientific Press).

Rave Harpaz
Oracle Health Sciences

Title: Enhanced Detection of Adverse Drug Reactions using Multiple Data Sources

The increasing harm and monetary burden associated with adverse drug reactions (ADRs) have made the need to strengthen post-marketing drug safety surveillance (DSS) a top priority for health systems worldwide. A key element for improving DSS is the development of enhanced computational approaches for ADR detection. A confluence of recent technological and scientific developments has made new kinds of observational, experimental, and knowledge-based data available for DSS applications, including ADR detection.
In the first part of the presentation I will discuss the opportunities, challenges, and methodologies associated with ADR detection using some of these new data sources, such as electronic health records, the biomedical literature, the logs of health information seeking activities on the Web, and FDA's adverse event reporting system.
It is well-appreciated that sequential use of data may not provide the benefits that come from combined use of data. In the second part of the presentation I will cover ongoing research on developing sensor fusion approaches for ADR detection, and demonstrate their potential benefits.

Rave Harpaz is a Senior Research Scientist at Oracle Health Sciences. Previously, Rave was a Research Scientist at Stanford University, a post-doctoral fellow at Columbia University, and a quantitative risk modeling analyst at Merrill Lynch. Rave holds a PhD in Computer Science from the City University of New York specializing in the area of unsupervised Machine Learning, and holds a Law degree from Tel-Aviv University, Israel. Rave’s current research is focused on computational methodologies to identify novel adverse drug reactions using diverse data sources.

Workshop Chairs

Honorary Chair

Zoran Obradovic
Temple University

Workshop Chairs

Nitesh Chawla
University of Notre Dame
Gregor Stiglic
University of Maribor
Fei Wang
University of Connecticut

Note: for inquiries please send e-mail to gregor.stiglic@um.si and fei_wang@engr.uconn.edu

Previous DMMH Workshops

The first Workshop on Data Mining for Medicine and Healthcare was organized at the KDD 2011 conference in San Diego, CA. It was held as a full-day workshop with 2 invited speakers, 6 full papers and 4 short papers.

Keynote lectures and the panel are available at http://videolectures.net/datamining2011_san_diego/.

Information on the 2nd, 3rd and 4th Workshop on Data Mining for Medicine and Healthcare can be found at the DMMH-SDM 2013, DMMH-SDM 2014 and DMMH-SDM 2015 websites.