In virtually every country, the cost of healthcare is increasing more rapidly than the willingness and the ability to pay for it. At the same time, more and more data is being captured around healthcare processes in the form of Electronic Health Records (EHR), health insurance claims, medical imaging databases, disease registries, spontaneous reporting sites, and clinical trials. As a result, data mining has become critical to the healthcare world. On the one hand, EHRs offer the kind of data that gets data miners excited; on the other hand, they come with challenges such as 1) the unavailability of large sources of data to academic researchers, and 2) limited access to data-mining experts. Healthcare entities are reluctant to release their internal data to academic researchers, and in most cases there is limited interaction between industry practitioners and academic researchers working on related problems.
The objectives of this workshop are:
1. Bring together researchers (from both academia and industry) as well as practitioners to present their latest problems and ideas.
2. Attract healthcare providers who have access to interesting sources of data and problems but lack the expertise in data mining to use the data effectively.
3. Enhance interactions between data mining, text mining and visual analytics communities working on problems from medicine and healthcare.
• Statistical analysis and characterization of healthcare data
• Text mining - mining free text in electronic medical records
• Visual analysis and exploration of longitudinal clinical trial data
• Meaningful use of healthcare data for improved patient care and cost-reduction
• Data quality assessment and improvement: preprocessing, cleaning, missing data treatment, etc.
• Pattern detection and hypothesis generation from observational data
• Visualization of prescription drugs and interactions
• Privacy and security issues in healthcare
• Information fusion and knowledge transfer in healthcare
• Evolutionary and longitudinal patient and disease models
• Medical fraud detection
• Help with ICD-9 to ICD-10 conversions
• Health Information exchanges
Mohamed Ghalwash, Temple University
Andreas Holzinger, Medical University Graz
Robert Moskovitch, Columbia University
Mykola Pechenizkiy, Eindhoven University of Technology
Niels Peek, University of Manchester
Igor Pernek, Research Studios Austria
Chandan K. Reddy, Wayne State University
Stein Olav Skrøvseth, University Hospital of North Norway
Cristina Soguero Ruiz, Rey Juan Carlos University
Suzanne Tamang, Stanford University
Ping Zhang, IBM T.J. Watson Research
Jiayu Zhou, Michigan State University
Paper Submission: 27 Jan, 2016 (extended)
Notification of Acceptance: 5 Feb, 2016
Camera Ready Paper Due: 12 Feb, 2016
Workshop: May 7, 2016
All submissions must be made electronically at https://easychair.org/conferences/?conf=sdmdmmh2016.
Papers submitted to this workshop must not have been accepted by, or be under review at, another conference with published proceedings or a journal. The work may be either theoretical or applied.
The workshop accepts short papers (4-6 pages) and long papers (up to 9 pages) in US Letter (8.5" x 11") format: single-spaced, two-column, 10-point font, with margins of at least 1" on each side. Papers must have an abstract of at most 300 words and a keyword list with no more than 6 keywords.
We would like to encourage you to prepare your paper in LaTeX2e. Papers should be formatted using the SIAM SODA macro, which is available through the SIAM website. You can access it at http://www.siam.org/proceedings/macros.php. The filename is soda2e.all. Make sure you use the macros for SODA and Data Mining Proceedings; papers prepared using other proceedings macros will not be accepted.
Microsoft Word users should convert their document to PDF. Since there is no Microsoft Word template, please visit http://www.siam.org/proceedings/ to view the format of previous papers.
All submissions should clearly present the author information, including the authors' names, affiliations, and email addresses.
*Short papers will have 15 minutes for presentation and 5 minutes for questions (long papers: 20 + 5).
Mykola Pechenizkiy, Eindhoven University of Technology
Title: Predictive Analytics that Works!?
Application-driven research in predictive analytics contributes to the massive automation of data-driven decision making and decision support. As data mining researchers and data scientists, we often hold the (false) belief that our techniques are immediately applicable to real problems and carry no bad intent, and thus that we can keep our focus on developing novel techniques that push for higher and higher accuracy of predictive models. Some of us study how to make them more robust or adaptive to changes in known and hidden contexts; others study how to facilitate privacy-preserving or privacy-aware analytics. In the first part of my talk, I will give an overview of some of the practical issues that matter in real applications and relate them to the current state of the art in predictive analytics research.
However, recent reports such as the 2014 White House review of big data argue that "big data technologies can cause societal harms beyond damages to privacy", that data-driven decisions could have discriminatory effects even in the absence of discriminatory intent, and that there are threats of opaque decision-making; they call for a thorough study of these threats and of methods to address them. In the second part of my talk, I will revisit these concerns in the context of personalized medicine research, with the goal of highlighting why the general public, domain experts, or policy makers may consider predictive analytics a threat. I will present my subjective view on which questions need to be included in data science research agendas to gain a deeper understanding of what it means for predictive analytics to be ethics-aware and accountable, and how we can achieve this.
Mykola Pechenizkiy is Associate Professor in Predictive Analytics at the Department of Computer Science, Eindhoven University of Technology (TU/e), the Netherlands. He received his PhD in Computer Science from the University of Jyvaskyla, Finland in 2005. Since June 2013 he has also been Adjunct Professor in Data Mining for Industrial Applications there. His expertise and research interests are in predictive analytics and knowledge discovery from evolving data, and in their application to real-world problems in industry, commerce, medicine and education. He develops generic frameworks and effective approaches for designing adaptive, context-aware predictive analytics systems, and has actively collaborated with industry on this work. He has co-authored over 100 peer-reviewed publications and co-organized several workshops, conferences, special issues, and tutorials in these areas. He served as the chair of the steering committee of the Computer-Based Medical Systems (CBMS) conference series in 2012-2016. As a panelist and an invited speaker, he has been advocating ethics-aware predictive (learning) analytics research at several recent events, including the FATML@ICML 2015 and NSF IRB Privacy and Big Data workshops and the EDM 2015 conference.
Mitsunori Ogihara, University of Miami
Title: Computational Analysis of Biological, Social, and Medical Data
In the Center for Computational Science at the University of Miami, a number of collaborative research projects are ongoing that demand new algorithms for data exploration. In this talk I will present some of the projects for analyzing biological, social, or medical data. In particular, I will present ongoing research on: a) large-scale RNA-seq data analysis, b) log-note analysis of community service program brokers, and c) anonymized medical record data.
Mitsunori Ogihara received a Ph.D. in Information Sciences from Tokyo Institute of Technology in 1993. From 1994 to 2007 he was a faculty member in the Department of Computer Science at the University of Rochester, Rochester, NY, where he received tenure in 1998, became a full professor in 2002, and served as department chair from 1999 to 2007. He is currently a professor of Computer Science at the University of Miami, Coral Gables, FL. At the University of Miami he also serves as Director of Data Mining in the Center for Computational Science, a university-wide organization to promote computing. Since 2012 he has served as Associate Dean for Digital Library Innovation in the Otto G. Richter Library and in the College of Arts and Sciences. Ogihara has published three books and over 170 articles in journals, books, and conference proceedings. He is a recipient of an NSF CAREER Award and holds two patents. He serves on the editorial boards of Theory of Computing Systems (Springer) and International Journal of Foundations of Computer Science (World Scientific Press).
Rave Harpaz, Oracle Health Sciences
Title: Enhanced Detection of Adverse Drug Reactions using Multiple Data Sources
The increasing harm and monetary burden associated with adverse drug reactions (ADRs) has made the need to strengthen post-marketing drug safety surveillance (DSS) a top priority for health systems worldwide. A key element for improving DSS is the development of enhanced computational approaches for ADR detection. A confluence of recent technological and scientific developments has made new kinds of observational, experimental, and knowledge-based data available for DSS applications, including ADR detection.
In the first part of the presentation I will discuss the opportunities, challenges, and methodologies associated with ADR detection using some of these new data sources, such as: electronic health records, the biomedical literature, the logs of health information seeking activities on the Web, and FDA's adverse event reporting system.
It is well appreciated that sequential use of data may not provide the benefits that come from combined use of data. In the second part of the presentation I will cover ongoing research on developing sensor fusion approaches for ADR detection, and demonstrate their potential benefits.
Rave Harpaz is a Senior Research Scientist at Oracle Health Sciences. Previously, Rave was a Research Scientist at Stanford University, a post-doctoral fellow at Columbia University, and a quantitative risk modeling analyst at Merrill Lynch. Rave holds a PhD in Computer Science from the City University of New York specializing in the area of unsupervised Machine Learning, and holds a Law degree from Tel-Aviv University, Israel. Rave’s current research is focused on computational methodologies to identify novel adverse drug reactions using diverse data sources.
The first Workshop on Data Mining for Medicine and Healthcare was organized at the KDD 2011 conference in San Diego, CA. It was a full-day workshop with 2 invited speakers, 6 full papers, and 4 short papers.
Keynote lectures and the panel are available at http://videolectures.net/datamining2011_san_diego/.