In virtually every country, the cost of healthcare is increasing more rapidly than the willingness and the ability to pay for it. At the same time, more and more data is being captured around healthcare processes in the form of Electronic Health Records (EHR), health insurance claims, medical imaging databases, disease registries, spontaneous reporting sites, and clinical trials. As a result, data mining has become critical to the healthcare world. On the one hand, EHR data offers the material that gets data miners excited; on the other hand, it comes with challenges such as 1) the unavailability of large sources of data to academic researchers, and 2) limited access to data-mining experts. Healthcare entities are reluctant to release their internal data to academic researchers, and in most cases there is limited interaction between industry practitioners and academic researchers working on related problems.
The objectives of this workshop are:
In addition to the more classical data mining approaches, this workshop aims to include two new topic fields – i.e. visual analytics and text mining in medicine and healthcare. By this extension, we aim to foster interactions among multiple communities that work at the intersections of data mining, medicine and healthcare.
Topic areas for the workshop include (but are not limited to) the following:
David Gotz, IBM T.J. Watson Research Center
Nigam Shah, Stanford University
Gregor Stiglic, University of Maribor
Fei Wang, IBM T.J. Watson Research Center
Note: for inquiries please send e-mail to email@example.com and firstname.lastname@example.org
Sophia Ananiadou, University of Manchester
David Buckeridge, McGill University
Nitesh Chawla, University of Notre Dame
Rave Harpaz, Stanford University
Andreas Holzinger, Medical University Graz
Jin-Dong Kim, Database Center for Life Science
Peter Kokol, University of Maribor
Nada Lavrac, Institute Jozef Stefan
Zoran Obradovic, Temple University
Mykola Pechenitzky, Technical University Eindhoven
Niels Peek, University of Amsterdam
Igor Pernek, University of Maribor
Martijn Schuemie, Erasmus University Medical Center
David Sontag, New York University
Jimeng Sun, IBM T.J. Watson Research Center
Jieping Ye, Arizona State University
Ping Zhang, IBM T.J. Watson Research Center
Paper Submission: January 14, 2013 (extended)
Notification of Acceptance: January 25, 2013
Camera Ready Paper Due: February 6, 2013
Workshop: May 4, 2013
All submissions must be made electronically at: https://www.easychair.org/conferences/?conf=sdmdmmh2013
Papers submitted to this workshop must not have been accepted by, or be under review at, another conference with published proceedings or a journal. The work may be either theoretical or applied.
The workshop accepts short (4-6 pages) and long papers (up to 9 pages) with US Letter (8.5" x 11") paper size (single-spaced, 2 column, 10 point font, and at least 1" margin on each side). Papers must have an abstract with a maximum of 300 words and a keyword list with no more than 6 keywords.
We would like to encourage you to prepare your paper in LaTeX2e. Papers should be formatted using the SIAM SODA macro, which is available through the SIAM website. You can access it at http://www.siam.org/proceedings/macros.php. The filename is soda2e.all. Make sure you use the macros for SODA and Data Mining Proceedings; papers prepared using other proceedings macros will not be accepted.
For Microsoft Word users, please convert your document to the PDF format. Since there is no Microsoft Word Template, please visit http://www.siam.org/proceedings/ to view the format on previous papers.
All submissions should clearly present the author information including the names of the authors, the affiliations and the emails.
Nitesh Chawla, University of Notre Dame
Nitesh Chawla is the Frank Freimann Collegiate Chair and Associate Professor of Computer Science and Engineering. He started his tenure-track position at Notre Dame in 2007, was promoted and tenured in 2011, and recognized with the Frank Freimann Collegiate Chair in 2012. He is the Director of the Notre Dame Interdisciplinary Center for Network Science and Applications (iCeNSA) and Data Inference Analytics and Learning Lab (DIAL).
Dr. Chawla's research interests lie primarily in data mining and network science. Specifically, his research is in the areas of imbalanced data, concept drift and dataset shift, adversarial learning, evaluation issues, heterogeneous networks, co-evolving networks, and link prediction. He is also at the frontier of interdisciplinary applications with innovative work and key contributions in patient-centered healthcare, social networks, analytics, and climate/environmental sciences. He is the recipient of multiple awards for research and teaching innovation including outstanding teacher awards (2008 and 2011), National Academy of Engineers New Faculty Fellowship, and a number of best paper awards and nominations. He received the IBM Watson Faculty Award in 2012. He has over 150 publications, and serves as PI/Co-PI on over $10 million in research funding, with research supported by federal agencies including NSF, DOD, DARPA, ARL, and DOE, and by a number of industry partners. He is the former chair of the IEEE CIS Data Mining Technical Committee. He also serves on a number of editorial boards and organizing/program committees of conferences. Dr. Chawla is also the founder of Aunalytics, Inc.
Big Data and Patient-centered Healthcare
Proactive personalized medicine is expected to bring fundamental changes, offering recommendations of lifestyle adjustments and treatments to avoid diseases that a patient is at high risk of developing in the future. No matter how unique our medical experiences, chances are that other patients among millions have experienced genetic and environmental risk factors that closely mirror ours. These medical experiences, risks, and symptoms are tapped in the vast repositories of electronic medical records. Can we then take a data-driven approach to discover nuggets of knowledge and insight from the Big Data in healthcare for patient-centered outcomes and personalized healthcare? Can we answer the question: What are my disease risks? This talk will focus on our work that brings data- and network-driven thinking to personalized healthcare and patient-centered outcomes.
Joydeep Ghosh, University of Texas at Austin
Joydeep Ghosh is currently the Schlumberger Centennial Chair Professor of Electrical and Computer Engineering at the University of Texas, Austin. He joined the UT-Austin faculty in 1988 after being educated at, (B. Tech '83) and the University of Southern California (Ph.D. '88). He is the founder-director of IDEAL (Intelligent Data Exploration and Analysis Lab) and a Fellow of the IEEE. Dr. Ghosh has taught graduate courses on data mining and web analytics every year, to both UT students and industry, for over a decade. He was voted "Best Professor" in the Software Engineering Executive Education Program at UT.
Dr. Ghosh's research interests lie primarily in data mining and web mining, predictive modeling/predictive analytics, machine learning approaches such as adaptive multi-learner systems, and their applications to a wide variety of complex real-world problems. He has published more than 300 refereed papers and 50 book chapters, and co-edited over 20 books. His research has been supported by the NSF, Yahoo!, Google, ONR, ARO, AFOSR, Intel, IBM, and several others. He has received 14 Best Paper Awards over the years, including the 2005 Best Research Paper Award across UT and the 1992 Darlington Award given by the IEEE Circuits and Systems Society for the overall Best Paper in the areas of CAS/CAD. Dr. Ghosh has been a plenary/keynote speaker on several occasions such as MICAI'12, KDIR'10, ISIT'08, ANNIE'06 and MCS 2002, and has lectured widely on intelligent analysis of large-scale data. He served as the Conference Co-Chair or Program Co-Chair for several top data mining oriented conferences, including SDM'13, SDM'12, KDD 2011, CIDM'07, ICPR'08 (Pattern Recognition Track) and SDM'06. He was the Conf. Co-Chair for Artificial Neural Networks in Engineering (ANNIE)'93 to '96 and '99 to '03 and the founding chair of the Data Mining Tech. Committee of the IEEE Computational Intelligence Society. He has also co-organized workshops on high dimensional clustering, Web Analytics, Web Mining and Parallel/Distributed Knowledge Discovery.
Dr. Ghosh has served as a co-founder, consultant or advisor to successful startups (Stadia Marketing, Neonyoyo and Knowledge Discovery One) and as a consultant to large corporations such as IBM, Motorola and Vinson & Elkins.
Predictive Modeling of Large Healthcare Data under Privacy Constraints
As medical records move to the digital age and the bedside gets increasingly instrumented, a wealth of information is being acquired, with the potential of providing unprecedented insights into the cause, prevention, treatment and management of illness. Analysis of such data also promises numerous opportunities for much more effective and efficient delivery of healthcare. However, (valid) privacy concerns and restrictions pose a major impediment to realizing this potential. In this talk I will outline two approaches that we have recently and successfully taken that provide privacy-aware predictive modeling with little degradation in model quality despite restrictions on what can be shared or analyzed. The first approach focuses on extracting predictive value from data that has been aggregated at various levels due to privacy concerns, while the second introduces a novel, non-parametric Gibbs sampler that can generate "realistic but not real" data given a dataset that cannot be shared as is.
[Joint work with Yubin Park and Shankar Mallikarjun]
Marc Suchard, University of California, Los Angeles
Marc A. Suchard is Professor in the Departments of Biostatistics, of Biomathematics and of Human Genetics in the UCLA School of Public Health and David Geffen School of Medicine at UCLA. He earned his Ph.D. in biomathematics from UCLA in 2002 and continued for an MD degree, which he received in 2004. Dr. Suchard is a leading Bayesian statistician who focuses on inference of stochastic processes in genomics and for massive datasets in healthcare. His training in both Medicine and Applied Probability helps bridge the gap of understanding between statistical theory and clinical practicality. He has been awarded several prestigious statistical awards such as the 2003 Savage Award, the 2006 and 2011 Mitchell Prizes, as well as a 2007 Alfred P. Sloan Research Fellowship in computational and molecular evolutionary biology and a 2008 Guggenheim Fellowship to further computational statistics. Recently, he received the 2011 Raymond J. Carroll Young Investigator Award for a leading statistician within 10 years post-Ph.D.
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this talk, I discuss how high-performance statistical computation, including graphics processing units, can enable complex inference methods in these massive datasets. I focus on algorithm restructuring through techniques like block relaxation (Gibbs, cyclic coordinate descent, MM) to exploit increased data/parameter conditional independence within traditional serial structures. I find orders-of-magnitude improvement in overall run-time fitting models involving tens of millions of observations.
These approaches are ubiquitous in high-dimensional biological problems modeled through stochastic processes. To drive this point home, I conclude with a seemingly unrelated example developing nonparametric models to study the genomic evolution of infectious diseases. These infinite hidden Markov models (HMMs) generalize both Dirichlet process mixtures and the usual finite-state HMM to capture unknown heterogeneity in the evolutionary process. Data squashing strategies, coupled with massive parallelization, yield novel algorithms that bring these flexible models finally within our grasp.
[Joint work with Subha Guha, David Madigan, and Steve Scott]
Jimeng Sun, IBM T.J. Watson Research Center
Jimeng Sun is a research staff member at the Healthcare Analytic Department of the IBM T.J. Watson Research Center. He leads research projects in medical informatics, especially in developing large-scale predictive and similarity analytics for healthcare applications.
Sun has an extensive track record in core and applied data mining research, specializing in healthcare analytics, big data analytics, similarity metric learning, social network analysis, predictive modeling and visual analytics. He has published over 70 papers and filed over 20 patents (4 granted). He received the ICDM best research paper award in 2007, the SDM best research paper award in 2007, and the KDD Dissertation runner-up award in 2008.
Sun received his B.S. and M.Phil. in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003, and his PhD in Computer Science from Carnegie Mellon University in 2007, specializing in data mining on streams, graphs and tensor data.
Large volumes of heterogeneous Electronic Health Records (EHR) data are becoming available in many healthcare institutions. Such EHR data from millions of patients serve as a huge collective memory of doctors and patients over time. How can we leverage that EHR data to help caregivers and patients make better decisions in the future? How can we use these data efficiently to support clinical and pharmaceutical research?
My research focuses on developing large-scale algorithms and systems for healthcare analytics. First, I will describe our healthcare analytic research framework, which provides an intuitive collaboration mechanism across interdisciplinary teams and an efficient computation framework for handling heterogeneous patient data. Second, I will present a core component of this framework, patient similarity learning, which answers the following questions:
- How to incorporate physician feedback into the similarity computation?
- How to integrate multiple patient similarity measures into a single consistent similarity measure?
- How to incrementally update the existing patient similarity functions as new data or feedback arrive?
- How to present the similarity results in an intuitive and interactive way to users?
I will illustrate the effectiveness of our proposed algorithms for patient similarity learning in several different healthcare scenarios. I will demonstrate an interactive visual analytic system that allows users to cluster patients and to refine the underlying patient similarity metric. Finally, I will highlight some current/future work that I am pursuing.
The First Workshop on Data Mining in Medicine and Healthcare was held at the KDD 2011 conference in San Diego, CA. It was a full-day workshop with 2 invited speakers, 6 full papers and 4 short papers.
Keynote lectures and the panel are available at http://videolectures.net/datamining2011_san_diego/.
Information about SDM-DMMH 2013 is also available at KDnuggets.