Wilson Hall
371 Wilson Boulevard
Rochester, Michigan 48309-4486
(248) 370-2762
(248) 370-4111
research@oakland.edu
Center for Data Science and Big Data Analytics

A collaborative research forum for multiple units within Oakland University, the Center for Data Science and Big Data Analytics facilitates multidisciplinary data science research that uses big data analytics techniques. The center combines the expertise of scientists from the biological and biomedical sciences with that of researchers in mathematics/statistics, engineering, business and finance. These experts use cutting-edge analytics, informatics and computing methodologies to conduct research and develop innovative solutions to high-impact problems across disciplines. In addition to conducting quantitative, data-based research in various disciplines, the center's experts provide analytic support to researchers from other disciplines. They are also available to consult with external industries and businesses.

Research Areas

The research focus of this center is on healthcare operations analytics, industrial and financial analytics, genome and evolutionary biology research, sensor networks and the internet of things. The center and its faculty researchers also partner closely with industry and other institutions to address current and trending issues.


Healthcare Operations Analytics

Traditional healthcare analytics involves using patient and operational data to conduct statistical and quantitative analysis, build explanatory and predictive models, and apply fact-based management to drive healthcare decisions and actions. It is broadly concerned with the use, study, creation or synthesis of information artifacts such as databases, knowledge bases, mathematical/statistical models, data integration and transformation tools, and entire decision support systems.

The primary aim of healthcare analytics is to improve managerial decision making through access to better information. However, the amount of medical data generated and the heterogeneity of that data make traditional analytics inefficient, particularly because much of the data is non-numerical. For example, notes written by physicians and nurses, images, and videos contain valuable information that needs to be factored into the analysis, but current tools do not have adequate mechanisms to integrate different types of data.

As part of this research stream, the center would focus on developing an infrastructure for acquiring, integrating and analyzing healthcare data to support decision making.

Anticipated Outputs:

  • Architecture for healthcare data acquisition and integration from disparate sources
  • Privacy-preserving data analytics methods
  • Proof-of-concept prototype demonstration

Industrial Analytics

With the auto industry and its primary and secondary supplier industries located nearby, there are streams of big datasets awaiting analysis. While larger companies have some sort of technical research center, albeit inadequate for the purpose, smaller companies completely lack the resources or personnel to handle their big data. The Center for Data Science and Big Data Analytics at Oakland University would act as a bridge between different disciplines and industries and provide analytics services.

Anticipated Outputs:

  • Collaboration with the auto industry and allied industries on research problems of shared interest
  • A university-based consulting service center guiding the auto industry in statistical experimentation, data analytics and quantitative methods
  • Short training programs in quantitative data analysis for local and national industries
  • Student internship programs in collaboration with industries

Financial Analytics

This research stream focuses on applying multivariate and Bayesian methods to big data problems, with special reference to finance. The datasets are huge: they cover thousands of stocks, mutual funds, exchange-traded funds and other financial instruments, and are collected over years at daily, hourly or per-minute intervals, and even at higher frequencies. The sheer volume of data on financial instruments and indexes, collected over years at per-minute frequency or on a per-price-change (commonly called tick) basis, together with the interconnectedness of stock price changes, poses a great challenge. The complexity is further compounded by events such as stock splits, mergers, stocks leaving the market as companies die and new stocks entering it as new companies are formed. Studying or predicting market behavior requires looking at the data jointly rather than stock by stock. Such correlated data can be analyzed only with appropriate techniques, and given the complexity of the data, these techniques are bound to be computationally intensive. New ways of analyzing these data would be developed using Bayesian and Markov chain Monte Carlo methods of modelling. Such problems inevitably require special expertise, intensive computational power and special analytics. A specific objective of this research is the efficient and effective analysis of financial data.
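To illustrate what looking at price data jointly rather than per stock can mean in the simplest case, the following is a minimal sketch that computes pairwise correlations of log returns across several instruments. It is not the center's method: the function names and the sample tickers in the usage note are hypothetical, and a real analysis of tick-level data would need far more scalable tools than plain Python lists.

```python
import math


def log_returns(prices):
    """Convert a price series into log returns between consecutive observations."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def correlation(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(price_series):
    """Map every (name_a, name_b) pair to the correlation of their log returns.

    `price_series` is a dict of instrument name -> list of prices, all sampled
    at the same times (an assumption made here for simplicity).
    """
    returns = {name: log_returns(prices) for name, prices in price_series.items()}
    return {
        (a, b): correlation(returns[a], returns[b])
        for a in returns
        for b in returns
    }
```

For example, calling `correlation_matrix({"AAA": [100, 102, 101, 105], "BBB": [50, 51, 50, 53]})` with two made-up tickers returns a dictionary keyed by ticker pairs. Events such as splits or delistings, mentioned above, would have to be handled before a calculation like this is meaningful.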

Anticipated Outputs:

  • Development of new techniques that can reliably predict financial market behavior.
  • Development of techniques for statistical arbitrage, in which one decides when and which financial instruments are going to perform better.
  • Provision of financial advice to outside firms.
  • Development of new courses, possibly cross-listed between the Department of Mathematics and Statistics and the School of Business Administration (SBA), and exploration of new degree or certificate programs.

Genome Research

This research stream focuses on identifying and studying gene mutations that lead to cardiovascular diseases. Specifically, it will use sensitized whole-genome ENU mutagenesis screens in mice to identify genes involved in the pathogenesis of several cardiovascular diseases, including venous thromboembolism, heart attacks and other vascular occlusive diseases such as sickle cell anemia. Whole-genome sequencing is used to identify the mutations, so each experiment generates terabytes of genomic sequencing data. This high volume of data necessitates computationally intensive analyses and substantial data storage. The Center for Data Science and Big Data Analytics aims to conduct cutting-edge research in genomics and provide critical research and training opportunities for OU faculty and students.

Anticipated Outputs:

  • The cardiovascular genome research project will identify genes involved in cardiovascular diseases. This information will be used to improve the diagnosis and treatment of cardiovascular diseases.
  • New methodologies for the analysis of whole genome sequencing data will likely be developed.
  • The proposed work will result in publications and applications for external funding.
  • Intellectual property is likely to be generated as a result of this work, enabling the university to gain the revenue necessary to provide funding for future research endeavors or centers.

Evolutionary Biology Research

This stream focuses broadly on fully computational evolutionary research using large datasets. It consists of two applied research areas supported by strong theoretical investigations. The first major goal is to explore the evolution of life through phylogenetic trees, with special attention to the evolution of early microbial life. This requires working with thousands of sequenced genomes obtained from available databases and reconstructing large evolutionary histories. The second goal is to explore the correlation between genotype and phenotype, with particular attention to pathogenic species. This requires fully sequenced genomes and techniques for reconstructing past evolutionary steps (ancestral state reconstruction), which are computationally intensive. Both goals are supported by large-scale simulations that allow the accuracy of the estimates to be tested in a controlled environment and the methodologies and software implementations to be optimized. This research would greatly benefit from a venue in which the expertise of other Big Data scientists could be tapped to design new and innovative ways to analyze and visualize data and to statistically evaluate the significance of the results.

Anticipated Outputs:

  • Accurate reconstruction of the early stages of life to understand how microbial communities interact with large-scale environmental changes, such as those that occurred during Earth’s early history.
  • Deciphering the elements within a genome that contribute to specific phenotypes, such as a pathogenic lifestyle. The focus will be on the agent of malaria (Plasmodium) because of the worldwide interest in this disease and the detailed genomic information available.
  • Theoretical support leading to the optimization of algorithms and software implementations as well as simulated datasets to explore the effect of multiple parameters and provide guidance to the scientific community on best practices for phylogenetic and genomic analyses.

Sensor Network Applications

Healthcare: Wireless and wired sensor network technologies are considered a key research area in computer science and the healthcare application industry for improving quality of life. The amount of data collected from patients in a region with many hospitals is huge, and it grows every minute. Quality health care depends on analyzing a patient's sensor data and records (ECG, blood test results, ailments, treatment, allergies, etc.) from the past few years together with current values, identifying similar patients, and comparing their treatments and responses. Our research includes the design and development of small, low-power and accurate digital sensors; collection of data at appropriate intervals; storage in a proper and secure format in large repositories; and the development, testing and implementation of data mining algorithms.

Internet of Things: The Internet of Things (IoT) is the network of physical objects, such as devices, vehicles and buildings, that are embedded with electronics, software, sensors and network connectivity. These objects collect and exchange data. At present, industries are developing IoT-enabled objects, connecting front door locks, garage door openers, refrigerators and every other possible object to the internet. This evolution is being pushed on us, and we need to worry not only about the security of this technology but also about the huge amount of data it generates every microsecond. Some of the data generated by IoT devices needs to be stored for historical reasons and for analysis of customer trends or behavior. Automotive manufacturers are moving toward connected cars. These cars will communicate with each other and with the internet, and will store a variety of data in a huge database. Each car company, such as GM or Ford, may maintain its own database to analyze its cars' performance under varying road, weather and traffic conditions. This requires big data analytics.

Cloud-based Manufacturing: In manufacturing plants, machines are connected to the internet, and the sensor values from these machines are stored in the cloud with a time stamp. This allows the behavior and status of the machines to be studied. The analysis is used for condition-based maintenance of the machines, product quality analysis and downtime analysis to improve productivity. We conduct research on cloud-based manufacturing and the analysis of this big data.
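As one simple illustration of how time-stamped sensor values might feed condition-based maintenance, the sketch below flags readings that deviate sharply from the recent history of the same sensor. This is an assumed, generic rolling-window rule, not the center's method; the function name, window size and threshold are hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return timestamps of readings that deviate strongly from recent history.

    `readings` is an iterable of (timestamp, value) pairs in time order.
    A reading is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the previous `window` values.
    """
    recent = deque(maxlen=window)  # holds only the last `window` values
    flagged = []
    for timestamp, value in readings:
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Skip the check when the window is perfectly flat (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(timestamp)
        # Flagged values still enter the window, so a spike temporarily
        # widens the band used for the readings that follow it.
        recent.append(value)
    return flagged
```

A maintenance workflow could use the flagged timestamps to schedule an inspection of the machine. In practice the window and threshold would need tuning per sensor, and a production system would read the values from cloud storage rather than from an in-memory list.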

Anticipated Outputs:

  • Our sensor network research will enhance the quality of health care and enable faster response to save lives. The research will lead to journal publications and PhD awards.
  • The IoT research will be useful for enhancing security in embedded systems, preventing ID theft, and helping to prevent automotive accidents and traffic congestion. This is useful for industries in Michigan and around the world. This research will lead to journal publications and PhD awards.

Researchers

Co-Directors

Vijayan Sugumaran, Ph.D.
Professor of Management Information Systems and Chair of the Department of Decision and Information Sciences in the School of Business Administration.

More information about Dr. Sugumaran

Dr. Sugumaran received his Ph.D. in Information Technology from George Mason University, Fairfax, Virginia, USA. His research interests are in the areas of Big Data Management and Analytics, Ontologies and Semantic Web, and Intelligent Agent and Multi-Agent Systems. He has published over 150 peer-reviewed articles in journals, conferences and books. He has edited twelve books and serves on the editorial board of eight journals. He has published in top-tier journals such as Information Systems Research, ACM Transactions on Database Systems, Communications of the ACM, IEEE Transactions on Big Data, IEEE Transactions on Engineering Management, IEEE Transactions on Education, and IEEE Software. Dr. Sugumaran is the editor-in-chief of the International Journal of Intelligent Information Technologies. He is the Chair of the Intelligent Agent and Multi-Agent Systems mini-track for the Americas Conference on Information Systems (AMCIS 1999 - 2016). Dr. Sugumaran has served as the program co-chair for the 14th Workshop on E-Business (WeB2015) as well as the International Conference on Applications of Natural Language to Information Systems (NLDB 2008, NLDB 2013 and NLDB 2016). He also regularly serves as a program committee member for numerous national and international conferences.

Ravindra Khattree, Ph.D.
Professor of Statistics in the Mathematics and Statistics Department, College of Arts and Sciences.

More information about Dr. Khattree

Dr. Khattree received his Ph.D. in statistics from the University of Pittsburgh. He is a member of the Center for Biomedical Research at Oakland University. He won the 2002 Young Statistician Award given by the International Indian Statistical Association. He is a Fellow of the American Statistical Association, inducted in 2003, and an elected member of the International Statistical Institute (elected in 2004). He is also the recipient of the 2008 Oakland University Research Excellence Award. Dr. Khattree’s areas of interest are Multivariate Analysis, Experimental Designs, Statistical Quality Control, Biostatistics, Classification Problems, and Bioequivalence. Professor Khattree is a theoretical statistician who has developed modelling tools for the evaluation and processing of large data and information systems and has published more than 80 peer-reviewed articles.

Founding Researchers

Fabia U. Battistuzzi, Ph.D.
Assistant professor of Biological Sciences in the College of Arts and Sciences.

More information about Dr. Battistuzzi

Dr. Battistuzzi has 10+ years of experience in large evolutionary analyses. Her research group focuses on the evolution of microbial life, with particular interests in early life and the origin of human pathogens. Since joining OU in 2012, she has started a strong research program involving undergraduate and graduate students that has been supported by OU funds, the Center for Biomedical Research, and the Michigan Space Grant Consortium. Additionally, a grant submitted to NASA to perform Big Data analyses has recently been selected. She has also been involved in many outreach activities in local school districts. Through her research, teaching and outreach activities, she aims to promote the importance of strong fundamentals in data analysis to high school, undergraduate and graduate students, with particular attention to women and minorities, who are underrepresented in STEM. She believes that this center can create the environment needed to give faculty and students an opportunity to exchange ideas and learn from each other while solving practical research problems. It will be an excellent first step toward establishing OU as one of the major players in data analysis in Michigan and will be a strong recruitment point for talented future faculty and students.

Joseph Callaghan, Ph.D.
Professor of Accounting in the School of Business Administration.

More information about Dr. Callaghan

Dr. Callaghan graduated from the University of Detroit Mercy with a B.S. in Accounting and special joint J.D.-M.B.A. degrees. His doctorate in Accountancy is from the University of Illinois at Urbana-Champaign. He joined the OU faculty in 1989 from Michigan State University. In 1995 he was promoted to Associate Professor with tenure and in 2002 to Professor of Accounting. Professor Callaghan teaches in the Financial, Managerial and Accounting Information Systems (AIS) areas and integrates model-based application development in support of these disciplines. On behalf of the SBA, he has developed and incorporated advanced information technologies throughout the Accounting curriculum. His research interests are Big Data Management, Medical Systems, Financial Markets, and Valuation Methods.

Subramaniam Ganesan, Ph.D.
Professor of Electrical and Computer Engineering in the School of Engineering and Computer Science.

More information about Dr. Ganesan

Dr. Ganesan has over 30 years of teaching and research experience in digital computer systems. He was the chair of the CSE department from 1991 to 1998. He has published over 100 journal papers, more than 200 papers in conference proceedings, and 3 books. He published a book on Java in 2003. He developed a custom DSP board with software for his DSP book. He is a senior member of IEEE, an IEEE Computer Society Distinguished Visiting Speaker, an IEEE Region 4 technical activities member and a Fellow of ISPE. He received the Lifetime Achievement award from ISAM, the Lloyd L. Withrow Distinguished Speaker award from SAE, and Best Teacher awards from ASEE and Oakland University. He has organized many international conferences. He is the editor-in-chief of the International Journal of Embedded Systems and Computer Engineering and the International Journal of Sensors and Applications. He has been the session organizer on “Systems engineering” at the SAE World Congress for the past 15 years. His research interests are in real-time systems, parallel architectures and computer systems, automotive embedded systems security, and signal processing.

Randal Westrick, Ph.D.
Associate professor of Biological Sciences in the College of Arts and Sciences.

More information about Dr. Westrick

Dr. Westrick has 20 years of experience in hemostasis and vascular biology research as a technician, graduate student, postdoctoral fellow and assistant professor. His research group focuses on using mouse forward genetic screens combined with next-generation sequencing (NGS) to identify genes involved in cardiovascular disease. Thus, he has extensive experience in using big biological datasets such as whole-genome DNA and RNAseq data. He is also incorporating Big Data biomedical imaging into his cardiovascular disease studies. The approaches used in his research program make him uniquely qualified to participate in building the OU Center for Data Science and Big Data Analytics. The emphasis that his laboratory places on cardiovascular disease research will also enable the development of collaborative research projects with the OUWB School of Medicine. Indeed, two medical students have already worked on research projects in his lab. One student received funding from the prestigious Howard Hughes Medical Institute Summer Medical Fellows program for his summer research experience, and the other will apply to similar programs during her tenure in the lab. In addition, during his 2.5 years at Oakland University he has trained five undergraduate students, two master's students and one PhD student. All of them have attended short courses on Big Data analysis techniques and are excited about continuing to develop their expertise through the Center's activities. His students have presented at national and international genetics and cardiovascular disease conferences, won numerous awards and procured their own independent funding.

Center Events

The Center will hold events periodically to bring industry and academia together.

Upcoming Events

Check back for upcoming events as they are planned.


December 1, 2016: Strength in Numbers forum

Presentations for this event are available in PDF unless otherwise noted.

Introduction

Center Goals and Objectives
Ravindra Khattree, Ph.D., Department of Mathematics and Statistics

Academic Presentations

Big data approaches in cardiovascular genomic research
Randal Westrick, Ph.D., Department of Biological Sciences

Earth, genomes, and time: a big data approach to integrative evolutionary histories
Fabia U. Battistuzzi, Ph.D., Department of Biological Sciences

The Next Digital Revolution
Joseph Tan, Ph.D., McMaster University

Detecting the Different States of Emergency Events Using Web Resources
Vijayan Sugumaran, Ph.D., Decision & Information Sciences

IOT, Connected Cars and Big Data Analytics
Subramaniam Ganesan, Ph.D., Department of Electrical and Computer Engineering

Statistics of Leveraged Funds
Ravi Khattree, Ph.D., Department of Mathematics and Statistics

Industry Presentations

Big Data … Are you ready to change your corporate culture?
Chuck Brooks, Ph.D., VP of Reporting and Analytics, Comerica Bank

FCA Advanced Analytics Overview (presentation not available)
Corey Hardcastle, Business Relationship Manager, Advanced Analytics, Fiat Chrysler Automobiles

Analytics: A Practitioner's Perspective
Jason Harper, CEO, RXA

Real Time Data...A Strategic Imperative (presentation not available)
Jack Weiner, President, JW Healthcare Concepts

Closing Remarks

Seeking Industry Partnerships
Vijayan Sugumaran