Data Science encompasses the statistical and computational techniques, technologies and methodologies developed to enable the most recent paradigm of discovery, i.e., data-driven discovery (aka Big Data), complementing the previous discovery paradigms of theory, experimentation and simulation. Data science is currently deployed as one of the main approaches for making discoveries across a wide range of applications in almost all academic disciplines and industry sectors, including health informatics, precision and personalized medicine, engineering, business analytics, geosciences, social studies, intelligent transportation, and cybersecurity, to name a few. In turn, the demand for data science expertise has created a vast workforce shortage, reflecting the need for hundreds of thousands of data scientists in the U.S. alone.
The goal of the CU Denver Data Science Symposium 2019 is to bring together data science communities from CU Denver, whether they be the communities that pursue research in core data science, applications of data science in various fields of research, or education programs in data science. During this symposium, we will learn about various aspects of data science and share the current state of data science inside and outside CU Denver.
Registration is free but required, and seats are allocated on a first-come, first-served basis. Claim your seat as soon as possible here. Lunch and coffee will be provided.
Event Information
Date: Friday, August 30, 2019
Time: 8:00 am - 5:00 pm
Location: Student Commons Building amphitheater, Room 2600
Address: 1201 Larimer St, Denver, CO 80204
Transportation and Parking: The CU Denver campus is accessible by all forms of public transportation (bus, shuttle, and light-rail).
For those commuting by car, paid parking can be found near the Student Commons Building at the Tivoli Parking garage, 7th Street Parking Garage, or at the Spruce and Dogwood parking lots (all $6.75 per day). You can find a link to the campus parking map here.
Program
Start | End | Event |
---|---|---|
8:00 | 9:00 | Registration |
9:00 | 9:15 | Opening |
9:15 | 10:15 | Keynote 1: Kevin Koy, IDEO |
10:15 | 10:30 | Coffee break and poster session |
10:30 | 11:30 | Panel discussion on Data Science Education |
11:30 | 12:30 | Data Science at CU Denver: lightning talk session |
12:30 | 1:30 | Lunch break and poster session |
1:30 | 2:30 | Keynote 2: Fernando Perez, UC Berkeley |
2:30 | 3:30 | Panel discussion on Applications of Data Science |
3:30 | 3:45 | Coffee break and poster session |
3:45 | 4:45 | Panel discussion on Data Science Research |
4:45 | 5:00 | Closing |
Keynotes
Keynote 1: Kevin Koy, IDEO
Title: Enabling Discovery and Innovation through Collaborative Data Science
Abstract: Today’s grand challenges require expertise in a wide array of areas that no single discipline can solve alone. Deep domain and method knowledge, that can bring light to complex problems, are often built on decades of research that have been conducted in the silos of each individual field without taking advantage of the insights that others may bring.
A collaborative data science approach to problem solving offers a unique opportunity to bring together the best of methodological and domain thinking, while also taking advantage of the incredible advances that modern computation and software infrastructure now provide.
In this talk, I will share what I have learned in helping to build engaging data science environments that aim to bridge the gaps between our disciplines and encourage the collaborative opportunities that can enable us to address our most pressing challenges today and into the future.
Bio: Kevin is Director of Data Science at IDEO San Francisco where he develops and leads data science initiatives that enable people and organizations to better understand and solve complex challenges. He brings two decades of experience working with data and researchers in a wide variety of settings. Kevin served as the founding Executive Director of UC Berkeley’s Berkeley Institute for Data Science, establishing an active interdisciplinary environment that connects people with the data, methods, and tools to advance research and discovery. At Stanford University, Kevin was Managing Director of Data Science and AI Affiliates Programs where he helped to build collaborations between researchers and corporate partners, and developed new campus-wide collaborative data science programs. Kevin also has extensive experience in geospatial research with a focus on analysis and visualization of spatial data for natural systems. He served as director of UC Berkeley’s Geospatial Innovation Facility and led the development of Cal-Adapt.org, a web resource providing enhanced access to climate change data produced by the state’s scientific community. At the American Museum of Natural History and the Smithsonian Institution, Kevin conducted conservation research in Myanmar, Vietnam, and Laos, using data from sensors and satellites to better understand the history and needs of unique and endangered ecosystems. Kevin earned a BA in environmental studies and anthropology from the University of Pennsylvania and an MS in biology from George Mason University. He has also completed advanced graduate work in geography at the City University of New York.
Keynote 2: Fernando Perez, UC Berkeley
Title: Scientific Open Source Software: meat and bits but not papers. Is it real work?
Abstract: Open source software is now the backbone of computation across the sciences and increasingly education. Yet the creation of scientific software is not well recognized as part of the enterprise of science in terms of training, career paths, intellectual recognition, organizational support, or funding. In this talk, I’ll explore the challenges of this contradictory situation, from the perspective of someone who has spent almost 20 years building open source software and communities. I have lived (often precariously) a dual life of “real academic” and of open source developer and advocate, working on IPython, Project Jupyter and the Scientific Python ecosystem since 2001.
I will provide an overview of Project Jupyter, including its intellectual backbone, the open source community context that surrounds it, and some of the impact it has had in recent years. This will help frame the second part of the talk, where I'll try to open a conversation on the social and organizational challenges of creating and sustaining open, collaborative communities in the structure of research and education. The scientific, technical and community dynamics of projects like Jupyter present interesting challenges in the context of traditional scientific incentives (funding, publishing, hiring and promotion, etc.). I’ll briefly outline some of these but will mostly focus on some ideas that I hope can move the conversation forward in productive ways.
Bio: Fernando Pérez is an assistant professor in Statistics at UC Berkeley and a Faculty Scientist in the Department of Data Science and Technology at Lawrence Berkeley National Laboratory. After completing a PhD in particle physics at the University of Colorado at Boulder, his postdoctoral research in applied mathematics centered on the development of fast algorithms for the solution of partial differential equations in multiple dimensions. Today, his research focuses on creating tools for modern computational research and data science across domain disciplines, with an emphasis on high-level languages, interactive and literate computing, and reproducible research. He created IPython while a graduate student in 2001 and co-founded its successor, Project Jupyter. The Jupyter team collaborates openly to create the next generation of tools for human-driven computational exploration, data analysis, scientific insight and education. He is a National Academy of Sciences Kavli Frontiers of Science Fellow and a Senior Fellow and founding co-investigator of the Berkeley Institute for Data Science. He is a co-founder of the NumFOCUS Foundation, and a member of the Python Software Foundation. He is the recipient of the 2012 Award for the Advancement of Free Software from the Free Software Foundation.
Panels
Data Science Education Panel (Chair: Cyrus Dioun)
● Jim Costello, Assistant Professor, Department of Pharmacology, School of Medicine, CU Anschutz Medical Campus
● Dawn Gregg, Professor and Discipline Director, Information Systems, Business School, CU Denver
● Larry Hunter, Professor and Director of Computational Bioscience Program, School of Medicine, CU Anschutz Medical Campus
● Rafael Moreno-Sanchez, Associate Professor, Department of Geography & Environmental Sciences, College of Liberal Arts and Sciences (CLAS), CU Denver
Applications of Data Science Panel (Chair: Shea Swauger)
● Paul Teske, Dean of the School of Public Affairs, CU Denver
● Katerina Kechris, Professor, Department of Biostatistics and Informatics, Colorado School of Public Health, CU Anschutz Medical Campus
● Austin Troy, Professor and Chair, Department of Urban and Regional Planning, College of Architecture and Planning, CU Denver
● Timothy Vis, Software Engineering at Google
● Brian Keegan, Assistant Professor, Department of Information Science, CU Boulder
Data Science Research Panel (Chair: Farnoush Banaei-Kashani)
● Audrey Hendricks, Assistant Professor, Department of Mathematical and Statistical Sciences, College of Liberal Arts and Sciences, CU Denver, and Department of Biostatistics and Informatics, School of Public Health, CU Anschutz Medical Campus
● Caleb Phillips, Researcher and Data Scientist at Computational Science Center, National Renewable Energy Laboratory (NREL)
● Dave Mundo, Senior Director, Strategic Analytics & New Data Science Initiatives, Oracle Data Cloud, Oracle
● Fernando Perez, Assistant Professor in Statistics, UC Berkeley, and Faculty Scientist in the Department of Data Science and Technology, Lawrence Berkeley National Laboratory
Lightning Talks
Session Chair: Julien Langou
Hamilton Bean, Associate Professor, Director of International Studies Program, and Director of Strategic Communication, Department of Communication, College of Liberal Arts and Sciences (CLAS), CU Denver
Title: Thwarting Russian Online Disinformation: A Rhetorical Analysis / Artificial Intelligence Pilot Study
Abstract: Stakeholders do not know which types of rhetorical appeals predominate in Russian disinformation (as well as how these appeals change over time), making identification and containment difficult. Stakeholders also do not know how social media gatekeepers (content moderators) attempt to distinguish Russian disinformation (if at all) from U.S. leftist or rightist messages, or whether it is even possible for content moderators to do so. Therefore, in this study, we scrutinize thousands of known instances of Russian online disinformation in an attempt to categorize types of rhetorical appeals along a continuum of political action (from apathy to violence), exploring whether and how such an approach may be amenable to machine learning and Artificial Intelligence techniques.
Katie Colborn, Assistant Professor, Department of Biostatistics and Informatics, Colorado School of Public Health, CU Anschutz Medical Campus
Title: Development of a model for automated surveillance of surgical site infections using the knockoff filter
Abstract: Using the American College of Surgeons National Surgical Quality Improvement Program (NSQIP) complication status of patients who underwent an operation at the University of Colorado Hospital, we developed an automated framework for identifying surgical site infections using electronic health record (EHR) data, intended for broad, generalizable implementation outside of our institution. We used a binomial generalized linear model with a lasso penalty and applied false discovery rate (FDR) correction using the knockoff filter of Barber and Candès to carry out variable selection. The knockoff filter is an improvement over the lasso penalty because it controls Type-I error. By controlling the FDR, the models are parsimonious and are more likely to be generalizable outside of our institution.
Jan Mandel, Professor, Department of Mathematical and Statistical Sciences, College of Liberal Arts and Sciences (CLAS), CU Denver
Title: Integrating Satellite Fire Detections with Coupled Fire-Weather-Smoke Forecasting by Machine Learning
Collaborators: Kyle Hilburn (Colorado State University), Adam Kochanski, Derek Mallia (University of Utah), Ned Nikolov (USDA Forest Service), Martin Vejmelka (CEAI, Inc.), Angel Farguell, James Haley, Lauren Hearn (University of Colorado Denver), and others
Abstract: We are building an interactive online wildfire forecasting system. The core of the system is WRF-SFIRE, a numerical weather prediction code coupled with a fire spread model, smoke transport, and fuel moisture model. Data inputs are downloaded from databases, cached, and processed on demand for a given simulation time and space domain, including topography, fuels maps, 3D weather forecasts, fuel moisture data from sensors on weather stations, and fire detections from polar-orbiting satellites. A Python system manages the simulation starting from an initialization by a user on a map interface, acquires and processes all data, sets up the simulation automatically, runs the model on a supercomputing cluster, interfacing with the queuing system, and streams animations to a cloud server. Aside from ongoing validation on case studies and support of experimental burns and a research aircraft, subprojects currently under active development include automatic initialization of the fire simulation from fire detections and assimilation of incoming satellite detection data into a running model by machine learning, downscaling surface winds by a microscale mass-consistent model within WRF-SFIRE, parameterization of winds within forest canopies, outputs to GIS, utilization of drone data, and transition to operational deployment at Forest Service Predictive Services. The operational Colorado Fire Prediction System and the Israel national fire prediction and fire danger system MATASH are built on earlier versions of WRF-SFIRE. This project is supported by the NSF and NASA.
Michael Rosenberg, Assistant Professor, Medicine-Cardiology and Cardiac Electrophysiologist, School of Medicine, CU Anschutz Medical Campus
Title: Going Long: Making the transition towards quantitative individualized medicine
Abstract: The Individualized Data Analysis Organization (IDAO) was created in 2017 to develop methods for the analysis of individual-level data in a manner that informs healthcare and lifestyle behaviors on a single-person level. In contrast to standard population-level analytical methods, the focus of the IDAO is on methods that can be applied to high-density personal data, such as would be collected by wearable activity monitors, longitudinal diet and symptom logs, and implantable medical devices, in order to draw individual-level inferences about disease recurrence patterns to guide prevention. To meet the practical needs of such an approach, our group has ongoing projects focused on mobile app development, advanced data analysis, and clinical investigation. The IDAO is part of the recently formed Colorado Center for Personalized Medicine, which is a partnership among the University of Colorado Denver, University of Colorado Health, Children’s Hospital Colorado, and the University of Colorado School of Medicine. Website: analyzemydata.org
Ashis Biswas, Assistant Professor, Department of Computer Science and Engineering, College of Engineering, Design and Computing (CEDC), CU Denver
Title: Characterizing brain tumor subtypes from MRI images -- a computational approach
Abstract: Brain tumors are the second most common malignancy in childhood after leukemia. Magnetic resonance imaging (MRI) is a popular clinical method to diagnose brain tumors because it is a non-invasive, painless procedure without any ionizing radiation. The standard pipeline for diagnosis after generating MRI scans requires clinicians' expert examination to pinpoint the location, size, and type of brain tumor. To assist in the examination, several proprietary tools also exist that offer basic image analysis, including segmentation based on histograms, etc. Secondary analysis of the images, however, is still performed manually by a clinician. There is a need to develop a tool to help reduce clinicians' average investigation time on an image by developing a prediction algorithm leveraging the power of the deep neural network to determine the four subtypes of brain tumor: medulloblastomas, DIPG, ependymomas, and edema.
Amy Roberts, Assistant Professor, Department of Physics, College of Liberal Arts and Sciences (CLAS), CU Denver
Title: Increasing access to science data: using data-description standards as a bridge to community tools when analyzing custom, binary-format data
Abstract: Scientists who have data stored in a standard format like comma-separated-values or HDF5 are lucky - they can jump into data analysis with community-supported tools such as Python's pandas package. But many scientists have data in a custom binary format and must first write code to de-serialize their data. Few have the training to do this, and data can become inaccessible after its steward has moved on. This talk will discuss existing data-description languages, as well as efforts to extend tools that provide basic data-reading utilities based on a description of the data format so that they interface with community-supported analysis tools. This may be an easier way for scientists with custom-formatted data to access and analyze their data.
Haadi Jafarian, Assistant Professor, Department of Computer Science and Engineering, College of Engineering, Design and Computing (CEDC), CU Denver
Title: An Adversarial Machine Learning Technique for Blackbox Model Inversion
Abstract: In recent years, a novel class of attacks on machine learning (ML) has emerged that attempts to deceive or learn targeted ML models through careful input generation and perturbation. These attacks, which are studied in the research field of adversarial machine learning, have different types and goals. For example, model evasion techniques aim to deceive a targeted model into misclassifying an input (e.g., an image) by adding a small but carefully crafted amount of noise to it, whereas model inversion attacks attempt to learn the training data of ML models. In this talk, we present an overview of different adversarial machine learning types and techniques. We also present our current research in this area, which investigates the feasibility and effectiveness of model inversion in the black box setting, where the gradient of the targeted model is not accessible.
Fuyong Xing, Assistant Professor, Department of Biostatistics and Informatics, Colorado School of Public Health, CU Anschutz Medical Campus
Title: KiNet: A Novel Deep Model for Ki67 Labeling Index Assessment in Gastrointestinal and Pancreatic Neuroendocrine Tumors
Abstract: Neuroendocrine tumors (NETs) are a heterogeneous type of cancer affecting most organ systems, and they need to be correctly graded to ensure proper treatment and patient management. Ki67 labeling index (Ki67 LI) is a biomarker for gastrointestinal and pancreatic NET grading. Measuring this index from pathology images requires accurate cell identification, i.e., quantification of immunopositive tumor, immunonegative tumor and non-tumor cells. Current Ki67 image analysis tools have a number of drawbacks: 1) They are still based on manual or semi-automated methods; 2) Most methods use a multi-stage image processing pipeline; 3) Algorithm design does not take Ki67 image characteristics into consideration, so the tools have technical difficulty in differentiating tumor from non-tumor cells. To address these challenges, we develop a novel end-to-end deep model, namely KiNet, for Ki67 LI assessment in gastrointestinal and pancreatic NETs. It incorporates a pixel-to-pixel neural network into a single-stage processing framework for simultaneous cell localization and classification. In addition, KiNet learns from another auxiliary task to assist the cell identification.
Troy Butler, Associate Professor, Department of Mathematical and Statistical Sciences, College of Liberal Arts and Sciences (CLAS), CU Denver
Title: Supercharge your teaching and research with Jupyter notebooks and hubs
Abstract: How do you develop engaging, interactive, and beautiful activities in computationally-oriented lectures? How do you quickly prototype research ideas, visualize results, and simultaneously document and archive successes and failures with your collaborators? And, how do you do this in an environment that makes for automatic reproducibility across all system platforms? With Jupyter notebooks and hubs of course! This talk will give a short overview of how faculty in the Department of Mathematical and Statistical Sciences are utilizing these amazing (and easy to use) tools to supercharge our teaching and research. The FCQs and publications don't lie; you should try Jupyter today!
Registration
Registration is free but required, and seats are allocated on a first-come, first-served basis. Please RSVP here.
Call for Posters
As part of the CU Denver Data Science Symposium, we are holding a poster session to present data science research pursued by students and faculty across the two campuses of CU Denver. Students working on data science research or using data science in their research are invited to submit a poster proposal via this form here. If your poster proposal is accepted for presentation, the notification of acceptance will be sent to you by Tuesday, August 27th. Please note that poster presenters are also required to register for the symposium (see Registration above).
Sponsors
● CU Denver College of Engineering, Design, and Computing (CEDC)
● CU Denver College of Liberal Arts and Sciences (CLAS)
● CU Denver Business School
● CU Denver Office of Research Services
Organizers
● Julien Langou (co-chair), Professor and Chair, Department of Mathematical and Statistical Sciences
● Farnoush Banaei-Kashani (co-chair), Assistant Professor, Department of Computer Science and Engineering
● Cyrus Dioun (co-chair), Assistant Professor of Management, Business School
Webmaster
● Shahab Helmi, shahab.helmi@ucdenver.edu