Talks and presentations

Responsible AI Practices

September 02, 2022

Talk, CU Denver, Data Science Symposium, Denver, Colorado, USA.

The fourth industrial revolution fuses Artificial Intelligence (AI) into the advancement of automation technologies across numerous disciplines, thereby impacting many aspects of people’s lives and society at large. It is therefore important to design, build and deploy AI systems responsibly to ensure fairness, inclusiveness, reliability, transparency, privacy, accountability and an understanding of their limitations. The talk illustrates the design principles of responsible AI systems and the “think-before-you-code” practices needed to make an impact.

Data-blind Machine Learning: only a piece of the puzzle in building models in oblivious settings

July 26, 2022

Talk, Microsoft Visiting Data Science Educator Program (Summer 2022), Cohort 2, Remote

Supervised machine learning models are, by definition, data-sighted: they require access to all or most of the labeled training dataset. This paradigm presents two intertwined bottlenecks: the risk of exposing sensitive data samples to machine learning engineers at a third-party site, and the time-consuming, laborious, and bias-prone nature of data annotation by personnel at the data source site. In this paper, we studied how data adequacy, as a source of bias, affects learning in a data-blinded semi-supervised model for COVID-19 chest X-ray classification. Data-blindness was put into action on a semi-supervised generative adversarial network that generates synthetic data from only a few labeled samples and concurrently learns to classify targets. We designed and developed a data-blind COVID-19 patient classifier that determines whether an individual is suffering from COVID-19 or another type of illness, with the ultimate goal of producing a system to assist in labeling large datasets. However, the availability of labels in the training data had an impact on model performance, and when a new disease spreads, as COVID-19 did in 2019, access to labeled data may be limited. Here, we studied how bias in the per-class distribution of labeled samples affected classification performance for three models: a Convolutional Neural Network based classifier (CNN), a semi-supervised GAN using the source data (SGAN), and our proposed data-blinded semi-supervised GAN (BSGAN). Data-blindness prevents machine learning engineers from directly accessing the source data during training, thereby ensuring data confidentiality. This was achieved by training the proposed model on synthetic data samples generated by a separate generative model.
Our model achieved comparable performance, with a trade-off of $0.05$ in AUC score between the privacy-aware model and a traditionally learned model, and it remained stable, showing the same learning performance as the data distribution was changed.

Data Science Competitions: A know-how to participate

August 27, 2021

Talk, CU Denver, Data Science Symposium, Denver, Colorado, USA.

There is a problematic skew between the availability of data science learning content and content that shows what to do with the learned concepts. Most teaching materials in Data Science, especially in Machine Learning and Deep Learning, struggle in an academic setting to engage students in applying their knowledge to everyday problems. There is a “believable gap” between graduating from a relevant course and applying the learned ideas to impactful real-world problem solving. Participating on competitive data science platforms such as Kaggle and DrivenData puts a participant in a position to use the concepts in a more practical way, which is both encouraging and constructive. In this talk, the audience will learn why participating in competitions matters, how to get started on one such venue, Kaggle, and how to develop a competitive mindset to improve a submission little by little among thousands of experts worldwide and eventually build a successful career in Data Science.

Artificial Intelligence and its key contributions in the medical imaging field

March 09, 2020

Talk, Colorado State University (CSU) Animal Imaging Workshop, Fort Collins, Colorado, USA.

Driven by the combination of easy, inexpensive access to huge volumes of data, computational infrastructure and advanced algorithms, Artificial Intelligence (AI) has entered the mainstream of technological innovation. Its branches, including machine learning, deep learning, natural language processing, robotics, and image data analysis, not only help us maintain an improved lifestyle around the clock, but are also key contributors to many recent notable scientific achievements in health science. One of the most impactful areas of health innovation is the application of AI in medical imaging. In this talk, recent advancements in AI will be introduced, including automated image processing and interpretation of the analysis. Basic terminology commonly used when discussing AI applications will be explained with illustrations, and finally the three questions of “when”, “what” and “how” AI can be integrated into a medical imaging (more specifically, radiological) workflow will be illustrated with examples.

Characterizing brain tumor subtypes from MRI images – a computational approach

August 30, 2019

Talk, CU Denver, Data Science Symposium, Denver, Colorado, USA.

Brain tumors are the second most common malignancy in childhood after leukemia. Magnetic resonance imaging (MRI) is a popular clinical method for diagnosing brain tumors because it is a non-invasive, painless procedure that involves no ionizing radiation. After the MRI scans are generated, the standard diagnostic pipeline requires a clinician’s expert examination to pinpoint the location, size, and type of the brain tumor. To assist in this examination, several proprietary tools exist that offer basic image analysis, such as histogram-based segmentation. Secondary analysis of the images, however, is still performed manually by a clinician. There is a need for a tool that helps reduce the average time a clinician spends investigating an image, by means of a prediction algorithm that leverages the power of deep neural networks to determine the four subtypes of brain tumor: medulloblastomas, DIPG, ependymomas, and edema.