Current Research Projects

Topic 1: Data-Driven Authentication and Authorization

Data-driven Authentication Synopsis:

Simple password-based authentication is widely used and easy to deploy. However, because an individual user today relies on hundreds of online resources, s/he must manage a large number of passwords, which is difficult and also insecure, since passwords can be easily guessed or cracked by attackers. Alternatives such as two-factor authentication (2FA), multi-factor authentication (MFA), and biometrics such as iris scanning, facial recognition, and fingerprint matching have been around for some time, but their adoption rate is low because of implementation and usability issues, and even these alternatives can be compromised. Data-driven authentication research will focus on highlighting the shortcomings of these techniques and on introducing novel techniques based on gait signatures, facial expressions, gaze, behavioral attributes, and other dynamic profiles of individuals. These techniques will in turn eliminate the need to remember passwords and make the authentication process safe and secure.

Authorization Rule Extraction, Synthesis and Refinement:

Once an individual is authenticated into a system, s/he is subjected to an authorization process that determines whether s/he should be permitted access to a protected resource. Organizations use many different types of access control models, but each has its own shortcomings relative to the others because of the dynamic nature of cyber systems. One primary class of such models is role-based access control (RBAC), in which authorization decisions are made by assigning roles to users and permissions to roles. A related body of work, called role mining, focuses on extracting meaningful roles from organizational data, including individual user permissions. This problem is hard, especially when the problem domain is large. We use recent advances in big data analytics to mine contextually meaningful roles. Another primary example is attribute-based access control (ABAC), which has gained popularity in recent years. In this authorization model, access control rules are defined as conditional statements on the attributes of subjects, objects, and actions. Extracting the explicit and hidden attributes of entities requires a thorough data-driven understanding of the subjects (users) and objects (resources) in the system. Little research has been dedicated to extracting and identifying these attributes from contextual data, to investigating how an optimal set of rules could be defined based on attributes, or to transitioning from older access control models, such as discretionary access control (DAC), to ABAC.
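To make the ABAC idea concrete, the following minimal sketch evaluates rules written as conditional statements over subject, object, and action attributes. The rule names, attribute keys, and sample users are hypothetical and chosen only for illustration; this is not the rule-extraction method of the project, only the kind of rule set such a method would aim to produce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attrs = Dict[str, str]


@dataclass
class Rule:
    """One ABAC rule: permit when the condition holds for (subject, object, action)."""
    name: str
    condition: Callable[[Attrs, Attrs, str], bool]


def same_department(subject: Attrs, obj: Attrs) -> bool:
    """True only when both sides carry a department attribute and the values match."""
    dept = subject.get("department")
    return dept is not None and dept == obj.get("department")


def is_permitted(rules: List[Rule], subject: Attrs, obj: Attrs, action: str) -> bool:
    """Default deny: access is granted only if at least one rule matches."""
    return any(rule.condition(subject, obj, action) for rule in rules)


# Hypothetical rules, for illustration only.
RULES = [
    Rule("same-department-read",
         lambda s, o, a: a == "read" and same_department(s, o)),
    Rule("manager-write",
         lambda s, o, a: a == "write"
         and s.get("role") == "manager"
         and same_department(s, o)),
]

# Hypothetical subjects and object.
alice = {"department": "finance", "role": "analyst"}
bob = {"department": "finance", "role": "manager"}
report = {"department": "finance", "type": "report"}

if __name__ == "__main__":
    print(is_permitted(RULES, alice, report, "read"))
    print(is_permitted(RULES, alice, report, "write"))
    print(is_permitted(RULES, bob, report, "write"))
```

Keeping each rule as a named predicate makes it easy to count how many rules a candidate policy needs, which is one plausible way to compare rule sets when searching for a compact one.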

Topic 2: Pattern Mining from Logs

Detecting Zero-Day APT Attacks:

Virtually all systems and software applications generate timestamped entries in their respective log files, exposing a wealth of useful information about the systems to the administrator if mined properly. The problem of analyzing, aggregating, and correlating these distributed and heterogeneous logs to extract rare, malicious, or abnormal patterns in communication or activities is central to the practice of threat analytics. The goal of this project is to build upon recent developments in unsupervised deep learning for detecting, predicting, and discovering the root cause of cyber threats. In particular, we aim to discover stealthy advanced persistent threats (APTs) through correlation and analysis of communication logs, DNS activities, filesystem activities, and available repositories on cyber threats (e.g., STIX, CAPEC).
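As a rough illustration of extracting rare patterns from timestamped logs, the sketch below reduces each entry to a template by masking variable fields and then flags templates that account for only a small share of all entries. It deliberately uses a simple frequency count in place of the unsupervised deep learning models mentioned above. The log format (`<timestamp> <message>`), the sample entries, and the `max_share` threshold are assumptions made for this example.

```python
import re
from collections import Counter
from typing import List, Tuple


def to_template(line: str) -> str:
    """Drop the leading timestamp and mask variable fields (users, IPs, numbers)."""
    _, _, message = line.partition(" ")
    message = re.sub(r"user=\S+", "user=<USER>", message)
    message = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", message)
    message = re.sub(r"\b\d+\b", "<NUM>", message)
    return message


def rare_events(lines: List[str], max_share: float = 0.25) -> List[Tuple[str, str]]:
    """Return (line, template) pairs whose template covers at most max_share of all lines."""
    templates = [to_template(line) for line in lines]
    counts = Counter(templates)
    total = len(templates)
    return [(line, tpl) for line, tpl in zip(lines, templates)
            if counts[tpl] / total <= max_share]


# Hypothetical log entries, for illustration only.
LOGS = [
    "2024-01-01T10:00:01 login ok user=alice from 10.0.0.5",
    "2024-01-01T10:00:07 login ok user=bob from 10.0.0.9",
    "2024-01-01T10:01:12 login ok user=carol from 10.0.0.7",
    "2024-01-01T10:02:30 login ok user=alice from 10.0.0.5",
    "2024-01-01T10:03:44 dns query length 253 to 203.0.113.8",
]

if __name__ == "__main__":
    for line, template in rare_events(LOGS):
        print(template, "|", line)
```

Rarity alone is a weak signal, so in practice a score like this would be one input among several rather than a verdict.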

Data-driven Brute-Force Attack and Defense:

While existing approaches evaluate the quality of a password on its own, little effort has been put into understanding the strength of passwords collectively and in the context of other passwords. The goal of this project is to understand how attackers could apply data-driven techniques, such as text mining or association rule mining, to corpora of passwords in order to build better password-guessing algorithms. A related project is to extract and understand the correlation between email addresses and organizational names. Studies show that an email address is a very important piece of information that in many cases is not supposed to be public knowledge, especially when it is for personal use. Understanding the correlation between users' public and personal email addresses, and guessing user email addresses from public corporate records, are two interesting directions in this research theme. The third project in this area uses existing corpora of public organizational data to understand and enumerate the domain names of a target organization. The goal of this project is to understand how attackers can use an organization's existing public resources to identify its public domain names. This information is critical to cyber attackers, especially in the IPv6 domain, where the abundance of IP addresses forces information gatherers to rely on DNS rather than blind scanning to identify and enumerate hosts in the target network.
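One simple way to look at passwords collectively rather than one at a time is to compare their structure across a corpus. The sketch below maps each password to a character-class mask and reports how large a share of the corpus uses each mask; a password whose mask is very common is weak in context even if it looks acceptable in isolation. The mask alphabet and the toy corpus are assumptions for illustration, and this is a basic frequency analysis, not the text mining or association rule mining techniques named above.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def mask(password: str) -> str:
    """Map each character to a class: L lowercase, U uppercase, D digit, S other."""
    classes = []
    for ch in password:
        if ch.islower():
            classes.append("L")
        elif ch.isupper():
            classes.append("U")
        elif ch.isdigit():
            classes.append("D")
        else:
            classes.append("S")
    return "".join(classes)


def mask_shares(corpus: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (mask, share of corpus) pairs, most common mask first."""
    counts = Counter(mask(p) for p in corpus)
    total = sum(counts.values())
    return [(m, c / total) for m, c in counts.most_common()]


# Toy corpus invented for this example; not real user data.
TOY_CORPUS = ["summer19", "winter20", "Dragon1!", "monkey77", "hello"]

if __name__ == "__main__":
    for m, share in mask_shares(TOY_CORPUS):
        print(f"{m:10s} {share:.2f}")
```

A defender can use the same statistic to reject new passwords whose structure is already over-represented in known corpora.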

Topic 3: Pattern Mining and Querying in Cyber Graphs

Real-time and Scalable Graph Mining Operators for Cybersecurity:

Graph-theoretic analysis is one of the core components in the analysis of large-scale cyber threats. Graphs provide an intuitive data structure for modeling system entities and their attributes, as well as for representing interconnections among different entities. Some important examples are attack graphs for representing potential intrusion paths in a network or system, communication graphs for representing network-level flows among machines, graph-based representations of social networks, and control flow graphs for representing execution paths within a program. While various approaches have been proposed over time for the structural analysis of different types of graphs, mostly for intrusion detection, these techniques are only useful for certain graph types, have narrow application, and usually have low scalability and composability. The goal of this research is to use recent advances in network embedding and graph mining to define a set of generic operators for real-time and scalable queries on graph data structures, geared toward cybersecurity needs. We show how these operators could be used in different security domains to identify intrusions, perform forensic analysis, and investigate the status of cyber systems.
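As a small example of what a reusable graph operator might look like, the sketch below defines two operators on an attack graph stored as an adjacency list: one enumerates the simple intrusion paths between two nodes, and one finds the intermediate nodes that every such path must cross. It uses plain depth-first traversal rather than network embedding, enumerates paths exhaustively (so it is not the scalable design the project targets), and the operator names and toy graph are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def simple_paths(graph: Graph, source: str, target: str) -> List[List[str]]:
    """Enumerate all cycle-free paths from source to target, shortest first."""
    paths: List[List[str]] = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the current path
                stack.append((nxt, path + [nxt]))
    return sorted(paths, key=lambda p: (len(p), p))


def choke_points(graph: Graph, source: str, target: str) -> Set[str]:
    """Intermediate nodes that lie on every simple path from source to target."""
    paths = simple_paths(graph, source, target)
    if not paths:
        return set()
    common = set(paths[0])
    for path in paths[1:]:
        common &= set(path)
    return common - {source, target}


# Hypothetical attack graph: an edge means "a foothold here can reach there".
ATTACK_GRAPH: Graph = {
    "internet": ["web"],
    "web": ["app", "db"],
    "app": ["db"],
    "db": [],
}

if __name__ == "__main__":
    print(simple_paths(ATTACK_GRAPH, "internet", "db"))
    print(choke_points(ATTACK_GRAPH, "internet", "db"))
```

Because both operators take any adjacency list, the same calls apply unchanged to a communication graph or a control flow graph, which is the kind of composability the paragraph above asks for.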

Topic 4: Secure Infrastructures

Protecting cyber infrastructures that are usually built on top of TCP/IP networks, such as healthcare infrastructures, financial infrastructures, and power grids, is of paramount importance. These infrastructures have been targeted by advanced, usually state-sponsored, adversaries. Countering these attacks is difficult because of the enormity and heterogeneity of the systems involved, which results in susceptibility to a large number of potential intrusion paths and attack models. The goal of this project is to use advances in big data analytics to design security models for these large-scale heterogeneous infrastructures. This research will focus on specific types of cyber infrastructures, including smart grids, but the intuitions and methodologies will be generalized to other domains of cybersecurity. The smart grid (SG), a network of electricity transmission lines between electricity providers and their customers, is nowadays monitored and managed by autonomous and intelligent sensing technology. There is a need to develop defense monitoring techniques that can reliably mine the complex sensor log data to evaluate system status and failures, identify attacks and backtrack to their source, predict future threats, and suggest remediation, while reliably ensuring adequate operational security provisions for the grid.

Topic 5: IoT and Cyber Physical System Security

We now live in the era of the Internet of Things (IoT), where digitally connected devices are intruding into every aspect of our lives, including our homes, offices, vehicles, bodies, and many other cyber-physical systems (CPS). With the exponential growth of active IoT devices, cybercrimes targeting IoT devices are also increasing. More connected devices imply more possibilities for an attacker to target an individual, and therefore more attack vectors to defend against. In this regard, machine learning techniques can be developed to provide network-based defensive solutions, as well as to detect group attacks in a CPS, data falsification attacks in ad hoc networks such as V2X and VANETs, and data theft or other adversarial attacks on body sensor devices.

Topic 6: Trend Prediction for Non-copy-cat and Novel Threats

While new instances of cyber-attacks occur every minute across the globe, the majority of these attacks rely on variants of the same techniques and tactics used in prior attacks. The goal of this project is to define the characteristics of novel cyber-attacks and to develop machine learning methods for predicting novel cyber-attacks (based on their techniques, exploits, targeted systems, etc.) and analyzing their trends in a cyber system, thereby predicting present and future trends of attacks and threats in terms of complexity, number of affected nodes, strategies, and so on. This will help us estimate when we may expect the next game-changing attack in the cyber domain, considering the numerous technological advancements as well as Moore's law and its limits. We will also study how to develop real-time automated composition of advisories that alert and advise users at various levels of the system, based on system logs and user profile data mined with machine learning (including deep learning) methods. This will implicitly make the resources in the system more secure and educate users to be safer actors in the cyber system.