Research Projects

Stable Robust Inductive Matrix Completion (SRIMC)

Background: A large number of long intergenic non-coding RNAs (lincRNAs) are linked to a broad spectrum of human diseases, yet the disease associations of many other lincRNAs remain unknown. Validating such links through biological experiments is expensive. At the same time, High-Throughput Sequencing (HTS) platforms have made a wealth of lincRNA data available, opening the door for machine learning approaches to extract meaningful relationships between lincRNAs and diseases. However, only a few in silico tools for inferring lincRNA-disease associations exist to date, and none of them uses side information about both entities simultaneously in a single framework.

Methods: The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation framework between two entities that takes side information about each of them into account. However, the standard IMC formulation cannot handle noise and outliers that may be present in the dataset, and it does not adequately account for data sparsity. A robust version of IMC that addresses both issues is therefore needed. As a remedy, we propose Stable Robust Inductive Matrix Completion (SRIMC), which uses l2,1-norm-based regularization and optimizes the objective function with a two-step stable solution approach.
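To make the IMC scoring model concrete, the following is a minimal plain-Python sketch, assuming the standard IMC formulation: lincRNA features X (one row per lincRNA) and disease features Y (one row per disease) are linked through a low-rank matrix Z = W H^T, so the predicted association score for lincRNA i and disease j is x_i^T W H^T y_j. It also computes the l2,1 norm, taken here as the sum of the Euclidean norms of a matrix's rows. All function names are hypothetical. Applying the l2,1 norm to the residual in robust_loss is only one illustrative way to gain robustness to outlier rows; it is not the published SRIMC objective, and the two-step solver is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (requires len(a[0]) == len(b))."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of m as a list of lists."""
    return [list(row) for row in zip(*m)]


def imc_scores(X: Matrix, W: Matrix, H: Matrix, Y: Matrix) -> Matrix:
    """IMC score matrix X W H^T Y^T; entry (i, j) scores lincRNA i vs disease j.

    X: lincRNA features (n_l x f_l), Y: disease features (n_d x f_d),
    W: f_l x r, H: f_d x r, so Z = W H^T is a rank-r f_l x f_d matrix.
    """
    return matmul(matmul(matmul(X, W), transpose(H)), transpose(Y))


def l21_norm(M: Matrix) -> float:
    """l2,1 norm: sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def robust_loss(A: Matrix, S: Matrix) -> float:
    """Illustrative (hypothetical) robust data-fit term: l2,1 norm of A - S.

    Each row's error enters unsquared, so a corrupted row grows the loss
    linearly rather than quadratically as in the squared Frobenius loss.
    """
    residual = [[a - s for a, s in zip(ra, rs)] for ra, rs in zip(A, S)]
    return l21_norm(residual)


# Example: 3 lincRNAs and 2 diseases, each with 2 features; rank r = 1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0], [0.5]]
H = [[2.0], [1.0]]
print(imc_scores(X, W, H, Y))  # [[2.0, 1.0], [1.0, 0.5], [3.0, 1.5]]
```

Because the score depends on the lincRNA only through its feature vector x_i, the same learned W and H can score a lincRNA with no known associations, which is what makes the inductive setting suitable for querying novel lincRNAs.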

Results: We applied SRIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about both the lincRNAs and the diseases. SRIMC outperforms state-of-the-art methods in terms of precision@k and recall@k when prioritizing the top-k diseases for each query lincRNA. We also demonstrate that SRIMC is equally effective for queries about novel lincRNAs and for predicting the rank of a newly identified disease across a set of well-characterized lincRNAs.
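The top-k evaluation can be illustrated with a small sketch, assuming the usual definitions: for one query lincRNA, precision@k is the fraction of the k highest-scored diseases that are known associations, and recall@k is the fraction of that lincRNA's known associations that appear in the top k. The function names are hypothetical, and how the study averages these values over lincRNAs or cross-validation folds is not shown here.

```python
from typing import List, Sequence, Set, Tuple


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring diseases; ties go to the lower index."""
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]


def precision_recall_at_k(
    scores: Sequence[float], relevant: Set[int], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for a single query lincRNA.

    scores:   predicted association score for each disease index
    relevant: indices of diseases known to be associated with the lincRNA
    """
    hits = sum(1 for j in top_k(scores, k) if j in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Example: five candidate diseases, two known associations (indices 0 and 3).
p, r = precision_recall_at_k([0.9, 0.1, 0.8, 0.4, 0.7], {0, 3}, k=3)
print(p, r)  # prints 0.3333333333333333 0.5
```

Here the top three diseases are indices 0, 2 and 4, so only one of the three predictions is a known association (precision@3 = 1/3), and one of the two known associations is recovered (recall@3 = 1/2).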

Conclusions: Exploring the underlying connections between lincRNAs and diseases will enrich the current perspective on the diagnosis of human diseases and facilitate the corresponding treatment plans. In this study, we proposed a stable and robust inductive matrix completion method within a framework that integrates a diverse collection of side information about each lincRNA and each human disease, and used it to reveal hidden links between the two entities. The proposed method successfully identifies existing connections and also discovers new potential associations, a capability absent from most state-of-the-art algorithms.