Description
Abstract:
In the era of traditional toxicology, a large amount of data was collected and summarised in various databases; this is often called "small data". On this basis, many QSAR (Quantitative Structure-Activity Relationship) models have been developed to extend the knowledge of toxicity to a larger chemical space. Alongside this, we now have a large body of "big data", i.e. high-throughput screening results and -omics results. A particular challenge is to compress both the "big" and the "small" data and integrate them into a single model.
The presentation focuses on "big data" (omics data and high-throughput screening data) and on their analysis with Counter-propagation Artificial Neural Networks (CPANN). CPANN is one of the fundamental algorithms of Artificial Intelligence (AI); the emphasis here is on its clustering and classification capabilities.
As a first example, we present chemometric studies performed on proteomic data. The aim is to investigate the correlation between proteomic data and different in vitro endpoints and to gain insights into the mechanisms of action [1]. The second example is the analysis of genomic data, where we present the clustering of sequence data for the Zika, MERS, SARS and SARS-CoV-2 (Covid-19) viruses according to the geographical origin of the viruses [2]. The third example is the modelling of binding affinity to thyroid receptors for a large dataset (over 7000 compounds), where the CPANN method is used for both QSAR modelling and clustering.
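To make the clustering and classification roles of CPANN more concrete, the following is a minimal sketch of a counter-propagation network in Python with NumPy. It is not the implementation used in the studies cited here. The function names `train_cpann` and `predict_cpann`, the 4x4 grid, the linear learning-rate and radius schedules, the triangular neighbourhood function and the toy two-cluster data are all assumptions chosen for illustration.

```python
import numpy as np

def train_cpann(X, Y, grid=(4, 4), epochs=50, lr_start=0.5, lr_end=0.01, seed=0):
    """Train a small counter-propagation network (illustrative sketch).

    X: (n_samples, n_descriptors) inputs, Y: (n_samples, n_targets) targets.
    Returns the Kohonen (input) weights and the output-layer weights.
    """
    rng = np.random.default_rng(seed)
    # Grid position of every neuron, used to measure neighbourhood distance.
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])],
                      dtype=float)
    n_neurons = len(coords)
    W_in = rng.random((n_neurons, X.shape[1]))    # Kohonen layer
    W_out = rng.random((n_neurons, Y.shape[1]))   # output layer
    radius_start = max(grid) / 2.0
    for epoch in range(epochs):
        frac = epoch / max(epochs - 1, 1)
        # Assumed schedules: both shrink linearly during training.
        lr = lr_start + (lr_end - lr_start) * frac
        radius = radius_start + (0.5 - radius_start) * frac
        for idx in rng.permutation(len(X)):
            # The winner is chosen from the input (descriptor) side only.
            winner = np.argmin(((W_in - X[idx]) ** 2).sum(axis=1))
            # Chebyshev distance on the grid, triangular neighbourhood.
            d = np.abs(coords - coords[winner]).max(axis=1)
            h = np.where(d <= radius, 1.0 - d / (radius + 1.0), 0.0)
            # Both layers are corrected at the same grid positions.
            W_in += lr * h[:, None] * (X[idx] - W_in)
            W_out += lr * h[:, None] * (Y[idx] - W_out)
    return W_in, W_out

def predict_cpann(W_in, W_out, X):
    """Map each sample to its winning neuron and read off the output weights."""
    dist = ((X[:, None, :] - W_in[None, :, :]) ** 2).sum(axis=2)
    winners = dist.argmin(axis=1)
    return W_out[winners], winners

# Toy data (hypothetical): two well-separated clusters with one-hot class labels.
data_rng = np.random.default_rng(1)
X = np.vstack([0.2 + 0.03 * data_rng.standard_normal((20, 2)),
               0.8 + 0.03 * data_rng.standard_normal((20, 2))])
labels = np.array([0] * 20 + [1] * 20)
Y = np.eye(2)[labels]

W_in, W_out = train_cpann(X, Y)
pred, winners = predict_cpann(W_in, W_out, X)
pred_class = pred.argmax(axis=1)
```

In this sketch the positions in `winners` give the clustering (samples that land on the same or neighbouring neurons are similar), while the output weights read at those positions give the predicted class or property, which is how one map can serve both purposes.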
References:
1. M. Vračko, S. C. Basak, F. Witzmann, Chemometrical analysis of proteomics data obtained from three cell types treated with multiwalled carbon nanotubes and TiO2 nanobelts. SAR QSAR Environ. Res. 2018, 29, 567-577.
2. M. Vračko, S. C. Basak, T. Dey, A. Nandy, Cluster Analysis of Coronavirus Sequences using Computational Sequence Descriptors: With Applications to SARS, MERS and SARS-CoV-2 (CoVID-19). Curr. Comput.-Aided Drug Des. 2021, 17, 936-945.