Neural networks in chemistry – search for potential drugs for COVID-19
Project reference: 2203
Neural networks (NNs) and deep learning are two success stories of modern artificial intelligence. They have led to major advances in image recognition, automatic text generation, and even self-driving cars. NNs are loosely modelled on the way the brain performs tasks or functions of interest, and they can approximate complex, non-linear relationships in data.
Computational chemistry provides powerful tools for investigating molecular properties and reactions. The rapid development of HPC has paved the way for chemists to use computational chemistry in their everyday work, namely to understand, model, and predict molecular properties and reactions, the properties of materials at the nanoscale, and the reactions and processes taking place in biological systems.
How can we transform the structure of molecules into a form that neural networks understand? In this project, we want to replace the computationally expensive “docking” of molecules into the cavity of a target protein with machine learning and neural network methods. The target protein under study is the SARS-CoV-2 3CLpro protease (6WQF), which plays a key role in SARS-CoV-2 virus replication. How successful are these methods at prescreening large drug databases for potential COVID-19 drugs?
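One common answer to the first question is a fixed-length numerical descriptor computed from atomic numbers and positions. The sketch below hand-codes the Coulomb matrix in plain NumPy purely for illustration; it is not the DScribe API, the function names `coulomb_matrix` and `to_feature_vector` are hypothetical, and the water geometry is an approximate placeholder rather than project data.

```python
import numpy as np


def coulomb_matrix(charges, positions):
    """Coulomb matrix of one molecule.

    charges:   (n,) atomic numbers Z_i
    positions: (n, 3) Cartesian coordinates in one consistent length unit
    """
    z = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    # Pairwise distances |R_i - R_j|
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    # Put a dummy 1.0 on the diagonal to avoid dividing by zero there
    np.fill_diagonal(dist, 1.0)
    # Off-diagonal elements: Z_i * Z_j / |R_i - R_j|
    m = np.outer(z, z) / dist
    # Diagonal elements: 0.5 * Z_i ** 2.4
    np.fill_diagonal(m, 0.5 * z ** 2.4)
    return m


def to_feature_vector(matrix, n_max):
    """Sort rows/columns by row norm, zero-pad to n_max x n_max, flatten."""
    order = np.argsort(-np.linalg.norm(matrix, axis=1))
    sorted_m = matrix[order][:, order]
    padded = np.zeros((n_max, n_max))
    n = sorted_m.shape[0]
    padded[:n, :n] = sorted_m
    return padded.ravel()


# Approximate water geometry (O, H, H), coordinates in angstrom (placeholder)
charges = [8, 1, 1]
positions = [[0.0, 0.0, 0.0], [0.758, 0.587, 0.0], [-0.758, 0.587, 0.0]]

cm = coulomb_matrix(charges, positions)
features = to_feature_vector(cm, n_max=4)  # padded so all molecules share one length
```

Sorting and zero-padding give every molecule a vector of the same length regardless of atom count or atom ordering, which is what a dense network input layer requires.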
This will be achieved by using several available molecular descriptors to capture the correlation between the structures (i.e. atomic positions) of the chemical compounds under investigation and their docking scores obtained from external computational sources. NNs will be implemented using the widely adopted TensorFlow library in Python. To generate molecular descriptors of chemical systems, we will use the DScribe library (Himanen et al., 2019), which can be imported as a module directly in Python code. Besides this “application part” of the project, we also plan to (in)validate the widely accepted claim that GPGPUs are a superior execution platform for NNs compared to CPUs.
 L. Himanen, M.O.J. Jäger, E.V. Morooka et al., DScribe: Library of descriptors for machine learning in materials science, Computer Physics Communications (2019)
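The regression from descriptor vectors to docking scores described above could be set up roughly as follows. This is a minimal sketch, not the project's actual model: the layer sizes, epoch count, and feature length are arbitrary assumptions, and `X` and `y` are random placeholders standing in for real descriptors and docking scores.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: random numbers instead of real descriptors / docking scores
rng = np.random.default_rng(0)
n_molecules, n_features = 256, 64
X = rng.normal(size=(n_molecules, n_features)).astype("float32")
y = rng.normal(size=(n_molecules, 1)).astype("float32")

# Small fully connected network with one linear output (the predicted score)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Mean squared error is a standard loss for scalar regression
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Hold out 20 % of the molecules to monitor generalisation during training
history = model.fit(
    X, y, validation_split=0.2, epochs=5, batch_size=32, verbose=0
)

# Predict scores for a few molecules; output shape is (5, 1)
predicted = model.predict(X[:5], verbose=0)
```

With real data, the validation error in `history.history["val_loss"]` would indicate whether the chosen descriptor carries enough information to rank compounds before any docking run.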
Project Mentor: Ing. Marián Gall, PhD.
Project Co-mentor: Doc. Mgr. Michal Pitoňák, PhD.
Site Co-ordinator: Mgr. Lukáš Demovič, PhD.
The student will learn about neural networks, molecular descriptors, TensorFlow, GPUs, and HPC in general.
Student Prerequisites (compulsory):
Basic knowledge of Python and elementary chemistry/physics background.
Student Prerequisites (desirable):
Advanced knowledge of Python, the TensorFlow and Keras libraries, and other HPC tools. Basic knowledge of neural networks and a quantum chemistry background.
Week 1: training;
Weeks 2-3: introduction to neural networks, TensorFlow, molecular descriptors, quantum chemistry and efficient implementation of algorithms;
Weeks 4-7: implementation, optimization, and extensive testing/benchmarking of the codes;
Week 8: report completion and presentation preparation
Final Product Description:
The expected project result is a Python implementation of a (selected) neural network algorithm applied to a quantum chemistry problem. The parallel CPU code will be benchmarked and compared to the GPU implementation.
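A CPU versus GPU comparison of this kind could start from a simple timing harness such as the one below. It is a sketch under stated assumptions: a dense matrix multiplication stands in for the real training workload, the helper name `time_matmul` and the matrix size are hypothetical, and the GPU is only timed if TensorFlow reports one.

```python
import time

import tensorflow as tf


def time_matmul(device, size=512, repeats=5):
    """Average wall-clock time of one size x size matmul on the given device."""
    with tf.device(device):
        a = tf.random.normal((size, size))
        b = tf.random.normal((size, size))
        # Warm-up run so one-off initialisation cost is not measured
        tf.matmul(a, b).numpy()
        start = time.perf_counter()
        for _ in range(repeats):
            c = tf.matmul(a, b)
        # Copying the result to host memory waits for pending device work
        c.numpy()
        return (time.perf_counter() - start) / repeats


# Always time the CPU; add the first GPU only if one is visible
devices = ["/CPU:0"]
if tf.config.list_physical_devices("GPU"):
    devices.append("/GPU:0")

timings = {device: time_matmul(device) for device in devices}
for device, seconds in timings.items():
    print(f"{device}: {seconds * 1e3:.2f} ms per matmul")
```

For a fair comparison the same harness would be wrapped around full training epochs of the actual model, since small workloads can favour the CPU once data transfer overhead is included.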
Adapting the Project: Increasing the Difficulty:
Writing one's own NN algorithm using Python and TensorFlow, or building a custom molecular descriptor for molecular characterization.
Adapting the Project: Decreasing the Difficulty:
Applying an existing NN implementation to quantum chemistry problems.
The student will have access to the necessary learning material, as well as to our local x86/GPU InfiniBand clusters. The software stack we plan to use is open source.
CC SAS-Computing Centre, Centre of Operations of the Slovak Academy of Sciences