EMPIRICAL ANALYSIS FOR CLASSIFICATION AND PREDICTION OF PROTEIN FAMILY USING MACHINE LEARNING
Authors: Rashmi TS, Veena M R, Kamaraj R and Jyothi NM
ABSTRACT
Proteins are fundamental to life, and understanding their structures is crucial for deciphering their
functions. Although experimental efforts have determined around 100,000 unique protein structures, this
represents a small fraction of the vast protein sequence space. The laborious and time-consuming
process of determining a protein's structure has been a bottleneck. To bridge this gap and enable large-
scale structural bioinformatics, computational methods are essential. The challenge of predicting a
protein's three-dimensional structure from its amino acid sequence, known as the 'protein folding
problem,' has persisted for over five decades. Existing methods have limitations, especially when there
are no structurally similar proteins as references. Recently, a groundbreaking machine learning
approach was introduced, capable of consistently predicting protein structures with atomic accuracy,
even in cases with no structural homologs. This approach leverages both physical and biological
knowledge about protein structure and incorporates multiple sequence alignments into the machine
learning algorithm's design. This research presents an empirical analysis of protein structure, together
with the classification and prediction of protein family using a machine learning algorithm, and attains
a high accuracy of 95%.
Keywords: Protein family, DNA, protein sequence, K-Nearest Neighbors
Publication date: 15/12/2023
PDF: https://ijbpas.com/pdf/2023/December/MS_IJBPAS_2023_DECEMBER_SPCL_1073.pdf
DOI: https://doi.org/10.31032/IJBPAS/2023/12.12.1073