Master’s Thesis at the College of Information Technology Explores Prediction Consensus (Binding Sites) with Transcription Factor Protein from DNA-Seq Data Using Machine Learning

By : Duhaa Fadill Abbas
Date : 26/8/2025

The Department of Software at the College of Information Technology, University of Babylon, hosted the defense of a Master’s thesis presented by student Shahad Raed Hadi, entitled “Prediction Consensus (Binding Sites) with Transcription Factor Protein from DNA-Seq Data Using Machine Learning.” The thesis was supervised by Asst. Prof. Dr. Sura Zaki Naji, and the defense took place at 9:00 a.m. on Tuesday, August 26, 2025, in the college’s conference hall.

The study addressed gene expression regulation as a fundamental process in biological systems, carried out through interactions between transcription factors and specific DNA sequences known as transcription factor binding sites. These sites, often located in promoter regions, play a critical role in either activating or repressing gene transcription.

The researcher highlighted that traditional experimental methods for identifying transcription factor binding sites face major challenges, including high costs, extensive manual labor, and the need for significant computational resources. In contrast, computational approaches offer efficient, accurate, scalable, and low-cost alternatives for analyzing large-scale genomic data.

The thesis employed a deep learning model to predict transcription factors and their binding sites within the DNA sequences of Arabidopsis thaliana, using data derived from the AGRIS database. The dataset was preprocessed to remove noise and redundancy, and the sequences were then represented numerically using techniques such as k-mer encoding and one-hot encoding.
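
As an illustration of the two sequence representations mentioned above, the following minimal Python sketch encodes a short DNA string both ways. It is not taken from the thesis: the function names (`one_hot`, `kmers`, `kmer_index`), the choice of k = 3, and the example sequence are assumptions made here for illustration only.

```python
from typing import List

BASES = "ACGT"  # fixed base order assumed for this sketch


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector in A, C, G, T order.

    Symbols outside ACGT (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmers(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with a stride of 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_index(kmer: str) -> int:
    """Map a k-mer to an integer in [0, 4**k) by reading it as a base-4 number.

    Raises ValueError if the k-mer contains a symbol outside ACGT.
    """
    idx = 0
    for base in kmer:
        idx = idx * 4 + BASES.index(base)
    return idx


if __name__ == "__main__":
    example = "CACGTG"  # arbitrary example sequence
    print(one_hot(example))
    print([(km, kmer_index(km)) for km in kmers(example, k=3)])
```

One-hot encoding keeps the position of every single base, while k-mer encoding turns the sequence into a series of short overlapping "words" that can be indexed or embedded; which of the two (or which value of k) the thesis used for each model component is not stated in this article.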

The model was built upon a convolutional neural network (CNN) integrated with an attention mechanism, which improves prediction accuracy by enabling the model to focus on biologically significant regions. The architecture comprised two main components: one for predicting binding sites, and the other for identifying the corresponding transcription factors. Through extensive experimentation and optimization, ten representative binding sites were selected from a total of 471 unique sites for training and evaluation, ensuring a balance between diversity and generalization capability.
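
To show how a convolution followed by attention can make a model "focus" on one region of a sequence, here is a small pure-Python sketch. It is a toy under stated assumptions, not the thesis architecture: the single hand-written filter, the helper names (`conv1d_relu`, `attention_pool`), and the use of the feature values themselves as attention scores are all simplifications introduced here. In a trained model both the filters and the attention scores would be learned.

```python
import math
from typing import List, Tuple

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """One-hot encode a DNA string in A, C, G, T order."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: List[List[int]], kernel: List[List[int]]) -> List[float]:
    """Slide one filter (width w, 4 channels) over the sequence, then apply ReLU."""
    w = len(kernel)
    out = []
    for i in range(len(x) - w + 1):
        s = sum(x[i + j][c] * kernel[j][c] for j in range(w) for c in range(4))
        out.append(max(0.0, s))
    return out


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[float]) -> Tuple[float, List[float]]:
    """Weight each position by softmax(score) and sum.

    Assumption: the score of a position is simply its feature value.
    """
    weights = softmax(features)
    pooled = sum(w * f for w, f in zip(weights, features))
    return pooled, weights


if __name__ == "__main__":
    motif_filter = one_hot("ACG")   # hypothetical filter that matches "ACG"
    sequence = one_hot("TTACGTT")   # arbitrary example sequence
    features = conv1d_relu(sequence, motif_filter)
    pooled, weights = attention_pool(features)
    print(features)
    print([round(w, 3) for w in weights])
    print(round(pooled, 3))
```

The attention weights sum to 1 and concentrate on the window where the filter matches, which is the sense in which attention lets a model emphasize candidate binding regions over the rest of the sequence.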

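No. | Discussion committee member | Academic title | Specialization | Workplace | Role
1 | Dr. Tawfiq Abdul Khaliq Abbas | Professor | Image Processing | University of Babylon / College of Information Technology | Chair
2 | Dr. Adil Abdulwahhab Ghidan | Assistant Professor | Bioinformatics | University of Diyala / College of Science | Member
3 | Dr. Muhannad Mohammed Jasim | Lecturer | Artificial Intelligence and Bioinformatics | University of Babylon / College of Information Technology | Member
4 | Dr. Sura Zaki Naji | Assistant Professor | Artificial Intelligence and Bioinformatics | University of Babylon / College of Information Technology | Member and Supervisor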
