Discussion of a PhD Dissertation at the College of Information Technology, University of Babylon on Improving Hand Gesture Recognition Using Graph Neural Networks

By: Duhaa Fadill Abbas
Date: 09/3/2026

In a scholarly atmosphere that reflects the growing research momentum in the fields of artificial intelligence and advanced computational technologies, the College of Information Technology at the University of Babylon witnessed the discussion of a distinguished PhD dissertation in the Department of Software entitled “Improving Hand Gesture Recognition in Sign Language Using Graph Neural Networks.” The dissertation was presented by the PhD candidate Ishraq Abdulameer Yahya, under the supervision of Dr. Mahdi Abadi Mani and Dr. Nidaa Abdulmohsen Abbas. The discussion took place at 9:00 a.m. on Monday, February 9, 2026, in the conference hall of the College, in the presence of a number of faculty members, researchers, and postgraduate students.

The dissertation addresses the development of an advanced methodology for recognizing static hand gestures within a multilingual context, with a particular focus on American Sign Language (ASL) and Arabic Sign Language (ArSL). Despite the remarkable progress achieved by deep learning techniques in computer vision tasks, the classification of static gestures still faces significant challenges due to the high visual similarity between gesture categories, variations in hand poses, and differences in image acquisition conditions such as lighting and background. Additionally, linguistic ambiguities arise because some gestures across different sign languages may share visually similar configurations.

To tackle these challenges, the research adopts a dual-strategy framework that combines the construction of a robust baseline model based on convolutional neural networks with the proposal of an advanced graph-based learning framework designed to enhance discriminative capabilities. In the first stage, a high-performance baseline was established through an ensemble of heterogeneous convolutional neural network models using an average voting mechanism. This ensemble integrates a custom-designed CNN with transfer learning models such as VGG16, VGG19, and MobileNet. The purpose of this design is to reduce performance variance and enhance stability compared with individual models, thereby providing a rigorous benchmark against which subsequent improvements can be reliably evaluated.
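The average-voting ensemble described above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the per-model probability arrays are synthetic stand-ins for the softmax outputs of the custom CNN, VGG16, VGG19, and MobileNet, and the function name is our own.

```python
import numpy as np

def average_voting(prob_list):
    """Soft voting: average class probabilities across models, then
    pick the argmax class per sample."""
    stacked = np.stack(prob_list)       # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)   # (n_samples, n_classes)
    return mean_probs.argmax(axis=1), mean_probs

# Three toy "models" scoring 2 samples over 3 gesture classes.
custom_cnn = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
vgg16      = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2]])
mobilenet  = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])

labels, probs = average_voting([custom_cnn, vgg16, mobilenet])
print(labels)  # class index with the highest mean probability per sample
```

Averaging probabilities (soft voting) rather than final labels lets a confident model outweigh two uncertain ones, which is one reason heterogeneous ensembles reduce variance relative to any single member.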

The second stage of the research introduces a proposed framework referred to as a Hybrid Graph Deep-Learning Model, which utilizes adaptive graph convolutional networks to construct a feature-based graph representation without relying on anatomical landmarks or keypoint detectors. In this design, nodes are formed from multi-domain descriptors, while edges are established based on similarity relationships by connecting nearest neighbors through the K-Nearest Neighbors (KNN) algorithm. This relational structure allows the model to exploit structural dependencies among features in order to improve classification accuracy.
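The KNN edge-construction step described above can be sketched as follows, under stated assumptions: the feature vectors here are random placeholders for the multi-domain descriptors, Euclidean distance is assumed as the similarity measure, and the function name is illustrative.

```python
import numpy as np

def knn_adjacency(features, k):
    """Symmetric adjacency matrix: connect each node to its k nearest
    neighbors by Euclidean distance, then symmetrize."""
    n = len(features)
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(-1)          # pairwise squared distances
    np.fill_diagonal(dist, np.inf)      # exclude self from the kNN step
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(dist[i])[:k]:
            adj[i, j] = adj[j, i] = 1.0
    return adj

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))         # 6 nodes, 4-dim feature vectors
A = knn_adjacency(feats, k=2)
```

Because edges come from feature similarity rather than hand keypoints, the graph can be built for any image representation, which is what lets the framework avoid anatomical landmark detectors.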

Furthermore, the proposed framework integrates multiple complementary representation channels, including coefficients extracted through Principal Component Analysis (PCA) to capture discriminative variance, frequency descriptors derived from the Fast Fourier Transform (FFT), and texture-based descriptors such as Tamura features, in addition to embeddings extracted from convolutional neural networks. This multi-perspective representation enriches the feature space and enhances the system’s ability to distinguish between gestures that differ only by subtle structural or textural variations.
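The multi-channel representation above can be sketched as follows. This is a simplified stand-in: the tiny grayscale "images" are synthetic, the CNN-embedding channel is omitted, only Tamura's contrast statistic (of the several Tamura features) is shown, and all function names are our own.

```python
import numpy as np

def pca_coeffs(X, n_components):
    """Project rows of X onto the top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fft_descriptor(img, n_bins):
    """First magnitudes of the 2-D FFT as a compact frequency summary."""
    return np.abs(np.fft.fft2(img)).ravel()[:n_bins]

def tamura_contrast(img):
    """Tamura contrast: sigma / kurtosis^(1/4) of the pixel distribution."""
    mu, sigma = img.mean(), img.std()
    if sigma == 0:
        return 0.0
    alpha4 = ((img - mu) ** 4).mean() / sigma ** 4
    return sigma / alpha4 ** 0.25

rng = np.random.default_rng(1)
images = rng.random((5, 8, 8))          # 5 synthetic 8x8 gesture crops
flat = images.reshape(5, -1)
features = np.hstack([
    pca_coeffs(flat, 3),
    np.array([fft_descriptor(im, 4) for im in images]),
    np.array([[tamura_contrast(im)] for im in images]),
])                                      # one (3 + 4 + 1)-dim vector per node
```

Concatenating channels from different domains is what gives each graph node a multi-perspective descriptor: variance structure from PCA, periodicity from the FFT, and texture statistics from Tamura-style measures.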
Subsequently, graph convolution operations are employed to propagate information across connected nodes, thereby strengthening class separation. To further improve prediction consistency and decision boundary stability, sequential optimization units such as GRU and LSTM are incorporated into the framework. These components contribute to refining the final predictions and enhancing the robustness of the overall recognition system.
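The propagation-plus-refinement idea can be sketched as a single symmetrically normalized graph-convolution step followed by one GRU cell update. This is a loose, untrained illustration of the pipeline shape, not the dissertation's architecture: all weights are random placeholders and the LSTM variant is omitted.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: relu(D^-1/2 (A+I) D^-1/2 H W)."""
    A_hat = A + np.eye(len(A))                  # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def gru_step(x, h, Wz, Wr, Wh):
    """Minimal GRU cell; gates are computed from [x, h] concatenation."""
    xh = np.hstack([x, h])
    z = 1 / (1 + np.exp(-(xh @ Wz)))            # update gate
    r = 1 / (1 + np.exp(-(xh @ Wr)))            # reset gate
    h_tilde = np.tanh(np.hstack([x, r * h]) @ Wh)
    return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(2)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy 3-node graph
H = rng.normal(size=(3, 4))                     # initial node features
W = rng.normal(size=(4, 4))

msg = gcn_layer(A, H, W)                        # neighborhood-aggregated features
Wz, Wr, Wh = (rng.normal(size=(8, 4)) for _ in range(3))
refined = gru_step(msg, np.zeros((3, 4)), Wz, Wr, Wh)
```

The graph convolution mixes each node's features with its neighbors', pulling same-class nodes together, while the gated update smooths the resulting states, which is the intuition behind using sequential units to stabilize the final predictions.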

This dissertation represents part of the continuous research efforts undertaken by the College of Information Technology at the University of Babylon to advance studies in artificial intelligence and computer vision. It also reflects the College’s commitment to leveraging modern technologies to develop innovative solutions that contribute to societal needs, particularly in facilitating communication and interaction with individuals with hearing impairments.

Examining Committee (name | academic title | specialization | workplace | role):
1. Dr. Eman Salih Sakban | Professor | Artificial Intelligence and Bioinformatics | University of Babylon / College of Information Technology | Chair
2. Dr. Loay Edwar George | Professor | Digital Image Processing | University of Baghdad / College of Science | Member
3. Dr. Hadhab Khalid Obeis | Professor | Artificial Intelligence | University of Babylon / College of Information Technology | Member
4. Dr. Wael Jabbar Abed | Professor | Information and Communication Technology | Al-Qasim Green University / Electronic Computer Center | Member
5. Dr. Safa Saad Abbas | Assistant Professor | Multimedia and Information Security | University of Babylon / College of Information Technology | Member
6. Dr. Nidaa Abdulmohsen Abbas | Professor | Artificial Intelligence | University of Babylon / College of Information Technology | Member and First Supervisor
7. Dr. Mahdi Abadi Mani | Professor | Network Security and Data Mining | University of Babylon / College of Information Technology - Al-Mustaqbal University | Member and Second Supervisor

