PhD Dissertation at the College of Information Technology Discusses the Optimal K Value Estimation for the K-Means Algorithm

By: Duhaa Fadill Abbas
Date: 14/3/2025

The Software Department at the College of Information Technology, University of Babylon, held a PhD dissertation defense titled "Optimal K Value Estimation for the K-Means Algorithm in Data Stream Clustering." The dissertation was presented by researcher Abeer Mahmoud Hassan Ahmed under the supervision of Dr. Saad Talib Hassoun on Thursday, March 13, 2025, in the College Conference Hall.

The dissertation addresses a fundamental challenge in data stream clustering and focuses on enhancing the K-Means algorithm, which is widely used for its simplicity and efficiency but requires the number of clusters K to be specified in advance. The research proposes novel models to determine the optimal K value for clustering continuous data streams, which are characterized by high velocity, large volume, heterogeneity, and real-time generation from multiple sources.
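The article does not describe how ADDBT or P-M-C actually choose K, so the sketch below is not the dissertation's method. It illustrates, under stated assumptions, a conventional baseline for the same problem: split the stream into fixed-size windows, run K-Means once for each candidate K on a window, and keep the K with the highest mean Silhouette Score. The function names (`kmeans`, `silhouette`, `estimate_k`, `stream_windows`, `synthetic_stream`), the window size of 60, the candidate range 2 to 6, the farthest-first initialization, and the synthetic three-blob stream are all hypothetical choices made for illustration.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with deterministic farthest-first initialization; returns one label per point."""
    # Farthest-first seeding (an assumption for reproducibility): start at the first
    # point, then repeatedly add the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (empty clusters stay put).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat it as the worst possible score
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # common convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def estimate_k(window, k_min=2, k_max=6):
    """Return the K in [k_min, k_max] whose K-Means clustering maximizes the mean silhouette."""
    best_k, best_score = k_min, -math.inf
    for k in range(k_min, min(k_max, len(window) - 1) + 1):
        score = silhouette(window, kmeans(window, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def stream_windows(stream, size):
    """Group an iterable stream into consecutive, non-overlapping windows of `size` points."""
    window = []
    for point in stream:
        window.append(point)
        if len(window) == size:
            yield window
            window = []


def synthetic_stream(n, centers=((0, 0), (10, 0), (0, 10)), spread=0.5, seed=42):
    """Hypothetical test stream: point i is drawn around centers[i % len(centers)]."""
    rng = random.Random(seed)
    for i in range(n):
        cx, cy = centers[i % len(centers)]
        yield (cx + rng.gauss(0, spread), cy + rng.gauss(0, spread))


if __name__ == "__main__":
    for w, window in enumerate(stream_windows(synthetic_stream(120), size=60)):
        print(f"window {w}: estimated K = {estimate_k(window)}")
```

Re-clustering every window once per candidate K, combined with the quadratic cost of computing the silhouette over all point pairs, makes this baseline expensive on high-velocity streams.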

Proposed Models and Methodologies

The study introduces advanced clustering techniques, including:

- Adaptive Dynamic Diameter and Boundary Threshold (ADDBT), which relies on a Probability Density Function (PDF) of the data rather than assuming the conventional Gaussian distribution.
- Prototype Multi-Channel (P-M-C) Model, which estimates the optimal K value based on prototype selection and frequency distribution.

Experimental Evaluation and Real-World Data Collection

The proposed models were evaluated using eight streaming datasets, including four benchmark datasets:

- Iris Dataset
- Household Electricity Consumption Stream
- Global Traffic Signal Data Stream
- KDD Cup Data Stream

Additionally, real-time sensor-based data were collected and analyzed, including:

- Weather data recorded over 10 months in Hilla, Iraq.
- Human physiological data collected using wearable sensors over four months.
- Health data monitored for six children over four months.
- Medical data gathered from 150 patients in a pharmacy setting.

Key Findings and Contributions

The results demonstrate significant improvements in clustering quality compared to traditional algorithms such as CMFT Kernel-K-Means, CluStream, and D-Stream. Measured by the Silhouette Score, the proposed models achieved an average improvement of 10%, with gains ranging from 45% to 200% over KM-STREEM++, IAPKM, and other baseline approaches.
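For reference, the Silhouette Score cited above is conventionally defined per data point as follows. This is the standard definition; the article does not state how the dissertation averaged it across stream windows.

```latex
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad -1 \le s(i) \le 1,
\qquad S = \frac{1}{n} \sum_{i=1}^{n} s(i)
```

Here a(i) is the mean distance from point i to the other points in its own cluster, b(i) is the smallest mean distance from point i to the points of any other cluster, and S is the overall score for n points. Values of S close to 1 indicate compact, well-separated clusters, while values near 0 indicate overlapping clusters.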

This research presents a major advancement in real-time data clustering, offering robust, scalable, and adaptive techniques that enhance clustering accuracy for dynamic, high-velocity data streams. The findings contribute to various fields, including Artificial Intelligence, Internet of Things (IoT), medical data analysis, and environmental monitoring, providing a foundation for intelligent data-driven decision-making systems.

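Examination Committee

| No. | Name | Academic Title | Specialization | Affiliation | Role |
|---|---|---|---|---|---|
| 1 | Dr. Nidhal Khudhair Al-Abbadi | Professor | Image Processing | Al-Mustaqbal University | Chair |
| 2 | Dr. Rafah Mohammed Kadhim | Professor | Fog Computing | University of Babylon / Studies and Planning Department | Member |
| 3 | Dr. Emad Issa Abdulkareem | Professor | Artificial Intelligence | Mustansiriyah University / College of Education | Member |
| 4 | Dr. Ahmed Habib Saeed | Assistant Professor | Information Systems and Web Technologies | University of Babylon / College of Information Technology | Member |
| 5 | Dr. Safa Saad Abbas | Assistant Professor | Multimedia and Information Security | University of Babylon / College of Information Technology | Member |
| 6 | Dr. Saad Talib Hassoun | Professor | Computer Simulation | University of Babylon / College of Information Technology | Member and Supervisor |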
